From: Dave Jiang <dave.jiang@intel.com>
To: Terry Bowman <terry.bowman@amd.com>,
dave@stgolabs.net, jic23@kernel.org, alison.schofield@intel.com,
djbw@kernel.org, bhelgaas@google.com, shiju.jose@huawei.com,
ming.li@zohomail.com, Smita.KoralahalliChannabasappa@amd.com,
rrichter@amd.com, dan.carpenter@linaro.org,
PradeepVineshReddy.Kodamati@amd.com, lukas@wunner.de,
Benjamin.Cheatham@amd.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
vishal.l.verma@intel.com, alucerop@amd.com, ira.weiny@intel.com,
corbet@lwn.net, rafael@kernel.org, xueshuai@linux.alibaba.com,
linux-cxl@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-acpi@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v17 07/11] PCI/CXL: Add RCH support to CXL handlers
Date: Tue, 5 May 2026 16:59:58 -0700
Message-ID: <ba97bbfc-2fc3-49d2-be6a-9384a4095c2c@intel.com>
In-Reply-To: <20260505173029.2718246-8-terry.bowman@amd.com>
On 5/5/26 10:30 AM, Terry Bowman wrote:
> Restricted CXL Host (RCH) error handling is a separate path from the
> new CXL Port error handling flow. Fold RCH error handling into the
> Port flow so both share a common entry point.
>
> Update cxl_rch_handle_error_iter() to forward RCH protocol errors
> through the AER-CXL kfifo.
>
> Update cxl_handle_proto_error() to dispatch RCH errors via
> cxl_handle_rdport_errors(). cxl_handle_rdport_errors() handles both
> correctable and uncorrectable RCH protocol errors.
>
> Behavior change: an RCD uncorrectable CXL RAS error now panics via
> cxl_do_recovery(). Before this patch the RCH path returned
> PCI_ERS_RESULT_NEED_RESET via cxl_pci's err_handler. After this patch
> the same condition panics. This matches the panic policy added in the
> common CXL Port protocol error flow. CXL.cachemem traffic cannot be
> safely recovered from an uncorrectable protocol error in software.
>
> Change cxl_handle_rdport_errors() to take a PCI device instead of a
> CXL device state, matching the new caller context. The error trace events
> emitted from this path now report device=<PCI BDF> instead of device=<memN>,
> matching the rest of the unified CXL trace events. Userspace consumers keyed
> off the memdev name need to map the PCI BDF back to a memdev.
>
> Include the RCD Endpoint serial number in RCH log messages so the RCH
> can be associated with its RCD.
>
> Remove the cxlds->rcd check from cxl_cor_error_detected() and
> cxl_error_detected(). RCH errors are now forwarded by
> cxl_rch_handle_error_iter() through the AER-CXL kfifo to
> cxl_handle_proto_error(), so cxl_pci's err_handler no longer sees
> them.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>
> ---
>
> Changes in v16->v17:
> - Drop now-dead cxlds->rcd branches from cxl_{cor_,}error_detected().
> - Drop duplicate subject line from commit body.
> - Document panic-on-uncorrectable behavior change for RCD path.
> - Document trace event device-name change (memN -> PCI BDF) for RCH path.
> - Rewrite cxl_handle_proto_error() RC_END comment to clarify RCD/RCH shared
> interrupt relationship
> - Rewrite commit message
>
> Changes in v16:
> - New commit
> ---
> drivers/cxl/core/core.h | 4 ++--
> drivers/cxl/core/ras.c | 14 +++++++++-----
> drivers/cxl/core/ras_rch.c | 8 +++-----
> drivers/pci/pcie/aer_cxl_rch.c | 17 +----------------
> 4 files changed, 15 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index bc36cd1575a4..2c7387506dfb 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -188,7 +188,7 @@ void cxl_handle_cor_ras(struct device *dev, u64 serial,
> void __iomem *ras_base);
> void cxl_dport_map_rch_aer(struct cxl_dport *dport);
> void cxl_disable_rch_root_ints(struct cxl_dport *dport);
> -void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);
> +void cxl_handle_rdport_errors(struct pci_dev *pdev);
> void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
> #else
> static inline int cxl_ras_init(void)
> @@ -205,7 +205,7 @@ static inline void cxl_handle_cor_ras(struct device *dev, u64 serial,
> void __iomem *ras_base) { }
> static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
> static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
> -static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
> +static inline void cxl_handle_rdport_errors(struct pci_dev *pdev) { }
> static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport) { }
> #endif /* CONFIG_CXL_RAS */
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 0a552d5a236e..1f1dd20623f6 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -267,9 +267,6 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
> return;
> }
>
> - if (cxlds->rcd)
> - cxl_handle_rdport_errors(cxlds);
> -
> cxl_handle_cor_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),
> cxlmd->endpoint->regs.ras);
> }
> @@ -292,8 +289,6 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> return PCI_ERS_RESULT_DISCONNECT;
> }
>
> - if (cxlds->rcd)
> - cxl_handle_rdport_errors(cxlds);
> /*
> * A frozen channel indicates an impending reset which is fatal to
> * CXL.mem operation, and will likely crash the system. On the off
> @@ -329,6 +324,15 @@ EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
> static void cxl_handle_proto_error(struct pci_dev *pdev, struct cxl_port *port,
> struct cxl_dport *dport, int severity)
> {
> + /*
> + * An RC_END device is an RCD (Restricted CXL Device). Its AER
> + * interrupt is shared with the RCH Downstream Port, so handle RCH
> + * Downstream Port protocol errors first before processing the RCD's
> + * own errors. See CXL spec r3.1 s12.2.
> + */
> + if (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END)

May as well use is_cxl_restricted(pdev) here instead of open-coding the
PCI_EXP_TYPE_RC_END check.

DJ

> + cxl_handle_rdport_errors(pdev);
> +
> if (severity == AER_CORRECTABLE) {
> cxl_handle_cor_ras(&pdev->dev, pci_get_dsn(pdev),
> to_ras_base(port, dport));
> diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
> index 61835fbafc0f..cbd02cabefbc 100644
> --- a/drivers/cxl/core/ras_rch.c
> +++ b/drivers/cxl/core/ras_rch.c
> @@ -1,7 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0-only
> /* Copyright(c) 2025 AMD Corporation. All rights reserved. */
>
> -#include <linux/types.h>
> #include <linux/aer.h>
> #include "cxl.h"
> #include "core.h"
> @@ -95,9 +94,8 @@ static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
> return false;
> }
>
> -void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
> +void cxl_handle_rdport_errors(struct pci_dev *pdev)
> {
> - struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> struct aer_capability_regs aer_regs;
> struct cxl_dport *dport;
> int severity;
> @@ -115,9 +113,9 @@ void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
>
> pci_print_aer(pdev, severity, &aer_regs);
> if (severity == AER_CORRECTABLE)
> - cxl_handle_cor_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),
> + cxl_handle_cor_ras(&pdev->dev, pci_get_dsn(pdev),
> dport->regs.ras);
> else
> - cxl_handle_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),
> + cxl_handle_ras(&pdev->dev, pci_get_dsn(pdev),
> dport->regs.ras);
> }
> diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
> index e471eefec9c4..83142eac0cab 100644
> --- a/drivers/pci/pcie/aer_cxl_rch.c
> +++ b/drivers/pci/pcie/aer_cxl_rch.c
> @@ -37,26 +37,11 @@ static bool cxl_error_is_native(struct pci_dev *dev)
> static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> {
> struct aer_err_info *info = (struct aer_err_info *)data;
> - const struct pci_error_handlers *err_handler;
>
> if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
> return 0;
>
> - guard(device)(&dev->dev);
> -
> - err_handler = dev->driver ? dev->driver->err_handler : NULL;
> - if (!err_handler)
> - return 0;
> -
> - if (info->severity == AER_CORRECTABLE) {
> - if (err_handler->cor_error_detected)
> - err_handler->cor_error_detected(dev);
> - } else if (err_handler->error_detected) {
> - if (info->severity == AER_NONFATAL)
> - err_handler->error_detected(dev, pci_channel_io_normal);
> - else if (info->severity == AER_FATAL)
> - err_handler->error_detected(dev, pci_channel_io_frozen);
> - }
> + cxl_forward_error(dev, info);
> return 0;
> }
>
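
For reference, a rough userspace sketch of what the is_cxl_restricted()
suggestion on the cxl_handle_proto_error() hunk would look like. This is
not the kernel code: struct pci_dev is reduced to a stand-in, the
rdport_calls counter is a hypothetical field added only so the dispatch
can be observed, and the helper bodies are assumptions that model
is_cxl_restricted() as a thin wrapper around the same
pci_pcie_type() == PCI_EXP_TYPE_RC_END test the patch open-codes.

```c
#include <stdbool.h>

/* PCIe Root Complex Integrated Endpoint device/port type (pci_regs.h). */
#define PCI_EXP_TYPE_RC_END 0x9

/* Stand-in for the kernel's struct pci_dev; only what the sketch needs. */
struct pci_dev {
	int pcie_type;     /* what pci_pcie_type() would report */
	int rdport_calls;  /* hypothetical counter, for illustration only */
};

/* Mock of pci_pcie_type(): returns the recorded device/port type. */
static int pci_pcie_type(const struct pci_dev *pdev)
{
	return pdev->pcie_type;
}

/* Assumed shape of the helper: an RC_END device is treated as an RCD. */
static bool is_cxl_restricted(const struct pci_dev *pdev)
{
	return pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END;
}

/* Mock of the RCH Downstream Port handler; only counts invocations. */
static void cxl_handle_rdport_errors(struct pci_dev *pdev)
{
	pdev->rdport_calls++;
}

/* Reduced cxl_handle_proto_error(): only the RCD dispatch is modeled. */
static void cxl_handle_proto_error(struct pci_dev *pdev)
{
	/* Handle RCH Downstream Port errors first for an RCD. */
	if (is_cxl_restricted(pdev))
		cxl_handle_rdport_errors(pdev);

	/* The correctable/uncorrectable RAS handling would follow here. */
}
```

With the helper, the condition reads as "this is an RCD" rather than as a
raw PCIe type comparison, which lines up with the wording of the comment
added in the hunk.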