From: "Bowman, Terry" <terry.bowman@amd.com>
To: Dave Jiang <dave.jiang@intel.com>,
dave@stgolabs.net, jonathan.cameron@huawei.com,
alison.schofield@intel.com, dan.j.williams@intel.com,
bhelgaas@google.com, shiju.jose@huawei.com, ming.li@zohomail.com,
Smita.KoralahalliChannabasappa@amd.com, rrichter@amd.com,
dan.carpenter@linaro.org, PradeepVineshReddy.Kodamati@amd.com,
lukas@wunner.de, Benjamin.Cheatham@amd.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
linux-cxl@vger.kernel.org, alucerop@amd.com, ira.weiny@intel.com
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [RESEND v13 20/25] CXL/PCI: Introduce CXL Port protocol error handlers
Date: Tue, 4 Nov 2025 15:27:44 -0600
Message-ID: <f09df618-987e-4051-b5a2-fd9d2cef18e2@amd.com>
In-Reply-To: <5ed52253-a74d-4643-bdb6-a8d4852a9be7@intel.com>
On 11/4/2025 3:20 PM, Dave Jiang wrote:
>
> On 11/4/25 10:03 AM, Terry Bowman wrote:
>> Add CXL protocol error handlers for CXL Port devices (Root Ports,
>> Downstream Ports, and Upstream Ports). Implement cxl_port_cor_error_detected()
>> and cxl_port_error_detected() to handle correctable and uncorrectable errors
>> respectively.
>>
>> Introduce cxl_get_ras_base() to retrieve the cached RAS register base
>> address for a given CXL port. This function supports CXL Root Ports,
>> Downstream Ports, and Upstream Ports by returning their previously mapped
>> RAS register addresses.
>>
>> Add device lock assertions to protect against concurrent device or RAS
>> register removal during error handling. The port error handlers require
>> two device locks:
>>
>> 1. The port's CXL parent device - RAS registers are mapped using devm_*
>> functions with the parent port as the host. Locking the parent prevents
>> the RAS registers from being unmapped during error handling.
>>
>> 2. The PCI device (pdev->dev) - Locking prevents concurrent modifications
>> to the PCI device structure during error handling.
>>
>> The lock assertions added here will be satisfied by device locks introduced
>> in a subsequent patch.
>>
>> Introduce get_pci_cxl_host_dev() to return the device responsible for
>> managing the RAS register mapping. This function increments the reference
>> count on the host device to prevent premature resource release during error
>> handling. The caller is responsible for decrementing the reference count.
>> For CXL endpoints, which manage resources without a separate host device,
>> this function returns NULL.
>>
>> Update the AER driver's is_cxl_error() to recognize CXL Port devices in
>> addition to CXL Endpoints, as both now have CXL-specific error handlers.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>> ---
>>
>> Changes in v12->v13:
>> - Move get_pci_cxl_host_dev() and cxl_handle_proto_error() to Dequeue
>> patch (Terry)
>> - Remove EP case in cxl_get_ras_base(), not used. (Terry)
>> - Remove check for dport->dport_dev (Dave)
>> - Remove whitespace (Terry)
>>
>> Changes in v11->v12:
>> - Add call to cxl_pci_drv_bound() in cxl_handle_proto_error() and
>> pci_to_cxl_dev()
>> - Change cxl_error_detected() -> cxl_cor_error_detected()
>> - Remove NULL variable assignments
>> - Replace bus_find_device() with find_cxl_port_by_uport() for upstream
>> port searches.
>>
>> Changes in v10->v11:
>> - None
>> ---
>> drivers/cxl/core/core.h | 10 +++++++
>> drivers/cxl/core/port.c | 7 ++---
>> drivers/cxl/core/ras.c | 49 +++++++++++++++++++++++++++++++++++
>> drivers/pci/pcie/aer_cxl_vh.c | 5 +++-
>> 4 files changed, 67 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
>> index b2c0ccd6803f..046ec65ed147 100644
>> --- a/drivers/cxl/core/core.h
>> +++ b/drivers/cxl/core/core.h
>> @@ -157,6 +157,8 @@ void cxl_cor_error_detected(struct device *dev);
>> pci_ers_result_t pci_error_detected(struct pci_dev *pdev,
>> pci_channel_state_t error);
>> void pci_cor_error_detected(struct pci_dev *pdev);
>> +pci_ers_result_t cxl_port_error_detected(struct device *dev);
>> +void cxl_port_cor_error_detected(struct device *dev);
>> #else
>> static inline int cxl_ras_init(void)
>> {
>> @@ -176,6 +178,11 @@ static inline pci_ers_result_t pci_error_detected(struct pci_dev *pdev,
>> return PCI_ERS_RESULT_NONE;
>> }
>> static inline void pci_cor_error_detected(struct pci_dev *pdev) { }
>> +static inline void cxl_port_cor_error_detected(struct device *dev) { }
>> +static inline pci_ers_result_t cxl_port_error_detected(struct device *dev)
>> +{
>> + return PCI_ERS_RESULT_NONE;
>> +}
>> #endif /* CONFIG_CXL_RAS */
>>
>> /* Restricted CXL Host specific RAS functions */
>> @@ -190,6 +197,9 @@ static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
>> #endif /* CONFIG_CXL_RCH_RAS */
>>
>> int cxl_gpf_port_setup(struct cxl_dport *dport);
>> +struct cxl_port *find_cxl_port(struct device *dport_dev,
>> + struct cxl_dport **dport);
>> +struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev);
>>
>> struct cxl_hdm;
>> int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index b70e1b505b5c..d060f864cf2e 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -1360,8 +1360,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
>> return NULL;
>> }
>>
>> -static struct cxl_port *find_cxl_port(struct device *dport_dev,
>> - struct cxl_dport **dport)
>> +struct cxl_port *find_cxl_port(struct device *dport_dev,
>> + struct cxl_dport **dport)
>> {
>> struct cxl_find_port_ctx ctx = {
>> .dport_dev = dport_dev,
>> @@ -1564,7 +1564,7 @@ static int match_port_by_uport(struct device *dev, const void *data)
>> * Function takes a device reference on the port device. Caller should do a
>> * put_device() when done.
>> */
>> -static struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
>> +struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
>> {
>> struct device *dev;
>>
>> @@ -1573,6 +1573,7 @@ static struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev)
>> return to_cxl_port(dev);
>> return NULL;
>> }
>> +EXPORT_SYMBOL_NS_GPL(find_cxl_port_by_uport, "CXL");
>>
>> static int update_decoder_targets(struct device *dev, void *data)
>> {
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index beb142054bda..142ca8794107 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
>> @@ -145,6 +145,39 @@ static void cxl_dport_map_ras(struct cxl_dport *dport)
>> dev_dbg(dev, "Failed to map RAS capability.\n");
>> }
>>
>> +static void __iomem *cxl_get_ras_base(struct device *dev)
>> +{
>> + struct pci_dev *pdev = to_pci_dev(dev);
>> +
>> + switch (pci_pcie_type(pdev)) {
>> + case PCI_EXP_TYPE_ROOT_PORT:
>> + case PCI_EXP_TYPE_DOWNSTREAM:
>> + {
>> + struct cxl_dport *dport;
>> + struct cxl_port *port __free(put_cxl_port) = find_cxl_port(&pdev->dev, &dport);
>> +
>> + if (!dport) {
>> + pci_err(pdev, "Failed to find the CXL device");
>> + return NULL;
>> + }
>> + return dport->regs.ras;
> The RAS MMIO mapping is done via devm_cxl_iomap_block() and is a devres against the device. Without holding the device lock, the port driver can unbind and the address mapping may go away in the middle or before cxl_handle_cor_ras()/cxl_handle_ras() being called. I think you'll have to hold the port lock here and make sure that the port driver is bound before reading the RAS register? I think the dport ras should be covered under the port umbrella.
>
>> + }
>> + case PCI_EXP_TYPE_UPSTREAM:
>> + {
>> + struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
>> +
>> + if (!port) {
>> + pci_err(pdev, "Failed to find the CXL device");
>> + return NULL;
>> + }
>> + return port->uport_regs.ras;
> same here
>
> DJ
>> + }
The cxl_port parent of each reported device is already locked before this
point. Locking for the CE case is added in the next patch, and the UCE
locking is in patch 23. All of the locking is taken as soon as possible
after dequeueing.

Terry
>> + }
>> +
>> + dev_warn_once(dev, "Error: Unsupported device type (%X)", pci_pcie_type(pdev));
>> + return NULL;
>> +}
>> +
>> /**
>> * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
>> * @dport: the cxl_dport that needs to be initialized
>> @@ -254,6 +287,22 @@ pci_ers_result_t cxl_handle_ras(struct device *dev, u64 serial, void __iomem *ra
>> return PCI_ERS_RESULT_PANIC;
>> }
>>
>> +void cxl_port_cor_error_detected(struct device *dev)
>> +{
>> + void __iomem *ras_base = cxl_get_ras_base(dev);
>> +
>> + cxl_handle_cor_ras(dev, 0, ras_base);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_port_cor_error_detected, "CXL");
>> +
>> +pci_ers_result_t cxl_port_error_detected(struct device *dev)
>> +{
>> + void __iomem *ras_base = cxl_get_ras_base(dev);
>> +
>> + return cxl_handle_ras(dev, 0, ras_base);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_port_error_detected, "CXL");
>> +
>> void cxl_cor_error_detected(struct device *dev)
>> {
>> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
>> index 5dbc81341dc4..25f9512b57f7 100644
>> --- a/drivers/pci/pcie/aer_cxl_vh.c
>> +++ b/drivers/pci/pcie/aer_cxl_vh.c
>> @@ -43,7 +43,10 @@ bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
>> if (!info || !info->is_cxl)
>> return false;
>>
>> - if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
>> + if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT) &&
>> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
>> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM) &&
>> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
>> return false;
>>
>> return is_internal_error(info);
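
The locking order discussed in the reply above can be modeled in userspace C. This is only a minimal sketch under stated assumptions: every name here (model_port, model_port_handle_error(), model_dequeue_and_handle()) is hypothetical and not part of the kernel or of this series, and a plain flag stands in for device_lock(). It assumes, as the reply describes, that the dequeue path takes the port lock before calling the handler, and it adds the driver-bound check suggested in the review.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_port {
	bool locked;        /* models the port device lock being held */
	bool driver_bound;  /* models the port driver being bound */
	uint32_t *ras;      /* models the devm-managed RAS mapping */
};

enum model_result { MODEL_NONE, MODEL_HANDLED };

/*
 * Models a port error handler: the RAS mapping is only touched while
 * the lock is held and the driver is bound, because an unbind would
 * release the mapping.
 */
static enum model_result model_port_handle_error(struct model_port *port)
{
	if (!port->locked || !port->driver_bound || !port->ras)
		return MODEL_NONE;

	if (*port->ras == 0)
		return MODEL_NONE;      /* no error status latched */

	*port->ras = 0;                 /* stands in for log-and-clear */
	return MODEL_HANDLED;
}

/* Models the dequeue path: take the lock first, then dispatch. */
static enum model_result model_dequeue_and_handle(struct model_port *port)
{
	enum model_result res;

	port->locked = true;
	res = model_port_handle_error(port);
	port->locked = false;

	return res;
}
```

In this model, calling model_port_handle_error() directly without the lock leaves the status untouched, which mirrors the concern raised in the review: the handler itself does not lock, so it depends on the caller having done so.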