From: Dave Jiang <dave.jiang@intel.com>
To: Terry Bowman <terry.bowman@amd.com>,
dave@stgolabs.net, jic23@kernel.org, alison.schofield@intel.com,
djbw@kernel.org, bhelgaas@google.com, shiju.jose@huawei.com,
ming.li@zohomail.com, Smita.KoralahalliChannabasappa@amd.com,
rrichter@amd.com, dan.carpenter@linaro.org,
PradeepVineshReddy.Kodamati@amd.com, lukas@wunner.de,
Benjamin.Cheatham@amd.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
vishal.l.verma@intel.com, alucerop@amd.com, ira.weiny@intel.com,
corbet@lwn.net, rafael@kernel.org, xueshuai@linux.alibaba.com,
linux-cxl@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-acpi@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v17 10/11] PCI/CXL: Mask/Unmask CXL protocol errors
Date: Wed, 6 May 2026 11:00:57 -0700
Message-ID: <38155e50-c0c0-4f51-9777-243f0dd049ca@intel.com>
In-Reply-To: <20260505173029.2718246-11-terry.bowman@amd.com>
On 5/5/26 10:30 AM, Terry Bowman wrote:
> CXL protocol errors are not enabled for all CXL devices after boot. They
> must be enabled in order to process CXL protocol errors. Provide matching
> teardown helpers so the masks are restored when a CXL Port or Downstream
> Port goes away.
>
> Add pci_aer_mask_internal_errors() as the symmetric counterpart to
> pci_aer_unmask_internal_errors() and export both for the cxl_core module.
>
> Introduce cxl_unmask_proto_interrupts() and cxl_mask_proto_interrupts()
> in cxl_core to wrap the PCI helpers with the dev_is_pci() and
> pcie_aer_is_native() gating CXL needs. Both helpers tolerate a NULL
> @dev so teardown callers do not have to special-case it.
>
> Wire cxl_unmask_proto_interrupts() into the success path of
> cxl_dport_map_ras() and devm_cxl_port_ras_setup() so the unmask only
> runs when the RAS register block was actually mapped. Pair each unmask
> with a devm_add_action_or_reset() registration of
> cxl_mask_proto_interrupts() scoped to the cxl_port device. The mask is
> then restored when the cxl_port device releases its devres. This
> applies to Endpoints, Upstream Switch Ports, Downstream Switch Ports,
> and Root Ports.
>
> Co-developed-by: Dan Williams <djbw@kernel.org>
> Signed-off-by: Dan Williams <djbw@kernel.org>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

I do wonder if we should save the original mask values and write those
back when we are done, rather than blindly remasking everything.
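
To make the save/restore idea concrete, here is a rough user-space sketch.
It is not kernel code and not what the patch does. struct fake_aer,
struct saved_masks, unmask_internal_save() and restore_internal() are
hypothetical names that model the AER capability as a plain array. The
register offsets and bit values are copied from pci_regs.h as I remember
them, so treat them as illustrative.

```c
#include <stdint.h>

/* Offsets/bits intended to mirror include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNCOR_MASK	0x08
#define PCI_ERR_UNC_INTN	0x00400000
#define PCI_ERR_COR_MASK	0x14
#define PCI_ERR_COR_INTERNAL	0x00004000

/* Hypothetical stand-in for the AER capability in config space. */
struct fake_aer {
	uint32_t regs[16];
};

/* Hypothetical: holds only the two mask bits the unmask path touches. */
struct saved_masks {
	uint32_t uncor;
	uint32_t cor;
};

uint32_t cfg_read(const struct fake_aer *aer, int off)
{
	return aer->regs[off / 4];
}

void cfg_write(struct fake_aer *aer, int off, uint32_t val)
{
	aer->regs[off / 4] = val;
}

/* Record the current internal-error mask bits, then clear them. */
void unmask_internal_save(struct fake_aer *aer, struct saved_masks *saved)
{
	uint32_t mask;

	mask = cfg_read(aer, PCI_ERR_UNCOR_MASK);
	saved->uncor = mask & PCI_ERR_UNC_INTN;
	cfg_write(aer, PCI_ERR_UNCOR_MASK, mask & ~PCI_ERR_UNC_INTN);

	mask = cfg_read(aer, PCI_ERR_COR_MASK);
	saved->cor = mask & PCI_ERR_COR_INTERNAL;
	cfg_write(aer, PCI_ERR_COR_MASK, mask & ~PCI_ERR_COR_INTERNAL);
}

/* Put back whatever the two bits were before, leaving other bits alone. */
void restore_internal(struct fake_aer *aer, const struct saved_masks *saved)
{
	uint32_t mask;

	mask = cfg_read(aer, PCI_ERR_UNCOR_MASK);
	mask = (mask & ~PCI_ERR_UNC_INTN) | saved->uncor;
	cfg_write(aer, PCI_ERR_UNCOR_MASK, mask);

	mask = cfg_read(aer, PCI_ERR_COR_MASK);
	mask = (mask & ~PCI_ERR_COR_INTERNAL) | saved->cor;
	cfg_write(aer, PCI_ERR_COR_MASK, mask);
}
```

The difference from an unconditional remask shows up when a bit was
already clear before the unmask, for example if firmware left the
correctable internal error unmasked. restore_internal() leaves that bit
clear, whereas a blind remask would set it. In the real code the saved
bits would need somewhere to live for the lifetime of the devres action,
which is the part I have not thought through.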
>
> ---
>
> Changes in v16->v17:
> - Drop redundant cxl_mask_proto_interrupts() calls from unregister_port()
> and cxl_dport_remove(); the devres action registered alongside the unmask
> is the sole mask path.
> - Update title
> - Remove unnecessary check for aer_capabilities
> - Gate cxl_unmask_proto_interrupts() on pcie_aer_is_native()
> - Add pci_aer_mask_internal_errors() and cxl_mask_proto_interrupts()
> - Only unmask on successful cxl_map_component_regs()
> - NULL-check @dev in cxl_{un,}mask_proto_interrupts()
> - Drop static and declare in core/core.h
>
> Change in v15 -> v16:
> - None
>
> Change in v14 -> v15:
> - None
>
> Changes in v13->v14:
> - Update commit title's prefix (Bjorn)
>
> Changes in v12->v13:
> - Add dev and dev_is_pci() NULL checks in cxl_unmask_proto_interrupts() (Terry)
> - Add Dave Jiang's and Ben's review-by
>
> Changes in v11->v12:
> - None
> ---
> drivers/cxl/core/core.h | 4 +++
> drivers/cxl/core/ras.c | 63 ++++++++++++++++++++++++++++++++++++++---
> drivers/pci/pcie/aer.c | 25 ++++++++++++++++
> include/linux/aer.h | 2 ++
> 4 files changed, 90 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 2c7387506dfb..ff39985d363f 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -190,6 +190,8 @@ void cxl_dport_map_rch_aer(struct cxl_dport *dport);
> void cxl_disable_rch_root_ints(struct cxl_dport *dport);
> void cxl_handle_rdport_errors(struct pci_dev *pdev);
> void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
> +void cxl_unmask_proto_interrupts(struct device *dev);
> +void cxl_mask_proto_interrupts(struct device *dev);
> #else
> static inline int cxl_ras_init(void)
> {
> @@ -207,6 +209,8 @@ static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
> static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
> static inline void cxl_handle_rdport_errors(struct pci_dev *pdev) { }
> static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport) { }
> +static inline void cxl_unmask_proto_interrupts(struct device *dev) { }
> +static inline void cxl_mask_proto_interrupts(struct device *dev) { }
> #endif /* CONFIG_CXL_RAS */
>
> int cxl_gpf_port_setup(struct cxl_dport *dport);
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index a98ce0f412ad..b45e2b539b5f 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -66,16 +66,59 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
> }
> static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>
> +void cxl_unmask_proto_interrupts(struct device *dev)
> +{
> + struct pci_dev *pdev;
> +
> + if (!dev || !dev_is_pci(dev))
> + return;
> +
> + pdev = to_pci_dev(dev);
> + if (!pcie_aer_is_native(pdev))
> + return;
> +
> + pci_aer_unmask_internal_errors(pdev);
> +}
> +
> +void cxl_mask_proto_interrupts(struct device *dev)
> +{
> + struct pci_dev *pdev;
> +
> + if (!dev || !dev_is_pci(dev))
> + return;
> +
> + pdev = to_pci_dev(dev);
> + if (!pcie_aer_is_native(pdev))
> + return;
> +
> + pci_aer_mask_internal_errors(pdev);
> +}
> +
> +static void cxl_mask_proto_irqs(void *dev)
> +{
> + cxl_mask_proto_interrupts(dev);
> +}
> +
> static void cxl_dport_map_ras(struct cxl_dport *dport)
> {
> struct cxl_register_map *map = &dport->reg_map;
> struct device *dev = dport->dport_dev;
>
> - if (!map->component_map.ras.valid)
> + if (!map->component_map.ras.valid) {
> dev_dbg(dev, "RAS registers not found\n");
> - else if (cxl_map_component_regs(map, &dport->regs.component,
> - BIT(CXL_CM_CAP_CAP_ID_RAS)))
> + return;
> + }
> +
> + if (cxl_map_component_regs(map, &dport->regs.component,
> + BIT(CXL_CM_CAP_CAP_ID_RAS))) {
> dev_dbg(dev, "Failed to map RAS capability.\n");
> + return;
> + }
> +
> + cxl_unmask_proto_interrupts(dev);
> + if (devm_add_action_or_reset(dport_to_host(dport),
> + cxl_mask_proto_irqs, dev))
> + dev_warn(dev, "failed to register CXL proto-irq mask cleanup\n");
> }
>
> /**
> @@ -109,6 +152,7 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_dport_rch_ras_setup, "CXL");
> void devm_cxl_port_ras_setup(struct cxl_port *port)
> {
> struct cxl_register_map *map = &port->reg_map;
> + struct device *dev;
>
> if (!map->component_map.ras.valid) {
> dev_dbg(&port->dev, "RAS registers not found\n");
> @@ -117,8 +161,19 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
>
> map->host = &port->dev;
> if (cxl_map_component_regs(map, &port->regs,
> - BIT(CXL_CM_CAP_CAP_ID_RAS)))
> + BIT(CXL_CM_CAP_CAP_ID_RAS))) {
> dev_dbg(&port->dev, "Failed to map RAS capability\n");
> + return;
> + }
> +
> + dev = is_cxl_endpoint(port) ? port->uport_dev->parent : port->uport_dev;
> + if (!dev_is_pci(dev))
> + return;
> +
> + cxl_unmask_proto_interrupts(dev);
> + if (devm_add_action_or_reset(&port->dev, cxl_mask_proto_irqs, dev))
> + dev_warn(&port->dev,
> + "Failed to register CXL proto-irq mask cleanup\n");
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index b9c6c7b97217..eaa36fe0eb31 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1151,6 +1151,31 @@ void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> */
> EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core");
>
> +/**
> + * pci_aer_mask_internal_errors - mask internal errors
> + * @dev: pointer to the pci_dev data structure
> + *
> + * Mask internal errors in the Uncorrectable and Correctable Error
> + * Mask registers.
> + *
> + * Note: AER must be enabled and supported by the device which must be
> + * checked in advance, e.g. with pcie_aer_is_native().
> + */
> +void pci_aer_mask_internal_errors(struct pci_dev *dev)
> +{
> + int aer = dev->aer_cap;
> + u32 mask;
> +
> + pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, &mask);
> + mask |= PCI_ERR_UNC_INTN;
> + pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, mask);
> +
> + pci_read_config_dword(dev, aer + PCI_ERR_COR_MASK, &mask);
> + mask |= PCI_ERR_COR_INTERNAL;
> + pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(pci_aer_mask_internal_errors, "cxl_core");
> +
> /**
> * pci_aer_handle_error - handle logging error into an event log
> * @dev: pointer to pci_dev data structure of error source device
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 979ed2f9fd38..c52db62d4c7e 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -71,6 +71,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> void pci_aer_clear_fatal_status(struct pci_dev *dev);
> int pcie_aer_is_native(struct pci_dev *dev);
> void pci_aer_unmask_internal_errors(struct pci_dev *dev);
> +void pci_aer_mask_internal_errors(struct pci_dev *dev);
> #else
> static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> {
> @@ -79,6 +80,7 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
> static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
> +static inline void pci_aer_mask_internal_errors(struct pci_dev *dev) { }
> #endif
>
> #ifdef CONFIG_CXL_RAS
Thread overview: 21+ messages
2026-05-05 17:30 [PATCH v17 00/11] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2026-05-05 17:30 ` [PATCH v17 01/11] PCI/AER: Introduce AER-CXL Kfifo Terry Bowman
2026-05-05 21:17 ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace events Terry Bowman
2026-05-05 21:46 ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 03/11] cxl: Use common CPER handling for all CXL devices Terry Bowman
2026-05-05 22:02 ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 04/11] cxl: Rename find_cxl_port() to find_cxl_port_by_dport() Terry Bowman
2026-05-05 22:06 ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 05/11] cxl: Limit CXL-CPER kfifo registration functions scope Terry Bowman
2026-05-05 22:16 ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 06/11] PCI: Establish common CXL Port protocol error flow Terry Bowman
2026-05-05 17:30 ` [PATCH v17 07/11] PCI/CXL: Add RCH support to CXL handlers Terry Bowman
2026-05-05 23:59 ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 08/11] cxl: Remove Endpoint AER correctable handler Terry Bowman
2026-05-05 17:30 ` [PATCH v17 09/11] cxl: Update Endpoint AER uncorrectable handler Terry Bowman
2026-05-06 17:43 ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 10/11] PCI/CXL: Mask/Unmask CXL protocol errors Terry Bowman
2026-05-06 18:00 ` Dave Jiang [this message]
2026-05-05 17:30 ` [PATCH v17 11/11] Documentation: cxl: Document CXL protocol error handling Terry Bowman
2026-05-06 18:34 ` Dave Jiang