From: Dave Jiang <dave.jiang@intel.com>
To: "Bowman, Terry" <terry.bowman@amd.com>,
dave@stgolabs.net, jic23@kernel.org, alison.schofield@intel.com,
djbw@kernel.org, bhelgaas@google.com, shiju.jose@huawei.com,
ming.li@zohomail.com, Smita.KoralahalliChannabasappa@amd.com,
rrichter@amd.com, dan.carpenter@linaro.org,
PradeepVineshReddy.Kodamati@amd.com, lukas@wunner.de,
Benjamin.Cheatham@amd.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
vishal.l.verma@intel.com, alucerop@amd.com, ira.weiny@intel.com,
corbet@lwn.net, rafael@kernel.org, xueshuai@linux.alibaba.com,
linux-cxl@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-acpi@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH v17 10/11] PCI/CXL: Mask/Unmask CXL protocol errors
Date: Mon, 11 May 2026 15:36:58 -0700 [thread overview]
Message-ID: <ce03e27a-f82f-40f8-b185-d80c74ce15ed@intel.com> (raw)
In-Reply-To: <8ab3325b-175f-4664-a046-6f4b1d472816@amd.com>
On 5/11/26 2:04 PM, Bowman, Terry wrote:
> On 5/6/2026 1:00 PM, Dave Jiang wrote:
>>
>>
>> On 5/5/26 10:30 AM, Terry Bowman wrote:
>>> CXL protocol errors are not enabled for all CXL devices after boot. They
>>> must be enabled in order to process CXL protocol errors. Provide matching
>>> teardown helpers so the masks are restored when a CXL Port or Downstream
>>> Port goes away.
>>>
>>> Add pci_aer_mask_internal_errors() as the symmetric counterpart to
>>> pci_aer_unmask_internal_errors() and export both for the cxl_core module.
>>>
>>> Introduce cxl_unmask_proto_interrupts() and cxl_mask_proto_interrupts()
>>> in cxl_core to wrap the PCI helpers with the dev_is_pci() and
>>> pcie_aer_is_native() gating CXL needs. Both helpers tolerate a NULL
>>> @dev so teardown callers do not have to special-case it.
>>>
>>> Wire cxl_unmask_proto_interrupts() into the success path of
>>> cxl_dport_map_ras() and devm_cxl_port_ras_setup() so the unmask only
>>> runs when the RAS register block was actually mapped. Pair each unmask
>>> with a devm_add_action_or_reset() registration of
>>> cxl_mask_proto_interrupts() scoped to the cxl_port device. The mask is
>>> then restored when the cxl_port device releases its devres. This
>>> applies to Endpoints, Upstream Switch Ports, Downstream Switch Ports,
>>> and Root Ports.
>>>
>>> Co-developed-by: Dan Williams <djbw@kernel.org>
>>> Signed-off-by: Dan Williams <djbw@kernel.org>
>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>
>> I do wonder if we should save the original mask values and write those back rather than blindly remask everything when we are done.
>>
>>
>
> Hi Dave,
>
> This is only masking/unmasking the internal error bit. The other mask bits are not
> modified.
Hi Terry,
My point is we lost the original register values for internal error on exit. We probably should restore whatever was in the register before we unmasked. Currently it comes to us with everything masked. But maybe in the future in may not be and we don't want to go chasing that side effect.
DJ
>
> - Terry
>
>
>>>
>>> ---
>>>
>>> Changes in v16->v17:
>>> - Drop redundant cxl_mask_proto_interrupts() calls from unregister_port()
>>> and cxl_dport_remove(); the devres action registered alongside the unmask
>>> is the sole mask path.
>>> - Update title
>>> - Remove unnecessary check for aer_capabilities
>>> - Gate cxl_unmask_proto_interrupts() on pcie_aer_is_native()
>>> - Add pci_aer_mask_internal_errors() and cxl_mask_proto_interrupts()
>>> - Only unmask on successful cxl_map_component_regs()
>>> - NULL-check @dev in cxl_{un,}mask_proto_interrupts()
>>> - Drop static and declare in core/core.h
>>>
>>> Change in v15 -> v16:
>>> - None
>>>
>>> Change in v14 -> v15:
>>> - None
>>>
>>> Changes in v13->v14:
>>> - Update commit title's prefix (Bjorn)
>>>
>>> Changes in v12->v13:
>>> - Add dev and dev_is_pci() NULL checks in cxl_unmask_proto_interrupts() (Terry)
>>> - Add Dave Jiang's and Ben's review-by
>>>
>>> Changes in v11->v12:
>>> - None
>>> ---
>>> drivers/cxl/core/core.h | 4 +++
>>> drivers/cxl/core/ras.c | 63 ++++++++++++++++++++++++++++++++++++++---
>>> drivers/pci/pcie/aer.c | 25 ++++++++++++++++
>>> include/linux/aer.h | 2 ++
>>> 4 files changed, 90 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
>>> index 2c7387506dfb..ff39985d363f 100644
>>> --- a/drivers/cxl/core/core.h
>>> +++ b/drivers/cxl/core/core.h
>>> @@ -190,6 +190,8 @@ void cxl_dport_map_rch_aer(struct cxl_dport *dport);
>>> void cxl_disable_rch_root_ints(struct cxl_dport *dport);
>>> void cxl_handle_rdport_errors(struct pci_dev *pdev);
>>> void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
>>> +void cxl_unmask_proto_interrupts(struct device *dev);
>>> +void cxl_mask_proto_interrupts(struct device *dev);
>>> #else
>>> static inline int cxl_ras_init(void)
>>> {
>>> @@ -207,6 +209,8 @@ static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
>>> static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
>>> static inline void cxl_handle_rdport_errors(struct pci_dev *pdev) { }
>>> static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport) { }
>>> +static inline void cxl_unmask_proto_interrupts(struct device *dev) { }
>>> +static inline void cxl_mask_proto_interrupts(struct device *dev) { }
>>> #endif /* CONFIG_CXL_RAS */
>>>
>>> int cxl_gpf_port_setup(struct cxl_dport *dport);
>>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>>> index a98ce0f412ad..b45e2b539b5f 100644
>>> --- a/drivers/cxl/core/ras.c
>>> +++ b/drivers/cxl/core/ras.c
>>> @@ -66,16 +66,59 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
>>> }
>>> static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>>>
>>> +void cxl_unmask_proto_interrupts(struct device *dev)
>>> +{
>>> + struct pci_dev *pdev;
>>> +
>>> + if (!dev || !dev_is_pci(dev))
>>> + return;
>>> +
>>> + pdev = to_pci_dev(dev);
>>> + if (!pcie_aer_is_native(pdev))
>>> + return;
>>> +
>>> + pci_aer_unmask_internal_errors(pdev);
>>> +}
>>> +
>>> +void cxl_mask_proto_interrupts(struct device *dev)
>>> +{
>>> + struct pci_dev *pdev;
>>> +
>>> + if (!dev || !dev_is_pci(dev))
>>> + return;
>>> +
>>> + pdev = to_pci_dev(dev);
>>> + if (!pcie_aer_is_native(pdev))
>>> + return;
>>> +
>>> + pci_aer_mask_internal_errors(pdev);
>>> +}
>>> +
>>> +static void cxl_mask_proto_irqs(void *dev)
>>> +{
>>> + cxl_mask_proto_interrupts(dev);
>>> +}
>>> +
>>> static void cxl_dport_map_ras(struct cxl_dport *dport)
>>> {
>>> struct cxl_register_map *map = &dport->reg_map;
>>> struct device *dev = dport->dport_dev;
>>>
>>> - if (!map->component_map.ras.valid)
>>> + if (!map->component_map.ras.valid) {
>>> dev_dbg(dev, "RAS registers not found\n");
>>> - else if (cxl_map_component_regs(map, &dport->regs.component,
>>> - BIT(CXL_CM_CAP_CAP_ID_RAS)))
>>> + return;
>>> + }
>>> +
>>> + if (cxl_map_component_regs(map, &dport->regs.component,
>>> + BIT(CXL_CM_CAP_CAP_ID_RAS))) {
>>> dev_dbg(dev, "Failed to map RAS capability.\n");
>>> + return;
>>> + }
>>> +
>>> + cxl_unmask_proto_interrupts(dev);
>>> + if (devm_add_action_or_reset(dport_to_host(dport),
>>> + cxl_mask_proto_irqs, dev))
>>> + dev_warn(dev, "failed to register CXL proto-irq mask cleanup\n");
>>> }
>>>
>>> /**
>>> @@ -109,6 +152,7 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_dport_rch_ras_setup, "CXL");
>>> void devm_cxl_port_ras_setup(struct cxl_port *port)
>>> {
>>> struct cxl_register_map *map = &port->reg_map;
>>> + struct device *dev;
>>>
>>> if (!map->component_map.ras.valid) {
>>> dev_dbg(&port->dev, "RAS registers not found\n");
>>> @@ -117,8 +161,19 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
>>>
>>> map->host = &port->dev;
>>> if (cxl_map_component_regs(map, &port->regs,
>>> - BIT(CXL_CM_CAP_CAP_ID_RAS)))
>>> + BIT(CXL_CM_CAP_CAP_ID_RAS))) {
>>> dev_dbg(&port->dev, "Failed to map RAS capability\n");
>>> + return;
>>> + }
>>> +
>>> + dev = is_cxl_endpoint(port) ? port->uport_dev->parent : port->uport_dev;
>>> + if (!dev_is_pci(dev))
>>> + return;
>>> +
>>> + cxl_unmask_proto_interrupts(dev);
>>> + if (devm_add_action_or_reset(&port->dev, cxl_mask_proto_irqs, dev))
>>> + dev_warn(&port->dev,
>>> + "Failed to register CXL proto-irq mask cleanup\n");
>>> }
>>> EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>>>
>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>> index b9c6c7b97217..eaa36fe0eb31 100644
>>> --- a/drivers/pci/pcie/aer.c
>>> +++ b/drivers/pci/pcie/aer.c
>>> @@ -1151,6 +1151,31 @@ void pci_aer_unmask_internal_errors(struct pci_dev *dev)
>>> */
>>> EXPORT_SYMBOL_FOR_MODULES(pci_aer_unmask_internal_errors, "cxl_core");
>>>
>>> +/**
>>> + * pci_aer_mask_internal_errors - mask internal errors
>>> + * @dev: pointer to the pci_dev data structure
>>> + *
>>> + * Mask internal errors in the Uncorrectable and Correctable Error
>>> + * Mask registers.
>>> + *
>>> + * Note: AER must be enabled and supported by the device which must be
>>> + * checked in advance, e.g. with pcie_aer_is_native().
>>> + */
>>> +void pci_aer_mask_internal_errors(struct pci_dev *dev)
>>> +{
>>> + int aer = dev->aer_cap;
>>> + u32 mask;
>>> +
>>> + pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, &mask);
>>> + mask |= PCI_ERR_UNC_INTN;
>>> + pci_write_config_dword(dev, aer + PCI_ERR_UNCOR_MASK, mask);
>>> +
>>> + pci_read_config_dword(dev, aer + PCI_ERR_COR_MASK, &mask);
>>> + mask |= PCI_ERR_COR_INTERNAL;
>>> + pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
>>> +}
>>> +EXPORT_SYMBOL_FOR_MODULES(pci_aer_mask_internal_errors, "cxl_core");
>>> +
>>> /**
>>> * pci_aer_handle_error - handle logging error into an event log
>>> * @dev: pointer to pci_dev data structure of error source device
>>> diff --git a/include/linux/aer.h b/include/linux/aer.h
>>> index 979ed2f9fd38..c52db62d4c7e 100644
>>> --- a/include/linux/aer.h
>>> +++ b/include/linux/aer.h
>>> @@ -71,6 +71,7 @@ int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
>>> void pci_aer_clear_fatal_status(struct pci_dev *dev);
>>> int pcie_aer_is_native(struct pci_dev *dev);
>>> void pci_aer_unmask_internal_errors(struct pci_dev *dev);
>>> +void pci_aer_mask_internal_errors(struct pci_dev *dev);
>>> #else
>>> static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>> {
>>> @@ -79,6 +80,7 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>>> static inline void pci_aer_clear_fatal_status(struct pci_dev *dev) { }
>>> static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>>> static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
>>> +static inline void pci_aer_mask_internal_errors(struct pci_dev *dev) { }
>>> #endif
>>>
>>> #ifdef CONFIG_CXL_RAS
>>
>
next prev parent reply other threads:[~2026-05-11 22:37 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-05 17:30 [PATCH v17 00/11] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2026-05-05 17:30 ` [PATCH v17 01/11] PCI/AER: Introduce AER-CXL Kfifo Terry Bowman
2026-05-05 21:17 ` Dave Jiang
2026-05-07 17:53 ` Jonathan Cameron
2026-05-07 18:26 ` Bowman, Terry
2026-05-05 17:30 ` [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace events Terry Bowman
2026-05-05 21:46 ` Dave Jiang
2026-05-07 18:08 ` Jonathan Cameron
2026-05-07 18:33 ` Bowman, Terry
2026-05-08 14:05 ` Jonathan Cameron
2026-05-09 3:49 ` Dan Williams (nvidia)
2026-05-11 12:51 ` Bowman, Terry
2026-05-11 23:28 ` Dan Williams (nvidia)
2026-05-05 17:30 ` [PATCH v17 03/11] cxl: Use common CPER handling for all CXL devices Terry Bowman
2026-05-05 22:02 ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 04/11] cxl: Rename find_cxl_port() to find_cxl_port_by_dport() Terry Bowman
2026-05-05 22:06 ` Dave Jiang
2026-05-07 18:11 ` Jonathan Cameron
2026-05-05 17:30 ` [PATCH v17 05/11] cxl: Limit CXL-CPER kfifo registration functions scope Terry Bowman
2026-05-05 22:16 ` Dave Jiang
2026-05-07 18:14 ` Jonathan Cameron
2026-05-05 17:30 ` [PATCH v17 06/11] PCI: Establish common CXL Port protocol error flow Terry Bowman
2026-05-07 18:22 ` Jonathan Cameron
2026-05-05 17:30 ` [PATCH v17 07/11] PCI/CXL: Add RCH support to CXL handlers Terry Bowman
2026-05-05 23:59 ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 08/11] cxl: Remove Endpoint AER correctable handler Terry Bowman
2026-05-05 17:30 ` [PATCH v17 09/11] cxl: Update Endpoint AER uncorrectable handler Terry Bowman
2026-05-06 17:43 ` Dave Jiang
2026-05-07 18:25 ` Jonathan Cameron
2026-05-05 17:30 ` [PATCH v17 10/11] PCI/CXL: Mask/Unmask CXL protocol errors Terry Bowman
2026-05-06 18:00 ` Dave Jiang
2026-05-11 21:04 ` Bowman, Terry
2026-05-11 22:36 ` Dave Jiang [this message]
2026-05-07 18:29 ` Jonathan Cameron
2026-05-05 17:30 ` [PATCH v17 11/11] Documentation: cxl: Document CXL protocol error handling Terry Bowman
2026-05-06 18:34 ` Dave Jiang
2026-05-07 18:51 ` Jonathan Cameron
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ce03e27a-f82f-40f8-b185-d80c74ce15ed@intel.com \
--to=dave.jiang@intel.com \
--cc=Benjamin.Cheatham@amd.com \
--cc=PradeepVineshReddy.Kodamati@amd.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=alucerop@amd.com \
--cc=bhelgaas@google.com \
--cc=corbet@lwn.net \
--cc=dan.carpenter@linaro.org \
--cc=dave@stgolabs.net \
--cc=djbw@kernel.org \
--cc=ira.weiny@intel.com \
--cc=jic23@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=ming.li@zohomail.com \
--cc=rafael@kernel.org \
--cc=rrichter@amd.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=shiju.jose@huawei.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
--cc=xueshuai@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox