* [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io
@ 2026-06-18 17:07 Dave Jiang
2026-06-18 17:07 ` [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support Dave Jiang
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Dave Jiang @ 2026-06-18 17:07 UTC (permalink / raw)
To: linux-cxl, linux-pci; +Cc: terry.bowman, bhelgaas, jic23, djbw
v2:
- Added multiple DSP per RCEC support.
- Added boundary checks for reading MMIO
- Addressed issues raised by sashiko
- See individual patches for detailed changes
This series adds RCEC Downstream Port Association Structure (RDPAS)
parsing support for CXL.io. RDPAS is a structure in the CXL Early
Discovery Table (CEDT), defined in CXL specification r4.0 9.18.1.5, that
maps an RCEC to its downstream ports. With RDPAS, the device that raised
an error can be found directly when the error is reported on the RCEC,
without walking a number of RCiEPs to determine which one reported it.
RDPAS also describes CXL.cachemem ports, but for those there is no easy
way to discover the source ID of the error and therefore to find the
Linux PCI object for the RCiEP. The intention here is to speed up error
handling by locating the error device directly from the information
RDPAS provides.
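For readers less familiar with the source ID mentioned above: it is the
usual 16-bit PCIe requester ID (bus in bits 15:8, device in bits 7:3,
function in bits 2:0). A minimal userspace sketch of the decode follows;
struct bdf and decode_source_id() are hypothetical names for illustration
and are not part of this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one decoded bus/device/function triple. */
struct bdf {
	uint8_t bus;
	uint8_t dev;
	uint8_t fn;
};

/* Split a 16-bit source ID into bus (15:8), device (7:3), function (2:0). */
static struct bdf decode_source_id(uint16_t src_id)
{
	struct bdf b = {
		.bus = (uint8_t)(src_id >> 8),
		.dev = (uint8_t)((src_id >> 3) & 0x1f),
		.fn  = (uint8_t)(src_id & 0x7),
	};

	return b;
}
```

In the kernel, the bus and the combined devfn byte (bits 7:0) would then be
handed to a lookup such as pci_get_domain_bus_and_slot(), as patch 2 does.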
This series is based on top of Terry's CXL error protocol series [1].
Comments are welcome on whether it makes sense to layer this on top
of Terry's error handling for RCH/RCD devices.
[1]: https://lore.kernel.org/linux-cxl/20260505173029.2718246-1-terry.bowman@amd.com/T/#t
Dave Jiang (2):
PCI/CXL: Add RDPAS parsing support
PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
drivers/cxl/acpi.c | 5 +
drivers/pci/pcie/aer_cxl_rch.c | 271 ++++++++++++++++++++++++++++++++-
include/cxl/ras.h | 18 +++
3 files changed, 291 insertions(+), 3 deletions(-)
create mode 100644 include/cxl/ras.h
base-commit: a558d1571c0b3bb6b4a830cb2cd8f128cc5ef3e1
--
2.54.0
* [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support
2026-06-18 17:07 [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Dave Jiang
@ 2026-06-18 17:07 ` Dave Jiang
2026-06-18 17:19 ` sashiko-bot
2026-06-18 17:07 ` [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery Dave Jiang
2026-06-18 19:05 ` [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Bowman, Terry
2 siblings, 1 reply; 6+ messages in thread
From: Dave Jiang @ 2026-06-18 17:07 UTC (permalink / raw)
To: linux-cxl, linux-pci; +Cc: terry.bowman, bhelgaas, jic23, djbw
Add parsing of the RCEC Downstream Port Association Structure (RDPAS),
defined in CXL spec r4.0 9.18.1.5. This structure allows the error
handler to locate the downstream port(s) that report errors to a given
Root Complex Event Collector (RCEC). The structure is part of the CXL
Early Discovery Table (CEDT) and can be parsed like other CEDT
structures.
The RDPAS structure provides a base address whose meaning depends on
the protocol field: for CXL.io it is the RCRB base associated with the
downstream port, and for CXL.cachemem it is the Component Base Register
base associated with the downstream port.
Per the spec, "For every RCEC, zero or more entries of this type are
permitted", so a single (segment, BDF) may map to multiple downstream
ports. Each RDPAS structure is stored as a per-port list node hung off
a per-RCEC container. The containers live in an xarray indexed by the
RCEC segment and BDF packed into a 32-bit value. Both the base address
and the protocol type are recorded for every entry so the error handler
can walk all ports associated with an RCEC and dispatch per protocol.
The parsed table is meant to live the entire life of the kernel, so the
xarray is not cleaned up when cxl_acpi unloads.
A helper is also added to retrieve the per-RCEC container based on the
segment and BDF of the RCEC.
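As an illustration of the index layout described above, here is a small
userspace sketch of the packing. The patch itself uses FIELD_PREP() and
GENMASK(); pack_bdf() and pack_index() are hypothetical stand-ins written
with plain shifts so the sketch builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Bus in bits 15:8, device in bits 7:3, function in bits 2:0. */
static uint16_t pack_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Segment in bits 31:16, BDF in bits 15:0 of the lookup key. */
static uint32_t pack_index(uint16_t segment, uint16_t bdf)
{
	return ((uint32_t)segment << 16) | bdf;
}
```

Because the key is built the same way from the ACPI fields at parse time
and from the struct pci_dev at lookup time, the two sides only agree if
both use the same bit layout, which is why the patch funnels both through
__rdpas_index().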
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
RFC v2:
- verify table length (sashiko)
- store multiple downstream ports per RCEC in a list (sashiko)
- Add a gate to initialize the xarray only once per boot.
- Add support for multiple DSP per RCEC. (sashiko)
---
drivers/cxl/acpi.c | 5 ++
drivers/pci/pcie/aer_cxl_rch.c | 121 +++++++++++++++++++++++++++++++++
include/cxl/ras.h | 18 +++++
3 files changed, 144 insertions(+)
create mode 100644 include/cxl/ras.h
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 127537628817..e09706275d85 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -7,6 +7,7 @@
#include <linux/acpi.h>
#include <linux/pci.h>
#include <linux/node.h>
+#include <cxl/ras.h>
#include <asm/div64.h>
#include "cxlpci.h"
#include "cxl.h"
@@ -933,6 +934,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
if (rc < 0)
return -ENXIO;
+ rc = cxl_rdpas_init(host);
+ if (rc < 0)
+ dev_dbg(host, "No RDPAS entries found or failed to parse\n");
+
rc = add_cxl_resources(cxl_res);
if (rc)
return rc;
diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
index 83142eac0cab..eaab7698217e 100644
--- a/drivers/pci/pcie/aer_cxl_rch.c
+++ b/drivers/pci/pcie/aer_cxl_rch.c
@@ -4,9 +4,130 @@
#include <linux/pci.h>
#include <linux/aer.h>
#include <linux/bitfield.h>
+#include <linux/acpi.h>
+#include <linux/list.h>
+#include <cxl/ras.h>
#include "../pci.h"
#include "portdrv.h"
+/*
+ * CXL r4.0 9.18.1.5: "For every RCEC, zero or more entries of this type are
+ * permitted." A single (segment, bdf) therefore maps to multiple downstream
+ * ports, each with its own base address and protocol. The xarray value is a
+ * per-RCEC container holding the list of associated downstream ports.
+ */
+struct cxl_rdpas_rcec {
+ struct list_head ports;
+};
+
+/* One per RDPAS structure, i.e. per associated downstream port */
+struct cxl_rdpas_entry {
+ struct list_head list;
+ u64 address;
+ u8 protocol;
+};
+
+static DEFINE_XARRAY(cxl_rdpas);
+static bool rdpas_parsed;
+
+/* CXL r4.0 9.18.1.5 Table 9-24. The segment and the BDF belong to the RCEC */
+static unsigned long __rdpas_index(u16 segment, u16 bdf)
+{
+ return FIELD_PREP(GENMASK(31, 16), segment) |
+ FIELD_PREP(GENMASK(15, 0), bdf);
+}
+
+static unsigned long rdpas_index(u16 segment, u8 bus, u8 device, u8 function)
+{
+ return __rdpas_index(segment,
+ FIELD_PREP(GENMASK(15, 8), bus) |
+ FIELD_PREP(GENMASK(7, 3), device) |
+ FIELD_PREP(GENMASK(2, 0), function));
+}
+
+static int __cxl_parse_rdpas(struct acpi_cedt_rdpas *rdpas, struct device *dev)
+{
+ struct cxl_rdpas_rcec *rdpas_rcec;
+ struct cxl_rdpas_entry *entry;
+ unsigned long index;
+ int rc;
+
+ if (rdpas->header.length < sizeof(struct acpi_cedt_rdpas))
+ return -EINVAL;
+
+ index = __rdpas_index(rdpas->segment, rdpas->bdf);
+
+ rdpas_rcec = xa_load(&cxl_rdpas, index);
+ if (!rdpas_rcec) {
+ rdpas_rcec = kzalloc(sizeof(*rdpas_rcec), GFP_KERNEL);
+ if (!rdpas_rcec)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&rdpas_rcec->ports);
+ rc = xa_insert(&cxl_rdpas, index, rdpas_rcec, GFP_KERNEL);
+ if (rc) {
+ kfree(rdpas_rcec);
+ return rc;
+ }
+ }
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ return -ENOMEM;
+
+ entry->address = rdpas->address;
+ entry->protocol = rdpas->protocol;
+ list_add_tail(&entry->list, &rdpas_rcec->ports);
+
+ dev_dbg(dev,
+ "RDPAS entry: PCI %04x:%02lx:%02lx.%ld %s CXL addr %016llx\n",
+ rdpas->segment, FIELD_GET(GENMASK(15, 8), rdpas->bdf),
+ FIELD_GET(GENMASK(7, 3), rdpas->bdf),
+ FIELD_GET(GENMASK(2, 0), rdpas->bdf),
+ rdpas->protocol == ACPI_CEDT_RDPAS_PROTOCOL_IO ?
+ "CXL.io" : "CXL.cachemem",
+ rdpas->address);
+
+ return 0;
+}
+
+static int cxl_parse_rdpas(union acpi_subtable_headers *header, void *arg,
+ const unsigned long end)
+{
+ struct acpi_cedt_rdpas *rdpas = (struct acpi_cedt_rdpas *)header;
+ struct device *dev = arg;
+
+ return __cxl_parse_rdpas(rdpas, dev);
+}
+
+/*
+ * The CEDT is a single static system-wide firmware table, so RDPAS is parsed
+ * exactly once for the lifetime of the kernel. cxl_acpi may probe more than
+ * once (re-bind or multiple ACPI0017), but the global xarray is populated only
+ * on the first call; subsequent calls are no-ops. There is no teardown: the
+ * data describes the platform and remains valid until the kernel exits.
+ */
+int cxl_rdpas_init(struct device *host)
+{
+ if (rdpas_parsed)
+ return 0;
+
+ rdpas_parsed = true;
+
+ return acpi_table_parse_cedt(ACPI_CEDT_TYPE_RDPAS, cxl_parse_rdpas, host);
+}
+EXPORT_SYMBOL_FOR_MODULES(cxl_rdpas_init, "cxl_acpi");
+
+static struct cxl_rdpas_rcec __maybe_unused *cxl_get_rdpas_by_rcec(struct pci_dev *rcec)
+{
+ unsigned long index;
+
+ index = rdpas_index(pci_domain_nr(rcec->bus), rcec->bus->number,
+ PCI_SLOT(rcec->devfn), PCI_FUNC(rcec->devfn));
+
+ return xa_load(&cxl_rdpas, index);
+}
+
static bool is_cxl_mem_dev(struct pci_dev *dev)
{
/*
diff --git a/include/cxl/ras.h b/include/cxl/ras.h
new file mode 100644
index 000000000000..661307b0230a
--- /dev/null
+++ b/include/cxl/ras.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2026 Intel Corporation. */
+
+#ifndef __CXL_RAS_H__
+#define __CXL_RAS_H__
+
+#include <linux/acpi.h>
+
+#ifdef CONFIG_CXL_RAS
+int cxl_rdpas_init(struct device *host);
+#else
+static inline int cxl_rdpas_init(struct device *host)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
+#endif
--
2.54.0
* [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
2026-06-18 17:07 [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Dave Jiang
2026-06-18 17:07 ` [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support Dave Jiang
@ 2026-06-18 17:07 ` Dave Jiang
2026-06-18 17:20 ` sashiko-bot
2026-06-18 19:05 ` [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Bowman, Terry
2 siblings, 1 reply; 6+ messages in thread
From: Dave Jiang @ 2026-06-18 17:07 UTC (permalink / raw)
To: linux-cxl, linux-pci; +Cc: terry.bowman, bhelgaas, jic23, djbw
The RDPAS allows the CXL RCH error handler to find the device directly
instead of iterating through a number of RCiEPs to discover which
device triggered an error. For the CXL.io protocol, the base address
stored in the cxl_rdpas xarray points to the RCRB of the device. The
RCRB mirrors the configuration space of the device via MMIO. The error
handler can walk the RCRB to find the AER capability block and then
read the root status and the error source to determine the BDF of the
error device.
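The capability walk described above can be sketched in userspace against
an in-memory copy of a 4K register block. This is only an illustration
under stated assumptions: rd32(), wr32() and find_aer() are hypothetical
names, the buffer stands in for the ioremap()ed RCRB, and the check that
rejects next pointers below 0x100 is an extra guard in this sketch that
the patch does not have.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE	0x100u	/* extended capabilities start here */
#define RCRB_SIZE	0x1000u	/* 4K window */
#define EXT_CAP_ID_AER	0x0001u	/* Advanced Error Reporting */

/* Little-endian dword accessors over a byte buffer. */
static uint32_t rd32(const uint8_t *base, uint16_t off)
{
	return (uint32_t)base[off] | ((uint32_t)base[off + 1] << 8) |
	       ((uint32_t)base[off + 2] << 16) | ((uint32_t)base[off + 3] << 24);
}

static void wr32(uint8_t *base, uint16_t off, uint32_t v)
{
	base[off] = v & 0xff;
	base[off + 1] = (v >> 8) & 0xff;
	base[off + 2] = (v >> 16) & 0xff;
	base[off + 3] = (v >> 24) & 0xff;
}

/* Return the offset of the AER capability, or 0 if it is not found. */
static uint16_t find_aer(const uint8_t *rcrb)
{
	int budget = RCRB_SIZE / 4;	/* bounds a looping capability chain */
	uint16_t off = CFG_SPACE_SIZE;

	while (budget-- > 0) {
		uint32_t hdr;

		/* Covers a zero next pointer and anything outside the window. */
		if (off < CFG_SPACE_SIZE || off > RCRB_SIZE - 4)
			return 0;

		hdr = rd32(rcrb, off);
		if (hdr == 0 || hdr == 0xffffffffu)
			return 0;

		if ((hdr & 0xffffu) == EXT_CAP_ID_AER)
			return off;

		/* Next pointer lives in bits 31:20 and is dword aligned. */
		off = (hdr >> 20) & 0xffcu;
	}

	return 0;
}
```

The header layout used here (ID in bits 15:0, next pointer in bits 31:20)
matches what PCI_EXT_CAP_ID() and PCI_EXT_CAP_NEXT() extract in the patch.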
Entries with the CXL.cachemem protocol are ignored because the base
address provided by the RDPAS structure points to the Component Base
Register base, which does not give the code a way to identify the device
that triggered the error.
Change the RCH error handler so that it probes the RCRB first to see
whether the error device can be discovered quickly, before falling back
to the current method of iterating through RCiEPs.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
v2:
- Add boundary checks for MMIO reads (sashiko)
- Add checks for surprise removal of devices (sashiko)
- Use aer_info to also check severity. (Ming)
- Update to iterate the list of RPs under an RCEC entry.
---
drivers/pci/pcie/aer_cxl_rch.c | 152 ++++++++++++++++++++++++++++++++-
1 file changed, 148 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
index eaab7698217e..f295e4eefbba 100644
--- a/drivers/pci/pcie/aer_cxl_rch.c
+++ b/drivers/pci/pcie/aer_cxl_rch.c
@@ -118,7 +118,7 @@ int cxl_rdpas_init(struct device *host)
}
EXPORT_SYMBOL_FOR_MODULES(cxl_rdpas_init, "cxl_acpi");
-static struct cxl_rdpas_rcec __maybe_unused *cxl_get_rdpas_by_rcec(struct pci_dev *rcec)
+static struct cxl_rdpas_rcec *cxl_get_rdpas_by_rcec(struct pci_dev *rcec)
{
unsigned long index;
@@ -166,6 +166,143 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
return 0;
}
+static u16 rcrb_to_aer(void __iomem *rcrb)
+{
+ /*
+ * The extended capability space is SZ_4K and each capability header
+ * is dword aligned, so the chain can hold at most SZ_4K / 4 entries.
+ * Bound the walk by that count to avoid spinning on a malformed,
+ * looping capability list.
+ */
+ int entries = SZ_4K / 4;
+ u16 offset;
+ u32 cap_hdr;
+
+ /* Start from PCIe extended capabilities at offset 0x100 */
+ offset = PCI_CFG_SPACE_SIZE;
+ cap_hdr = readl(rcrb + offset);
+ if (cap_hdr == 0 || PCI_POSSIBLE_ERROR(cap_hdr))
+ return 0;
+
+ while (PCI_EXT_CAP_ID(cap_hdr) != PCI_EXT_CAP_ID_ERR) {
+ if (--entries <= 0)
+ return 0;
+
+ offset = PCI_EXT_CAP_NEXT(cap_hdr);
+ if (!offset)
+ return 0;
+
+ if (offset >= SZ_4K)
+ return 0;
+
+ cap_hdr = readl(rcrb + offset);
+ if (cap_hdr == 0 || PCI_POSSIBLE_ERROR(cap_hdr))
+ return 0;
+ }
+
+ return offset;
+}
+
+DEFINE_FREE(iounmap, void __iomem *, if (_T) iounmap(_T))
+static u16 cxl_rch_get_err_src_id(u64 rcrb_base, struct aer_err_info *info)
+{
+ u32 root_status, err_src;
+ void __iomem *aer_base;
+ u16 aer_offset;
+
+ void __iomem *rcrb __free(iounmap) = ioremap(rcrb_base, SZ_4K);
+ if (!rcrb)
+ return 0;
+
+ aer_offset = rcrb_to_aer(rcrb);
+ if (!aer_offset)
+ return 0;
+
+ aer_base = rcrb + aer_offset;
+ if (aer_offset + PCI_ERR_ROOT_STATUS + sizeof(u32) > SZ_4K)
+ return 0;
+
+ root_status = readl(aer_base + PCI_ERR_ROOT_STATUS);
+ if (!(root_status & (PCI_ERR_ROOT_COR_RCV | PCI_ERR_ROOT_UNCOR_RCV)))
+ return 0;
+
+ if (aer_offset + PCI_ERR_ROOT_ERR_SRC + sizeof(u32) > SZ_4K)
+ return 0;
+
+ err_src = readl(aer_base + PCI_ERR_ROOT_ERR_SRC);
+
+ if (info->severity == AER_CORRECTABLE &&
+ root_status & PCI_ERR_ROOT_COR_RCV)
+ return FIELD_GET(GENMASK(15, 0), err_src);
+
+ /* Assume at this point the info->severity points to UNCOR */
+ if (root_status & PCI_ERR_ROOT_UNCOR_RCV)
+ return FIELD_GET(GENMASK(31, 16), err_src);
+
+ return 0;
+}
+
+static bool cxl_rch_forward_error_by_dsp(struct pci_dev *rcec, u64 rcrb_base,
+ struct aer_err_info *info)
+{
+ u8 bus, devfn;
+ u16 segment;
+ u16 src_id;
+
+ src_id = cxl_rch_get_err_src_id(rcrb_base, info);
+ if (!src_id)
+ return false;
+
+ /* Split the source ID into bus and devfn for the PCI device lookup */
+ segment = pci_domain_nr(rcec->bus);
+ bus = FIELD_GET(GENMASK(15, 8), src_id);
+ devfn = FIELD_GET(GENMASK(7, 0), src_id);
+
+ struct pci_dev *pdev __free(pci_dev_put) =
+ pci_get_domain_bus_and_slot(segment, bus, devfn);
+ if (!pdev)
+ return false;
+
+ /*
+ * The error source id resolves to whatever BDF the root port logged,
+ * which is not guaranteed to be a natively handled CXL.mem device.
+ * Apply the same gating as the RCiEP walk fallback before forwarding.
+ */
+ if (!is_cxl_mem_dev(pdev) || !cxl_error_is_native(pdev))
+ return false;
+
+ cxl_forward_error(pdev, info);
+ return true;
+}
+
+static bool cxl_rch_handled_error_by_rdpas(struct pci_dev *rcec,
+ struct aer_err_info *info)
+{
+ struct cxl_rdpas_rcec *rdpas_rcec;
+ struct cxl_rdpas_entry *entry;
+ bool handled = false;
+
+ rdpas_rcec = cxl_get_rdpas_by_rcec(rcec);
+ if (!rdpas_rcec)
+ return false;
+
+ /*
+ * The RCEC aggregates multiple downstream ports. Each CXL.io
+ * downstream port associated with this RCEC exposes the RCRB at its
+ * base address; walk them all and forward the error from every port
+ * that reports a valid error source.
+ */
+ list_for_each_entry(entry, &rdpas_rcec->ports, list) {
+ if (entry->protocol != ACPI_CEDT_RDPAS_PROTOCOL_IO)
+ continue;
+
+ if (cxl_rch_forward_error_by_dsp(rcec, entry->address, info))
+ handled = true;
+ }
+
+ return handled;
+}
+
void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
{
/*
@@ -173,9 +310,16 @@ void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
* RCH's downstream port. Check and handle them in the CXL.mem
* device driver.
*/
- if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
- is_aer_internal_error(info))
- pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
+ if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)
+ return;
+
+ if (!is_aer_internal_error(info))
+ return;
+
+ if (cxl_rch_handled_error_by_rdpas(dev, info))
+ return;
+
+ pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
}
static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
--
2.54.0
* Re: [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support
2026-06-18 17:07 ` [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support Dave Jiang
@ 2026-06-18 17:19 ` sashiko-bot
0 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-06-18 17:19 UTC (permalink / raw)
To: Dave Jiang; +Cc: linux-cxl, linux-pci
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The global `rdpas_parsed` flag is checked and updated without synchronization, leading to concurrent initialization races.
- [High] The per-RCEC container is published globally before its port list is populated, allowing lockless AER handlers to encounter corrupted or uninitialized list state.
--
commit 0749bd9917c3af37380ffaa0f749878e1c45c2ad
Author: Dave Jiang <dave.jiang@intel.com>
Subject: PCI/CXL: Add RDPAS parsing support
This commit introduces parsing for the RCEC Downstream Port Association
Structure (RDPAS) from the CEDT. The parsed entries are stored in a global
xarray that contains per-RCEC linked lists of downstream ports to help error
handlers locate them.
> diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
> index e471eefec9c40..5b0778fc08d7b 100644
> --- a/drivers/pci/pcie/aer_cxl_rch.c
> +++ b/drivers/pci/pcie/aer_cxl_rch.c
[ ... ]
> +static int __cxl_parse_rdpas(struct acpi_cedt_rdpas *rdpas, struct device *dev)
> +{
> + struct cxl_rdpas_rcec *rdpas_rcec;
> + struct cxl_rdpas_entry *entry;
> + unsigned long index;
> + int rc;
[ ... ]
> + index = __rdpas_index(rdpas->segment, rdpas->bdf);
> +
> + rdpas_rcec = xa_load(&cxl_rdpas, index);
> + if (!rdpas_rcec) {
> + rdpas_rcec = kzalloc(sizeof(*rdpas_rcec), GFP_KERNEL);
> + if (!rdpas_rcec)
> + return -ENOMEM;
> +
> + INIT_LIST_HEAD(&rdpas_rcec->ports);
> + rc = xa_insert(&cxl_rdpas, index, rdpas_rcec, GFP_KERNEL);
> + if (rc) {
> + kfree(rdpas_rcec);
> + return rc;
> + }
> + }
> +
> + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> + if (!entry)
> + return -ENOMEM;
> +
> + entry->address = rdpas->address;
> + entry->protocol = rdpas->protocol;
> + list_add_tail(&entry->list, &rdpas_rcec->ports);
[Severity: High]
Is it safe to publish the rdpas_rcec container to the global cxl_rdpas
xarray via xa_insert() before its port list is populated?
If a concurrent AER error handler retrieves this container locklessly using
cxl_get_rdpas_by_rcec(), it might iterate the list using
list_for_each_entry(). Because list_add_tail() is used without RCU memory
barriers, a concurrent reader could observe an uninitialized or partially
initialized list state, which could result in a NULL pointer dereference
or crash.
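One common way to avoid the ordering problem described above is to build
the container privately and publish it only once it is complete. The
following is a userspace sketch of that idea using C11 atomics, not a
proposed fix for the patch: struct port, struct rcec, publish_rcec() and
count_ports() are hypothetical names, a single atomic pointer stands in
for one xarray slot, and in the kernel the equivalent would more likely
be RCU publication or a lock shared with the reader.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct port {
	struct port *next;
	uint64_t address;
	uint8_t protocol;
};

struct rcec {
	struct port *ports;
};

/* Stand-in for one slot of the global lookup structure. */
static _Atomic(struct rcec *) published;

/* Writer: populate the whole list first, then make it visible. */
static int publish_rcec(const uint64_t *addrs, const uint8_t *protos, size_t n)
{
	struct rcec *r = calloc(1, sizeof(*r));
	struct port **tail;

	if (!r)
		return -1;

	tail = &r->ports;
	for (size_t i = 0; i < n; i++) {
		struct port *p = calloc(1, sizeof(*p));

		if (!p) {
			/* Nothing was published yet, so plain frees are safe. */
			while (r->ports) {
				struct port *victim = r->ports;

				r->ports = victim->next;
				free(victim);
			}
			free(r);
			return -1;
		}
		p->address = addrs[i];
		p->protocol = protos[i];
		*tail = p;
		tail = &p->next;
	}

	/* Release: all stores above are visible before the pointer is. */
	atomic_store_explicit(&published, r, memory_order_release);
	return 0;
}

/* Reader: acquire pairs with the release store in publish_rcec(). */
static size_t count_ports(void)
{
	struct rcec *r = atomic_load_explicit(&published, memory_order_acquire);
	size_t n = 0;

	for (struct port *p = r ? r->ports : NULL; p; p = p->next)
		n++;

	return n;
}
```

Since several RDPAS structures can target the same RCEC, applying this to
the patch would presumably mean staging all entries during the CEDT walk
and publishing the containers afterwards.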
[ ... ]
> +int cxl_rdpas_init(struct device *host)
> +{
> + if (rdpas_parsed)
> + return 0;
> +
> + rdpas_parsed = true;
[Severity: High]
Could there be a race condition when checking and setting rdpas_parsed
without synchronization?
If multiple threads probe cxl_acpi concurrently (for example, via sysfs bind
or if multiple ACPI0017 devices exist), they could both observe
rdpas_parsed == false. This would cause them to concurrently execute
acpi_table_parse_cedt() and perform lockless list_add_tail() operations
for the same RCEC container, potentially corrupting the list pointers.
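A minimal userspace sketch of one way to serialize the one-time parse is
shown below, assuming a plain mutex is acceptable on this path. The names
init_lock, parse_table() and rdpas_init_once() are hypothetical, and
parse_table() merely counts calls in place of acpi_table_parse_cedt().
Like the patch, the sketch marks the table as parsed even if parsing
fails; whether a failed parse should be retried is a separate question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool parsed;
static int parse_calls;

/* Stand-in for the real table walk; only records that it ran. */
static int parse_table(void)
{
	parse_calls++;
	return 0;
}

/*
 * Check and set the flag under the lock so that a second caller cannot
 * observe parsed == false while the first caller is still parsing.
 */
static int rdpas_init_once(void)
{
	int rc = 0;

	pthread_mutex_lock(&init_lock);
	if (!parsed) {
		rc = parse_table();
		parsed = true;
	}
	pthread_mutex_unlock(&init_lock);

	return rc;
}
```

Holding the lock across the parse also means a late caller returns only
after the data is fully populated, which an atomic flag alone would not
guarantee.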
> +
> + return acpi_table_parse_cedt(ACPI_CEDT_TYPE_RDPAS, cxl_parse_rdpas, host);
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260618170723.2010490-1-dave.jiang@intel.com?part=1
* Re: [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
2026-06-18 17:07 ` [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery Dave Jiang
@ 2026-06-18 17:20 ` sashiko-bot
0 siblings, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-06-18 17:20 UTC (permalink / raw)
To: Dave Jiang; +Cc: linux-pci, linux-cxl
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Incorrect fall-through logic in `cxl_rch_get_err_src_id()` extracts and returns the wrong error source ID for correctable errors.
- [High] The RDPAS list is built concurrently without locking and traversed locklessly in the AER error handler, exposing readers to partially built lists or list corruption.
- [Low] The function `cxl_forward_error` is invoked but is neither declared nor defined anywhere in the kernel.
--
commit e9dabb60b82f98b2295a3d242546827da2279bbf
Author: Dave Jiang <dave.jiang@intel.com>
PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
This commit changes the CXL RCH error handler to probe the RCRB first to
see if the error device can be discovered quickly, falling back to iterating
through RCiEPs if needed. It introduces RDPAS parsing to directly locate the
RCRB and determine the BDF of the error device.
diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
--- a/drivers/pci/pcie/aer_cxl_rch.c
+++ b/drivers/pci/pcie/aer_cxl_rch.c
[ ... ]
> +static u16 cxl_rch_get_err_src_id(u64 rcrb_base, struct aer_err_info *info)
> +{
> + u32 root_status, err_src;
[ ... ]
> + if (info->severity == AER_CORRECTABLE &&
> + root_status & PCI_ERR_ROOT_COR_RCV)
> + return FIELD_GET(GENMASK(15, 0), err_src);
> +
> + /* Assume at this point the info->severity points to UNCOR */
> + if (root_status & PCI_ERR_ROOT_UNCOR_RCV)
> + return FIELD_GET(GENMASK(31, 16), err_src);
> +
> + return 0;
> +}
[Severity: High]
If this is handling a correctable error but PCI_ERR_ROOT_COR_RCV is not set,
is it intentional to fall through to the uncorrectable error check?
If an uncorrectable error is concurrently logged in the RCRB status, this
would return the source ID of the uncorrectable error instead of 0, which
might misroute the correctable error notification.
Should these severity checks be mutually exclusive using an else block?
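To make the suggestion concrete, a userspace sketch of a mutually
exclusive selection follows. It is an illustration rather than the
author's fix: enum sev and pick_source_id() are hypothetical names, and
the bit values are written out locally under the assumption that they
match PCI_ERR_ROOT_COR_RCV and PCI_ERR_ROOT_UNCOR_RCV.

```c
#include <assert.h>
#include <stdint.h>

#define ROOT_COR_RCV	0x1u	/* assumed value of PCI_ERR_ROOT_COR_RCV */
#define ROOT_UNCOR_RCV	0x4u	/* assumed value of PCI_ERR_ROOT_UNCOR_RCV */

/* Hypothetical stand-in for the severity carried in aer_err_info. */
enum sev { SEV_NONFATAL, SEV_FATAL, SEV_CORRECTABLE };

/*
 * Pick the source ID that matches the severity being handled.
 * A correctable error never falls through to the uncorrectable field.
 */
static uint16_t pick_source_id(enum sev severity, uint32_t root_status,
			       uint32_t err_src)
{
	if (severity == SEV_CORRECTABLE) {
		if (root_status & ROOT_COR_RCV)
			return (uint16_t)(err_src & 0xffffu);	/* bits 15:0 */
		return 0;
	}

	if (root_status & ROOT_UNCOR_RCV)
		return (uint16_t)(err_src >> 16);		/* bits 31:16 */

	return 0;
}
```

With this shape, a correctable error whose status bit is clear yields 0
even when an uncorrectable error is logged at the same time, so the
caller falls back to the RCiEP walk instead of forwarding to the wrong
device.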
> +static bool cxl_rch_forward_error_by_dsp(struct pci_dev *rcec, u64 rcrb_base,
> + struct aer_err_info *info)
> +{
[ ... ]
> + if (!is_cxl_mem_dev(pdev) || !cxl_error_is_native(pdev))
> + return false;
> +
> + cxl_forward_error(pdev, info);
[Severity: Low]
Does cxl_forward_error() exist in the kernel tree?
I'm having trouble finding its declaration or definition in the codebase,
which seems like it might cause an implicit function declaration build failure.
> + return true;
> +}
> +
> +static bool cxl_rch_handled_error_by_rdpas(struct pci_dev *rcec,
> + struct aer_err_info *info)
> +{
> + struct cxl_rdpas_rcec *rdpas_rcec;
> + struct cxl_rdpas_entry *entry;
> + bool handled = false;
> +
> + rdpas_rcec = cxl_get_rdpas_by_rcec(rcec);
> + if (!rdpas_rcec)
> + return false;
> +
> + /*
> + * The RCEC aggregates multiple downstream ports. Each CXL.io
> + * downstream port associated with this RCEC exposes the RCRB at its
> + * base address; walk them all and forward the error from every port
> + * that reports a valid error source.
> + */
> + list_for_each_entry(entry, &rdpas_rcec->ports, list) {
[Severity: High]
Could this lockless list traversal race with the initialization in
__cxl_parse_rdpas()?
If an asynchronous AER hardware event triggers cxl_rch_handle_error() while
__cxl_parse_rdpas() is still running, it seems we might walk rdpas_rcec->ports
using list_for_each_entry() while it is concurrently being mutated by
list_add_tail().
Since the rdpas_rcec container is published to the cxl_rdpas xarray before
the list is fully populated, would this expose readers to partially built
lists or use-after-free conditions? Additionally, would the lockless check
of rdpas_parsed in cxl_rdpas_init() allow concurrent probes to race and
corrupt the list?
> + if (entry->protocol != ACPI_CEDT_RDPAS_PROTOCOL_IO)
> + continue;
> +
> + if (cxl_rch_forward_error_by_dsp(rcec, entry->address, info))
> + handled = true;
> + }
> +
> + return handled;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260618170723.2010490-1-dave.jiang@intel.com?part=2
* Re: [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io
2026-06-18 17:07 [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Dave Jiang
2026-06-18 17:07 ` [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support Dave Jiang
2026-06-18 17:07 ` [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery Dave Jiang
@ 2026-06-18 19:05 ` Bowman, Terry
2 siblings, 0 replies; 6+ messages in thread
From: Bowman, Terry @ 2026-06-18 19:05 UTC (permalink / raw)
To: Dave Jiang, linux-cxl, linux-pci; +Cc: bhelgaas, jic23, djbw
On 6/18/2026 12:07 PM, Dave Jiang wrote:
> v2:
> - Added multiple DSP per RCEC support.
> - Added boundary checks for reading MMIO
> - Addressed issues raised by sashiko
> - See individual patches for detailed changes
>
> This series adds RCEC Downstream Port Association Structure (RDPAS) parsing
> support to CXL.io. RDPAS is an ACPI table that is part of the CXL Early
> Discovery Table (CEDT) defined in CXL specification r4.0 9.18.1.5. It
> provides the mapping between RCEC and downstream ports. With RDPAS, the
> error device can be directly found when an error is reported on RCEC,
> without walking a number of RCiEP in order to determine which one reported
> the error. While CXL.cachemem is supported by RDPAS, there is no easy way
> to discover the source id of the error and therefore finding the Linux PCI
> object for the RCiEP. The intention here is to accelerate the discovery
> of the error by directly locating the error device with the given
> information.
>
> This series is based on top of Terry's CXL error protocol series [1].
> Looking for comments on the series WRT if it makes sense to add on top
> of Terry's error handling for RCH/RCD devices.
>
> [1]: https://lore.kernel.org/linux-cxl/20260505173029.2718246-1-terry.bowman@amd.com/T/#t
>
I was considering whether this could be reused for other RCEC–RCRB use
cases and lifted into the AER core as a generalized solution. However, as
far as I know, only CXL aggregates RCiEP reporting using the RCEC SBDF and
RDPAS. I think this will provide a useful alternative to walking the RCEC’s
associated RCiEPs and should improve performance at scale.
It may be worth clarifying whether this supports both RCH and VH modes. Also
consider updating the problem statement to note that the current RCEC RCiEP
walk processes the RCH downstream ports of *all* RCiEPs associated with the RCEC.
-Terry
>
> Dave Jiang (2):
> PCI/CXL: Add RDPAS parsing support
> PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
>
> drivers/cxl/acpi.c | 5 +
> drivers/pci/pcie/aer_cxl_rch.c | 271 ++++++++++++++++++++++++++++++++-
> include/cxl/ras.h | 18 +++
> 3 files changed, 291 insertions(+), 3 deletions(-)
> create mode 100644 include/cxl/ras.h
>
>
> base-commit: a558d1571c0b3bb6b4a830cb2cd8f128cc5ef3e1