* [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io
@ 2026-06-18 17:07 Dave Jiang
2026-06-18 17:07 ` [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support Dave Jiang
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Dave Jiang @ 2026-06-18 17:07 UTC (permalink / raw)
To: linux-cxl, linux-pci; +Cc: terry.bowman, bhelgaas, jic23, djbw
v2:
- Added multiple DSP per RCEC support.
- Added boundary checks for reading MMIO
- Addressed issues raised by sashiko
- See individual patches for detailed changes
This series adds RCEC Downstream Port Association Structure (RDPAS) parsing
support to CXL.io. RDPAS is an ACPI structure that is part of the CXL Early
Discovery Table (CEDT), defined in CXL specification r4.0 9.18.1.5. It
provides the mapping between an RCEC and its downstream ports. With RDPAS,
the error device can be found directly when an error is reported on an RCEC,
without walking a number of RCiEPs to determine which one reported
the error. While RDPAS also supports CXL.cachemem, there is no easy way
to discover the source ID of the error for that protocol, and therefore to
find the Linux PCI object for the RCiEP. The intention here is to accelerate
the discovery of the error by directly locating the error device with the
given information.
This series is based on top of Terry's CXL error protocol series [1].
Looking for comments on whether it makes sense to add this series on top
of Terry's error handling for RCH/RCD devices.
[1]: https://lore.kernel.org/linux-cxl/20260505173029.2718246-1-terry.bowman@amd.com/T/#t
Dave Jiang (2):
PCI/CXL: Add RDPAS parsing support
PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
drivers/cxl/acpi.c | 5 +
drivers/pci/pcie/aer_cxl_rch.c | 271 ++++++++++++++++++++++++++++++++-
include/cxl/ras.h | 18 +++
3 files changed, 291 insertions(+), 3 deletions(-)
create mode 100644 include/cxl/ras.h
base-commit: a558d1571c0b3bb6b4a830cb2cd8f128cc5ef3e1
--
2.54.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support
2026-06-18 17:07 [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Dave Jiang
@ 2026-06-18 17:07 ` Dave Jiang
2026-06-18 17:19 ` sashiko-bot
2026-06-18 21:26 ` Bowman, Terry
2026-06-18 17:07 ` [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery Dave Jiang
` (2 subsequent siblings)
3 siblings, 2 replies; 13+ messages in thread
From: Dave Jiang @ 2026-06-18 17:07 UTC (permalink / raw)
To: linux-cxl, linux-pci; +Cc: terry.bowman, bhelgaas, jic23, djbw

Add parsing of the RCEC Downstream Port Association Structure (RDPAS),
which is a structure defined in the CXL spec r4.0 9.18.1.5. This
structure allows error handler to locate the downstream port(s)
that report errors to a given Root Complex Event Collector (RCEC).
The structure is part of CXL Early Discovery Table (CEDT) and can
be parsed like other CEDT tables.

A base address is provided in the RDPAS structure where depending on
the protocol field, it is the RCRB base associated with the downstream
port for CXL.io or the Component Base Register base associated with the
downstream port for CXL.cachemem.

Per the spec, "For every RCEC, zero or more entries of this type are
permitted", so a single (segment, BDF) maps to multiple downstream
ports. Each RDPAS structure is stored as a per-port list node hung off
a per-RCEC container in an xarray indexed by the combination of the
RCEC segment plus the BDF in a 32bit field. Both the base address and
protocol type are recorded for every entry so the error handler can
walk all ports associated with an RCEC and dispatch per protocol.

The parsed table is meant to live the entire life of the kernel, so the
xarray is not cleaned up when cxl_acpi unloads.

A helper is also added to retrieve the per-RCEC container based on the
segment and BDF of the RCEC.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
RFC v2:
- verify table length (sashiko)
- store multiple downstream ports per RCEC in a list (sashiko)
- Add a gate to initialize the xarray only once per boot.
- Add support for multiple DSP per RCEC. (sashiko)
---
drivers/cxl/acpi.c | 5 ++
drivers/pci/pcie/aer_cxl_rch.c | 121 +++++++++++++++++++++++++++++++++
include/cxl/ras.h | 18 +++++
3 files changed, 144 insertions(+)
create mode 100644 include/cxl/ras.h

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 127537628817..e09706275d85 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -7,6 +7,7 @@
 #include <linux/acpi.h>
 #include <linux/pci.h>
 #include <linux/node.h>
+#include <cxl/ras.h>
 #include <asm/div64.h>
 #include "cxlpci.h"
 #include "cxl.h"
@@ -933,6 +934,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
 	if (rc < 0)
 		return -ENXIO;
 
+	rc = cxl_rdpas_init(host);
+	if (rc < 0)
+		dev_dbg(host, "No RDPAS entries found or failed to parse\n");
+
 	rc = add_cxl_resources(cxl_res);
 	if (rc)
 		return rc;
diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
index 83142eac0cab..eaab7698217e 100644
--- a/drivers/pci/pcie/aer_cxl_rch.c
+++ b/drivers/pci/pcie/aer_cxl_rch.c
@@ -4,9 +4,130 @@
 #include <linux/pci.h>
 #include <linux/aer.h>
 #include <linux/bitfield.h>
+#include <linux/acpi.h>
+#include <linux/list.h>
+#include <cxl/ras.h>
 #include "../pci.h"
 #include "portdrv.h"
 
+/*
+ * CXL r4.0 9.18.1.5: "For every RCEC, zero or more entries of this type are
+ * permitted." A single (segment, bdf) therefore maps to multiple downstream
+ * ports, each with its own base address and protocol. The xarray value is a
+ * per-RCEC container holding the list of associated downstream ports.
+ */
+struct cxl_rdpas_rcec {
+	struct list_head ports;
+};
+
+/* One per RDPAS structure, i.e. per associated downstream port */
+struct cxl_rdpas_entry {
+	struct list_head list;
+	u64 address;
+	u8 protocol;
+};
+
+static DEFINE_XARRAY(cxl_rdpas);
+static bool rdpas_parsed;
+
+/* CXL r4.0 9.18.1.5 Table 9-24. The segment and the BDF belongs to the RCEC */
+static unsigned long __rdpas_index(u16 segment, u16 bdf)
+{
+	return FIELD_PREP(GENMASK(31, 16), segment) |
+	       FIELD_PREP(GENMASK(15, 0), bdf);
+}
+
+static unsigned long rdpas_index(u16 segment, u8 bus, u8 device, u8 function)
+{
+	return __rdpas_index(segment,
+			     FIELD_PREP(GENMASK(15, 8), bus) |
+			     FIELD_PREP(GENMASK(7, 3), device) |
+			     FIELD_PREP(GENMASK(2, 0), function));
+}
+
+static int __cxl_parse_rdpas(struct acpi_cedt_rdpas *rdpas, struct device *dev)
+{
+	struct cxl_rdpas_rcec *rdpas_rcec;
+	struct cxl_rdpas_entry *entry;
+	unsigned long index;
+	int rc;
+
+	if (rdpas->header.length < sizeof(struct acpi_cedt_rdpas))
+		return -EINVAL;
+
+	index = __rdpas_index(rdpas->segment, rdpas->bdf);
+
+	rdpas_rcec = xa_load(&cxl_rdpas, index);
+	if (!rdpas_rcec) {
+		rdpas_rcec = kzalloc(sizeof(*rdpas_rcec), GFP_KERNEL);
+		if (!rdpas_rcec)
+			return -ENOMEM;
+
+		INIT_LIST_HEAD(&rdpas_rcec->ports);
+		rc = xa_insert(&cxl_rdpas, index, rdpas_rcec, GFP_KERNEL);
+		if (rc) {
+			kfree(rdpas_rcec);
+			return rc;
+		}
+	}
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return -ENOMEM;
+
+	entry->address = rdpas->address;
+	entry->protocol = rdpas->protocol;
+	list_add_tail(&entry->list, &rdpas_rcec->ports);
+
+	dev_dbg(dev,
+		"RDPAS entry: PCI %04x:%02lx:%02lx.%ld %s CXL addr %016llx\n",
+		rdpas->segment, FIELD_GET(GENMASK(15, 8), rdpas->bdf),
+		FIELD_GET(GENMASK(7, 3), rdpas->bdf),
+		FIELD_GET(GENMASK(2, 0), rdpas->bdf),
+		rdpas->protocol == ACPI_CEDT_RDPAS_PROTOCOL_IO ?
+		"CXL.io" : "CXL.cachemem",
+		rdpas->address);
+
+	return 0;
+}
+
+static int cxl_parse_rdpas(union acpi_subtable_headers *header, void *arg,
+			   const unsigned long end)
+{
+	struct acpi_cedt_rdpas *rdpas = (struct acpi_cedt_rdpas *)header;
+	struct device *dev = arg;
+
+	return __cxl_parse_rdpas(rdpas, dev);
+}
+
+/*
+ * The CEDT is a single static system-wide firmware table, so RDPAS is parsed
+ * exactly once for the lifetime of the kernel. cxl_acpi may probe more than
+ * once (re-bind or multiple ACPI0017), but the global xarray is populated only
+ * on the first call; subsequent calls are no-ops. There is no teardown: the
+ * data describes the platform and remains valid until the kernel exits.
+ */
+int cxl_rdpas_init(struct device *host)
+{
+	if (rdpas_parsed)
+		return 0;
+
+	rdpas_parsed = true;
+
+	return acpi_table_parse_cedt(ACPI_CEDT_TYPE_RDPAS, cxl_parse_rdpas, host);
+}
+EXPORT_SYMBOL_FOR_MODULES(cxl_rdpas_init, "cxl_acpi");
+
+static struct cxl_rdpas_rcec __maybe_unused *cxl_get_rdpas_by_rcec(struct pci_dev *rcec)
+{
+	unsigned long index;
+
+	index = rdpas_index(pci_domain_nr(rcec->bus), rcec->bus->number,
+			    PCI_SLOT(rcec->devfn), PCI_FUNC(rcec->devfn));
+
+	return xa_load(&cxl_rdpas, index);
+}
+
 static bool is_cxl_mem_dev(struct pci_dev *dev)
 {
 	/*
diff --git a/include/cxl/ras.h b/include/cxl/ras.h
new file mode 100644
index 000000000000..661307b0230a
--- /dev/null
+++ b/include/cxl/ras.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2026 Intel Corporation. */
+
+#ifndef __CXL_RAS_H__
+#define __CXL_RAS_H__
+
+#include <linux/acpi.h>
+
+#ifdef CONFIG_CXL_RAS
+int cxl_rdpas_init(struct device *host);
+#else
+static inline int cxl_rdpas_init(struct device *host)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
+#endif
--
2.54.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
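For readers following the key layout in patch 1/2, here is a minimal userspace
sketch of how the segment and BDF could be packed into the 32-bit xarray index.
It is not the kernel code: the names rdpas_key_from_bdf(), rdpas_key() and
rdpas_key_selftest() are hypothetical stand-ins for __rdpas_index() and
rdpas_index(), and plain shifts replace FIELD_PREP()/GENMASK().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the patch's index helpers. */

/* Segment goes in bits 31:16, the 16-bit BDF in bits 15:0. */
uint32_t rdpas_key_from_bdf(uint16_t segment, uint16_t bdf)
{
	return ((uint32_t)segment << 16) | bdf;
}

/* BDF layout: bus in bits 15:8, device in bits 7:3, function in bits 2:0. */
uint32_t rdpas_key(uint16_t segment, uint8_t bus, uint8_t dev, uint8_t fn)
{
	uint16_t bdf = (uint16_t)(((uint16_t)bus << 8) |
				  ((dev & 0x1fu) << 3) |
				  (fn & 0x7u));

	return rdpas_key_from_bdf(segment, bdf);
}

/*
 * The parse path builds the key from the raw BDF in the table, the lookup
 * path builds it from bus/device/function. Both must agree, otherwise a
 * lookup would miss the entry stored at parse time.
 */
void rdpas_key_selftest(void)
{
	assert(rdpas_key(0x0001, 0x3a, 0x1f, 0x7) ==
	       rdpas_key_from_bdf(0x0001, 0x3aff));
}
```

The point of the sketch is only that the parse-time key and the lookup-time
key are the same value, which is what lets cxl_get_rdpas_by_rcec() find the
container that __cxl_parse_rdpas() stored.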
* Re: [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support
2026-06-18 17:07 ` [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support Dave Jiang
@ 2026-06-18 17:19 ` sashiko-bot
2026-06-18 21:26 ` Bowman, Terry
1 sibling, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-06-18 17:19 UTC (permalink / raw)
To: Dave Jiang; +Cc: linux-cxl, linux-pci

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The global `rdpas_parsed` flag is checked and updated without synchronization, leading to concurrent initialization races.
- [High] The per-RCEC container is published globally before its port list is populated, allowing lockless AER handlers to encounter corrupted or uninitialized list state.
--

commit 0749bd9917c3af37380ffaa0f749878e1c45c2ad
Author: Dave Jiang <dave.jiang@intel.com>
Subject: PCI/CXL: Add RDPAS parsing support

This commit introduces parsing for the RCEC Downstream Port Association
Structure (RDPAS) from the CEDT. The parsed entries are stored in a global
xarray that contains per-RCEC linked lists of downstream ports to help error
handlers locate them.

> diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
> index e471eefec9c40..5b0778fc08d7b 100644
> --- a/drivers/pci/pcie/aer_cxl_rch.c
> +++ b/drivers/pci/pcie/aer_cxl_rch.c

[ ... ]

> +static int __cxl_parse_rdpas(struct acpi_cedt_rdpas *rdpas, struct device *dev)
> +{
> +	struct cxl_rdpas_rcec *rdpas_rcec;
> +	struct cxl_rdpas_entry *entry;
> +	unsigned long index;
> +	int rc;

[ ... ]

> +	index = __rdpas_index(rdpas->segment, rdpas->bdf);
> +
> +	rdpas_rcec = xa_load(&cxl_rdpas, index);
> +	if (!rdpas_rcec) {
> +		rdpas_rcec = kzalloc(sizeof(*rdpas_rcec), GFP_KERNEL);
> +		if (!rdpas_rcec)
> +			return -ENOMEM;
> +
> +		INIT_LIST_HEAD(&rdpas_rcec->ports);
> +		rc = xa_insert(&cxl_rdpas, index, rdpas_rcec, GFP_KERNEL);
> +		if (rc) {
> +			kfree(rdpas_rcec);
> +			return rc;
> +		}
> +	}
> +
> +	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> +	if (!entry)
> +		return -ENOMEM;
> +
> +	entry->address = rdpas->address;
> +	entry->protocol = rdpas->protocol;
> +	list_add_tail(&entry->list, &rdpas_rcec->ports);

[Severity: High]
Is it safe to publish the rdpas_rcec container to the global cxl_rdpas xarray
via xa_insert() before its port list is populated?

If a concurrent AER error handler retrieves this container locklessly using
cxl_get_rdpas_by_rcec(), it might iterate the list using
list_for_each_entry(). Because list_add_tail() is used without RCU memory
barriers, a concurrent reader could observe an uninitialized or partially
initialized list state, which could result in a NULL pointer dereference or
crash.

[ ... ]

> +int cxl_rdpas_init(struct device *host)
> +{
> +	if (rdpas_parsed)
> +		return 0;
> +
> +	rdpas_parsed = true;

[Severity: High]
Could there be a race condition when checking and setting rdpas_parsed
without synchronization?

If multiple threads probe cxl_acpi concurrently (for example, via sysfs bind
or if multiple ACPI0017 devices exist), they could both observe
rdpas_parsed == false. This would cause them to concurrently execute
acpi_table_parse_cedt() and perform lockless list_add_tail() operations for
the same RCEC container, potentially corrupting the list pointers.

> +
> +	return acpi_table_parse_cedt(ACPI_CEDT_TYPE_RDPAS, cxl_parse_rdpas, host);
> +}

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260618170723.2010490-1-dave.jiang@intel.com?part=1
^ permalink raw reply [flat|nested] 13+ messages in thread
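As an illustration of the first issue raised in the review above (the
unsynchronized rdpas_parsed gate), the following is a minimal userspace sketch
of a check-and-set done under a single lock. It is an assumption about one
possible shape of a fix, not what the series does: pthread_mutex_t stands in
for a kernel mutex, and fake_parse_rdpas(), parse_calls and rdpas_init_once()
are hypothetical names, with fake_parse_rdpas() standing in for the
acpi_table_parse_cedt() call.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the table parser; counts how often it runs. */
int parse_calls;

int fake_parse_rdpas(void)
{
	parse_calls++;
	return 0;
}

pthread_mutex_t rdpas_lock = PTHREAD_MUTEX_INITIALIZER;
bool rdpas_parsed;

/*
 * Check and set the flag under one lock, so two concurrent callers cannot
 * both observe "not parsed" and both run the parser.
 */
int rdpas_init_once(void)
{
	int rc = 0;

	pthread_mutex_lock(&rdpas_lock);
	if (!rdpas_parsed) {
		/* As in the patch, a failed parse is not retried later. */
		rdpas_parsed = true;
		rc = fake_parse_rdpas();
	}
	pthread_mutex_unlock(&rdpas_lock);

	return rc;
}
```

Because the parser runs with the lock held, a second caller also cannot return
before the first caller has finished populating the table. The sketch does not
address the second review item, which concerns lockless readers in the error
handler.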
* Re: [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support
2026-06-18 17:07 ` [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support Dave Jiang
2026-06-18 17:19 ` sashiko-bot
@ 2026-06-18 21:26 ` Bowman, Terry
2026-06-18 21:57 ` Dave Jiang
1 sibling, 1 reply; 13+ messages in thread
From: Bowman, Terry @ 2026-06-18 21:26 UTC (permalink / raw)
To: Dave Jiang, linux-cxl, linux-pci; +Cc: bhelgaas, jic23, djbw

On 6/18/2026 12:07 PM, Dave Jiang wrote:
> Add parsing of the RCEC Downstream Port Association Structure (RDPAS),
> which is a structure defined in the CXL spec r4.0 9.18.1.5. This
> structure allows error handler to locate the downstream port(s)
> that report errors to a given Root Complex Event Collector (RCEC).
> The structure is part of CXL Early Discovery Table (CEDT) and can
> be parsed like other CEDT tables.
>

Maybe mention this is used only in RCH mode.

> A base address is provided in the RDPAS structure where depending on
> the protocol field, it is the RCRB base associated with the downstream
> port for CXL.io or the Component Base Register base associated with the
> downstream port for CXL.cachemem.
>
> Per the spec, "For every RCEC, zero or more entries of this type are
> permitted", so a single (segment, BDF) maps to multiple downstream
> ports. Each RDPAS structure is stored as a per-port list node hung off
> a per-RCEC container in an xarray indexed by the combination of the
> RCEC segment plus the BDF in a 32bit field. Both the base address and
> protocol type are recorded for every entry so the error handler can
> walk all ports associated with an RCEC and dispatch per protocol.
>

It's stated here but I missed that a RCEC list must still be walked. May
want to compare to the existing method using RCiEP.

> The parsed table is meant to live the entire life of the kernel, so the
> xarray is not cleaned up when cxl_acpi unloads.
>
> A helper is also added to retrieve the per-RCEC container based on the
> segment and BDF of the RCEC.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> RFC v2:
> - verify table length (sashiko)
> - store multiple downstream ports per RCEC in a list (sashiko)
> - Add a gate to initialize the xarray only once per boot.
> - Add support for multiple DSP per RCEC. (sashiko)
> ---
> drivers/cxl/acpi.c | 5 ++
> drivers/pci/pcie/aer_cxl_rch.c | 121 +++++++++++++++++++++++++++++++++
> include/cxl/ras.h | 18 +++++
> 3 files changed, 144 insertions(+)
> create mode 100644 include/cxl/ras.h
>
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 127537628817..e09706275d85 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -7,6 +7,7 @@
>  #include <linux/acpi.h>
>  #include <linux/pci.h>
>  #include <linux/node.h>
> +#include <cxl/ras.h>
>  #include <asm/div64.h>
>  #include "cxlpci.h"
>  #include "cxl.h"
> @@ -933,6 +934,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>  	if (rc < 0)
>  		return -ENXIO;
>
> +	rc = cxl_rdpas_init(host);
> +	if (rc < 0)
> +		dev_dbg(host, "No RDPAS entries found or failed to parse\n");
> +
>  	rc = add_cxl_resources(cxl_res);
>  	if (rc)
>  		return rc;
> diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
> index 83142eac0cab..eaab7698217e 100644
> --- a/drivers/pci/pcie/aer_cxl_rch.c
> +++ b/drivers/pci/pcie/aer_cxl_rch.c
> @@ -4,9 +4,130 @@
>  #include <linux/pci.h>
>  #include <linux/aer.h>
>  #include <linux/bitfield.h>
> +#include <linux/acpi.h>
> +#include <linux/list.h>
> +#include <cxl/ras.h>
>  #include "../pci.h"
>  #include "portdrv.h"
>
> +/*
> + * CXL r4.0 9.18.1.5: "For every RCEC, zero or more entries of this type are
> + * permitted." A single (segment, bdf) therefore maps to multiple downstream
> + * ports, each with its own base address and protocol. The xarray value is a
> + * per-RCEC container holding the list of associated downstream ports.
> + */
> +struct cxl_rdpas_rcec {
> +	struct list_head ports;
> +};
> +
> +/* One per RDPAS structure, i.e. per associated downstream port */
> +struct cxl_rdpas_entry {
> +	struct list_head list;
> +	u64 address;
> +	u8 protocol;
> +};
> +
> +static DEFINE_XARRAY(cxl_rdpas);
> +static bool rdpas_parsed;
> +
> +/* CXL r4.0 9.18.1.5 Table 9-24. The segment and the BDF belongs to the RCEC */
> +static unsigned long __rdpas_index(u16 segment, u16 bdf)
> +{
> +	return FIELD_PREP(GENMASK(31, 16), segment) |
> +	       FIELD_PREP(GENMASK(15, 0), bdf);
> +}
> +
> +static unsigned long rdpas_index(u16 segment, u8 bus, u8 device, u8 function)
> +{
> +	return __rdpas_index(segment,
> +			     FIELD_PREP(GENMASK(15, 8), bus) |
> +			     FIELD_PREP(GENMASK(7, 3), device) |
> +			     FIELD_PREP(GENMASK(2, 0), function));
> +}
> +
> +static int __cxl_parse_rdpas(struct acpi_cedt_rdpas *rdpas, struct device *dev)
> +{
> +	struct cxl_rdpas_rcec *rdpas_rcec;
> +	struct cxl_rdpas_entry *entry;
> +	unsigned long index;
> +	int rc;
> +
> +	if (rdpas->header.length < sizeof(struct acpi_cedt_rdpas))
> +		return -EINVAL;
> +
> +	index = __rdpas_index(rdpas->segment, rdpas->bdf);
> +
> +	rdpas_rcec = xa_load(&cxl_rdpas, index);
> +	if (!rdpas_rcec) {
> +		rdpas_rcec = kzalloc(sizeof(*rdpas_rcec), GFP_KERNEL);
> +		if (!rdpas_rcec)
> +			return -ENOMEM;
> +
> +		INIT_LIST_HEAD(&rdpas_rcec->ports);
> +		rc = xa_insert(&cxl_rdpas, index, rdpas_rcec, GFP_KERNEL);
> +		if (rc) {
> +			kfree(rdpas_rcec);
> +			return rc;
> +		}
> +	}
> +
> +	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> +	if (!entry)
> +		return -ENOMEM;
> +
> +	entry->address = rdpas->address;
> +	entry->protocol = rdpas->protocol;
> +	list_add_tail(&entry->list, &rdpas_rcec->ports);
> +
> +	dev_dbg(dev,
> +		"RDPAS entry: PCI %04x:%02lx:%02lx.%ld %s CXL addr %016llx\n",
> +		rdpas->segment, FIELD_GET(GENMASK(15, 8), rdpas->bdf),
> +		FIELD_GET(GENMASK(7, 3), rdpas->bdf),
> +		FIELD_GET(GENMASK(2, 0), rdpas->bdf),
> +		rdpas->protocol == ACPI_CEDT_RDPAS_PROTOCOL_IO ?
> +		"CXL.io" : "CXL.cachemem",
> +		rdpas->address);
> +

This logging will be helpful.

> +	return 0;
> +}
> +
> +static int cxl_parse_rdpas(union acpi_subtable_headers *header, void *arg,
> +			   const unsigned long end)
> +{
> +	struct acpi_cedt_rdpas *rdpas = (struct acpi_cedt_rdpas *)header;
> +	struct device *dev = arg;
> +
> +	return __cxl_parse_rdpas(rdpas, dev);
> +}
> +
> +/*
> + * The CEDT is a single static system-wide firmware table, so RDPAS is parsed
> + * exactly once for the lifetime of the kernel. cxl_acpi may probe more than
> + * once (re-bind or multiple ACPI0017), but the global xarray is populated only
> + * on the first call; subsequent calls are no-ops. There is no teardown: the
> + * data describes the platform and remains valid until the kernel exits.
> + */

You may want to change the ACPI0017 reference. There is only one ACPI0017 per system.

> +int cxl_rdpas_init(struct device *host)
> +{
> +	if (rdpas_parsed)
> +		return 0;
> +
> +	rdpas_parsed = true;
> +
> +	return acpi_table_parse_cedt(ACPI_CEDT_TYPE_RDPAS, cxl_parse_rdpas, host);
> +}
> +EXPORT_SYMBOL_FOR_MODULES(cxl_rdpas_init, "cxl_acpi");
> +
> +static struct cxl_rdpas_rcec __maybe_unused *cxl_get_rdpas_by_rcec(struct pci_dev *rcec)
> +{
> +	unsigned long index;
> +
> +	index = rdpas_index(pci_domain_nr(rcec->bus), rcec->bus->number,
> +			    PCI_SLOT(rcec->devfn), PCI_FUNC(rcec->devfn));
> +
> +	return xa_load(&cxl_rdpas, index);
> +}
> +
>  static bool is_cxl_mem_dev(struct pci_dev *dev)
>  {
>  	/*
> diff --git a/include/cxl/ras.h b/include/cxl/ras.h
> new file mode 100644
> index 000000000000..661307b0230a
> --- /dev/null
> +++ b/include/cxl/ras.h
> @@ -0,0 +1,18 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright(c) 2026 Intel Corporation. */
> +
> +#ifndef __CXL_RAS_H__
> +#define __CXL_RAS_H__
> +
> +#include <linux/acpi.h>
> +
> +#ifdef CONFIG_CXL_RAS
> +int cxl_rdpas_init(struct device *host);
> +#else
> +static inline int cxl_rdpas_init(struct device *host)
> +{
> +	return -EOPNOTSUPP;
> +}
> +#endif
> +
> +#endif

The /include/cxl/ras.h addition will be helpful. I'll use this for
the CXL-AER definitions, instead of /include/linux/aer.h.

LGTM

-Terry
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support
2026-06-18 21:26 ` Bowman, Terry
@ 2026-06-18 21:57 ` Dave Jiang
0 siblings, 0 replies; 13+ messages in thread
From: Dave Jiang @ 2026-06-18 21:57 UTC (permalink / raw)
To: Bowman, Terry, linux-cxl, linux-pci; +Cc: bhelgaas, jic23, djbw

On 6/18/26 2:26 PM, Bowman, Terry wrote:
> On 6/18/2026 12:07 PM, Dave Jiang wrote:
>> Add parsing of the RCEC Downstream Port Association Structure (RDPAS),
>> which is a structure defined in the CXL spec r4.0 9.18.1.5. This
>> structure allows error handler to locate the downstream port(s)
>> that report errors to a given Root Complex Event Collector (RCEC).
>> The structure is part of CXL Early Discovery Table (CEDT) and can
>> be parsed like other CEDT tables.
>>
>
> Maybe mention this is used only in RCH mode.
>
>> A base address is provided in the RDPAS structure where depending on
>> the protocol field, it is the RCRB base associated with the downstream
>> port for CXL.io or the Component Base Register base associated with the
>> downstream port for CXL.cachemem.
>>
>> Per the spec, "For every RCEC, zero or more entries of this type are
>> permitted", so a single (segment, BDF) maps to multiple downstream
>> ports. Each RDPAS structure is stored as a per-port list node hung off
>> a per-RCEC container in an xarray indexed by the combination of the
>> RCEC segment plus the BDF in a 32bit field. Both the base address and
>> protocol type are recorded for every entry so the error handler can
>> walk all ports associated with an RCEC and dispatch per protocol.
>>
> It's stated here but I missed that a RCEC list must still be walked. May
> want to compare to the existing method using RCiEP.
>
>> The parsed table is meant to live the entire life of the kernel, so the
>> xarray is not cleaned up when cxl_acpi unloads.
>>
>> A helper is also added to retrieve the per-RCEC container based on the
>> segment and BDF of the RCEC.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> RFC v2:
>> - verify table length (sashiko)
>> - store multiple downstream ports per RCEC in a list (sashiko)
>> - Add a gate to initialize the xarray only once per boot.
>> - Add support for multiple DSP per RCEC. (sashiko)
>> ---
>> drivers/cxl/acpi.c | 5 ++
>> drivers/pci/pcie/aer_cxl_rch.c | 121 +++++++++++++++++++++++++++++++++
>> include/cxl/ras.h | 18 +++++
>> 3 files changed, 144 insertions(+)
>> create mode 100644 include/cxl/ras.h
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index 127537628817..e09706275d85 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -7,6 +7,7 @@
>>  #include <linux/acpi.h>
>>  #include <linux/pci.h>
>>  #include <linux/node.h>
>> +#include <cxl/ras.h>
>>  #include <asm/div64.h>
>>  #include "cxlpci.h"
>>  #include "cxl.h"
>> @@ -933,6 +934,10 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>>  	if (rc < 0)
>>  		return -ENXIO;
>>
>> +	rc = cxl_rdpas_init(host);
>> +	if (rc < 0)
>> +		dev_dbg(host, "No RDPAS entries found or failed to parse\n");
>> +
>>  	rc = add_cxl_resources(cxl_res);
>>  	if (rc)
>>  		return rc;
>> diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
>> index 83142eac0cab..eaab7698217e 100644
>> --- a/drivers/pci/pcie/aer_cxl_rch.c
>> +++ b/drivers/pci/pcie/aer_cxl_rch.c
>> @@ -4,9 +4,130 @@
>>  #include <linux/pci.h>
>>  #include <linux/aer.h>
>>  #include <linux/bitfield.h>
>> +#include <linux/acpi.h>
>> +#include <linux/list.h>
>> +#include <cxl/ras.h>
>>  #include "../pci.h"
>>  #include "portdrv.h"
>>
>> +/*
>> + * CXL r4.0 9.18.1.5: "For every RCEC, zero or more entries of this type are
>> + * permitted." A single (segment, bdf) therefore maps to multiple downstream
>> + * ports, each with its own base address and protocol. The xarray value is a
>> + * per-RCEC container holding the list of associated downstream ports.
>> + */
>> +struct cxl_rdpas_rcec {
>> +	struct list_head ports;
>> +};
>> +
>> +/* One per RDPAS structure, i.e. per associated downstream port */
>> +struct cxl_rdpas_entry {
>> +	struct list_head list;
>> +	u64 address;
>> +	u8 protocol;
>> +};
>> +
>> +static DEFINE_XARRAY(cxl_rdpas);
>> +static bool rdpas_parsed;
>> +
>> +/* CXL r4.0 9.18.1.5 Table 9-24. The segment and the BDF belongs to the RCEC */
>> +static unsigned long __rdpas_index(u16 segment, u16 bdf)
>> +{
>> +	return FIELD_PREP(GENMASK(31, 16), segment) |
>> +	       FIELD_PREP(GENMASK(15, 0), bdf);
>> +}
>> +
>> +static unsigned long rdpas_index(u16 segment, u8 bus, u8 device, u8 function)
>> +{
>> +	return __rdpas_index(segment,
>> +			     FIELD_PREP(GENMASK(15, 8), bus) |
>> +			     FIELD_PREP(GENMASK(7, 3), device) |
>> +			     FIELD_PREP(GENMASK(2, 0), function));
>> +}
>> +
>> +static int __cxl_parse_rdpas(struct acpi_cedt_rdpas *rdpas, struct device *dev)
>> +{
>> +	struct cxl_rdpas_rcec *rdpas_rcec;
>> +	struct cxl_rdpas_entry *entry;
>> +	unsigned long index;
>> +	int rc;
>> +
>> +	if (rdpas->header.length < sizeof(struct acpi_cedt_rdpas))
>> +		return -EINVAL;
>> +
>> +	index = __rdpas_index(rdpas->segment, rdpas->bdf);
>> +
>> +	rdpas_rcec = xa_load(&cxl_rdpas, index);
>> +	if (!rdpas_rcec) {
>> +		rdpas_rcec = kzalloc(sizeof(*rdpas_rcec), GFP_KERNEL);
>> +		if (!rdpas_rcec)
>> +			return -ENOMEM;
>> +
>> +		INIT_LIST_HEAD(&rdpas_rcec->ports);
>> +		rc = xa_insert(&cxl_rdpas, index, rdpas_rcec, GFP_KERNEL);
>> +		if (rc) {
>> +			kfree(rdpas_rcec);
>> +			return rc;
>> +		}
>> +	}
>> +
>> +	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
>> +	if (!entry)
>> +		return -ENOMEM;
>> +
>> +	entry->address = rdpas->address;
>> +	entry->protocol = rdpas->protocol;
>> +	list_add_tail(&entry->list, &rdpas_rcec->ports);
>> +
>> +	dev_dbg(dev,
>> +		"RDPAS entry: PCI %04x:%02lx:%02lx.%ld %s CXL addr %016llx\n",
>> +		rdpas->segment, FIELD_GET(GENMASK(15, 8), rdpas->bdf),
>> +		FIELD_GET(GENMASK(7, 3), rdpas->bdf),
>> +		FIELD_GET(GENMASK(2, 0), rdpas->bdf),
>> +		rdpas->protocol == ACPI_CEDT_RDPAS_PROTOCOL_IO ?
>> +		"CXL.io" : "CXL.cachemem",
>> +		rdpas->address);
>> +
>
> This logging will be helpful.
>
>> +	return 0;
>> +}
>> +
>> +static int cxl_parse_rdpas(union acpi_subtable_headers *header, void *arg,
>> +			   const unsigned long end)
>> +{
>> +	struct acpi_cedt_rdpas *rdpas = (struct acpi_cedt_rdpas *)header;
>> +	struct device *dev = arg;
>> +
>> +	return __cxl_parse_rdpas(rdpas, dev);
>> +}
>> +
>> +/*
>> + * The CEDT is a single static system-wide firmware table, so RDPAS is parsed
>> + * exactly once for the lifetime of the kernel. cxl_acpi may probe more than
>> + * once (re-bind or multiple ACPI0017), but the global xarray is populated only
>> + * on the first call; subsequent calls are no-ops. There is no teardown: the
>> + * data describes the platform and remains valid until the kernel exits.
>> + */
>
> You may want to change the ACPI0017 reference. There is only one ACPI0017 per system.

Ok. This is the current deployed implementation, right? Nothing stops anyone
from deploying multiple ACPI0017 for whatever reason. The gate also prevents
multiple parsing when someone loads/unloads the cxl_acpi module.

DJ

>
>
>> +int cxl_rdpas_init(struct device *host)
>> +{
>> +	if (rdpas_parsed)
>> +		return 0;
>> +
>> +	rdpas_parsed = true;
>> +
>> +	return acpi_table_parse_cedt(ACPI_CEDT_TYPE_RDPAS, cxl_parse_rdpas, host);
>> +}
>> +EXPORT_SYMBOL_FOR_MODULES(cxl_rdpas_init, "cxl_acpi");
>> +
>> +static struct cxl_rdpas_rcec __maybe_unused *cxl_get_rdpas_by_rcec(struct pci_dev *rcec)
>> +{
>> +	unsigned long index;
>> +
>> +	index = rdpas_index(pci_domain_nr(rcec->bus), rcec->bus->number,
>> +			    PCI_SLOT(rcec->devfn), PCI_FUNC(rcec->devfn));
>> +
>> +	return xa_load(&cxl_rdpas, index);
>> +}
>> +
>>  static bool is_cxl_mem_dev(struct pci_dev *dev)
>>  {
>>  	/*
>> diff --git a/include/cxl/ras.h b/include/cxl/ras.h
>> new file mode 100644
>> index 000000000000..661307b0230a
>> --- /dev/null
>> +++ b/include/cxl/ras.h
>> @@ -0,0 +1,18 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright(c) 2026 Intel Corporation. */
>> +
>> +#ifndef __CXL_RAS_H__
>> +#define __CXL_RAS_H__
>> +
>> +#include <linux/acpi.h>
>> +
>> +#ifdef CONFIG_CXL_RAS
>> +int cxl_rdpas_init(struct device *host);
>> +#else
>> +static inline int cxl_rdpas_init(struct device *host)
>> +{
>> +	return -EOPNOTSUPP;
>> +}
>> +#endif
>> +
>> +#endif
>
> The /include/cxl/ras.h addition will be helpful. I'll use this for
> the CXL-AER definitions, instead of /include/linux/aer.h.
>
> LGTM
>
> -Terry
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery 2026-06-18 17:07 [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Dave Jiang 2026-06-18 17:07 ` [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support Dave Jiang @ 2026-06-18 17:07 ` Dave Jiang 2026-06-18 17:20 ` sashiko-bot 2026-06-18 21:26 ` Bowman, Terry 2026-06-18 19:05 ` [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Bowman, Terry 2026-06-18 20:21 ` Dan Williams (nvidia) 3 siblings, 2 replies; 13+ messages in thread From: Dave Jiang @ 2026-06-18 17:07 UTC (permalink / raw) To: linux-cxl, linux-pci; +Cc: terry.bowman, bhelgaas, jic23, djbw The RDPAS allows the CXL RCH error handler to find the device directly instead of iterating through a set number of RCiEP in order to discover which device triggered an error. For the CXL.io protocol, the base address provided from the cxl_rdpas xarray points to the RCRB of the device. The RCRB mirrors the configuration space of the device via MMIO. The error handler can walk the RCRB to find the AER capability block and therefore read the root status as well as the error source in order to determine the BDF of the error device. The entries with cxl.cachemem protocol is ignored because the base address provided by the RDPAS structure points to the Component Base Register Base and does not provide a way for th ecode to identify the device that triggered the error. Change the current RCH error handler behavior so it will probe the RCRB first to see if the error device can be discovered quickly before falling back to the current method of iterating through RCiEPs. Signed-off-by: Dave Jiang <dave.jiang@intel.com> --- v2: - Add boundary checks for MMIO reads (sashiko) - Add checks for surprise removal of devices (sashiko) - Use aer_info to also check severity. (Ming) - Update to iterate list of RPs under a RCEC entry. 
---
 drivers/pci/pcie/aer_cxl_rch.c | 152 ++++++++++++++++++++++++++++++++-
 1 file changed, 148 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
index eaab7698217e..f295e4eefbba 100644
--- a/drivers/pci/pcie/aer_cxl_rch.c
+++ b/drivers/pci/pcie/aer_cxl_rch.c
@@ -118,7 +118,7 @@ int cxl_rdpas_init(struct device *host)
 }
 EXPORT_SYMBOL_FOR_MODULES(cxl_rdpas_init, "cxl_acpi");

-static struct cxl_rdpas_rcec __maybe_unused *cxl_get_rdpas_by_rcec(struct pci_dev *rcec)
+static struct cxl_rdpas_rcec *cxl_get_rdpas_by_rcec(struct pci_dev *rcec)
 {
 	unsigned long index;

@@ -166,6 +166,143 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
 	return 0;
 }

+static u16 rcrb_to_aer(void __iomem *rcrb)
+{
+	/*
+	 * The extended capability space is SZ_4K and each capability header
+	 * is dword aligned, so the chain can hold at most SZ_4K / 4 entries.
+	 * Bound the walk by that count to avoid spinning on a malformed,
+	 * looping capability list.
+	 */
+	int entries = SZ_4K / 4;
+	u16 offset;
+	u32 cap_hdr;
+
+	/* Start from PCIe extended capabilities at offset 0x100 */
+	offset = PCI_CFG_SPACE_SIZE;
+	cap_hdr = readl(rcrb + offset);
+	if (cap_hdr == 0 || PCI_POSSIBLE_ERROR(cap_hdr))
+		return 0;
+
+	while (PCI_EXT_CAP_ID(cap_hdr) != PCI_EXT_CAP_ID_ERR) {
+		if (--entries <= 0)
+			return 0;
+
+		offset = PCI_EXT_CAP_NEXT(cap_hdr);
+		if (!offset)
+			return 0;
+
+		if (offset >= SZ_4K)
+			return 0;
+
+		cap_hdr = readl(rcrb + offset);
+		if (cap_hdr == 0 || PCI_POSSIBLE_ERROR(cap_hdr))
+			return 0;
+	}
+
+	return offset;
+}
+
+DEFINE_FREE(iounmap, void __iomem *, if (_T) iounmap(_T))
+static u16 cxl_rch_get_err_src_id(u64 rcrb_base, struct aer_err_info *info)
+{
+	u32 root_status, err_src;
+	void __iomem *aer_base;
+	u16 aer_offset;
+
+	void __iomem *rcrb __free(iounmap) = ioremap(rcrb_base, SZ_4K);
+	if (!rcrb)
+		return 0;
+
+	aer_offset = rcrb_to_aer(rcrb);
+	if (!aer_offset)
+		return 0;
+
+	aer_base = rcrb + aer_offset;
+	if (aer_offset + PCI_ERR_ROOT_STATUS + sizeof(u32) > SZ_4K)
+		return 0;
+
+	root_status = readl(aer_base + PCI_ERR_ROOT_STATUS);
+	if (!(root_status & (PCI_ERR_ROOT_COR_RCV | PCI_ERR_ROOT_UNCOR_RCV)))
+		return 0;
+
+	if (aer_offset + PCI_ERR_ROOT_ERR_SRC + sizeof(u32) > SZ_4K)
+		return 0;
+
+	err_src = readl(aer_base + PCI_ERR_ROOT_ERR_SRC);
+
+	if (info->severity == AER_CORRECTABLE &&
+	    root_status & PCI_ERR_ROOT_COR_RCV)
+		return FIELD_GET(GENMASK(15, 0), err_src);
+
+	/* Assume at this point the info->severity points to UNCOR */
+	if (root_status & PCI_ERR_ROOT_UNCOR_RCV)
+		return FIELD_GET(GENMASK(31, 16), err_src);
+
+	return 0;
+}
+
+static bool cxl_rch_forward_error_by_dsp(struct pci_dev *rcec, u64 rcrb_base,
+					 struct aer_err_info *info)
+{
+	u8 bus, devfn;
+	u16 segment;
+	u16 src_id;
+
+	src_id = cxl_rch_get_err_src_id(rcrb_base, info);
+	if (!src_id)
+		return false;
+
+	/* Try uncorrectable error source first, then correctable */
+	segment = pci_domain_nr(rcec->bus);
+	bus = FIELD_GET(GENMASK(15, 8), src_id);
+	devfn = FIELD_GET(GENMASK(7, 0), src_id);
+
+	struct pci_dev *pdev __free(pci_dev_put) =
+		pci_get_domain_bus_and_slot(segment, bus, devfn);
+	if (!pdev)
+		return false;
+
+	/*
+	 * The error source id resolves to whatever BDF the root port logged,
+	 * which is not guaranteed to be a natively handled CXL.mem device.
+	 * Apply the same gating as the RCiEP walk fallback before forwarding.
+	 */
+	if (!is_cxl_mem_dev(pdev) || !cxl_error_is_native(pdev))
+		return false;
+
+	cxl_forward_error(pdev, info);
+	return true;
+}
+
+static bool cxl_rch_handled_error_by_rdpas(struct pci_dev *rcec,
+					   struct aer_err_info *info)
+{
+	struct cxl_rdpas_rcec *rdpas_rcec;
+	struct cxl_rdpas_entry *entry;
+	bool handled = false;
+
+	rdpas_rcec = cxl_get_rdpas_by_rcec(rcec);
+	if (!rdpas_rcec)
+		return false;
+
+	/*
+	 * The RCEC aggregates multiple downstream ports. Each CXL.io
+	 * downstream port associated with this RCEC exposes the RCRB at its
+	 * base address; walk them all and forward the error from every port
+	 * that reports a valid error source.
+	 */
+	list_for_each_entry(entry, &rdpas_rcec->ports, list) {
+		if (entry->protocol != ACPI_CEDT_RDPAS_PROTOCOL_IO)
+			continue;
+
+		if (cxl_rch_forward_error_by_dsp(rcec, entry->address, info))
+			handled = true;
+	}
+
+	return handled;
+}
+
 void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
 {
 	/*
@@ -173,9 +310,16 @@ void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
 	 * RCH's downstream port. Check and handle them in the CXL.mem
 	 * device driver.
 	 */
-	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
-	    is_aer_internal_error(info))
-		pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
+	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)
+		return;
+
+	if (!is_aer_internal_error(info))
+		return;
+
+	if (cxl_rch_handled_error_by_rdpas(dev, info))
+		return;
+
+	pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
 }

 static int handles_cxl_error_iter(struct pci_dev *dev, void *data)
--
2.54.0

^ permalink raw reply related [flat|nested] 13+ messages in thread
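
For readers following the capability walk in rcrb_to_aer(), the bounded
traversal can be modeled in userspace roughly as below. This is only a
sketch under stated assumptions, not the patch's code: a plain byte
buffer stands in for the ioremap()ed RCRB, and get_dword(), put_dword(),
ext_cap_hdr() and find_aer_cap() are hypothetical names introduced for
illustration. The header layout (ID in bits 15:0, next pointer in bits
31:20) and the AER capability ID of 0x0001 follow the PCIe extended
capability format.

```c
#include <stdint.h>

#define CFG_SPACE_SIZE     0x100u   /* first extended capability lives here */
#define EXT_CFG_SPACE_SIZE 0x1000u  /* 4K window, same size the patch maps */
#define EXT_CAP_ID_AER     0x0001u  /* Advanced Error Reporting capability ID */

/* Hypothetical stand-in for readl(): little-endian dword from a buffer. */
static uint32_t get_dword(const uint8_t *cfg, uint16_t off)
{
	return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

/* Hypothetical helper used only to build test images. */
static void put_dword(uint8_t *cfg, uint16_t off, uint32_t val)
{
	cfg[off] = val & 0xff;
	cfg[off + 1] = (val >> 8) & 0xff;
	cfg[off + 2] = (val >> 16) & 0xff;
	cfg[off + 3] = (val >> 24) & 0xff;
}

/* Header: ID in bits 15:0, version (1) in 19:16, next pointer in 31:20. */
static uint32_t ext_cap_hdr(uint16_t id, uint16_t next)
{
	return (uint32_t)id | (1u << 16) | ((uint32_t)next << 20);
}

/* Return the offset of the AER capability, or 0 if it cannot be found. */
static uint16_t find_aer_cap(const uint8_t *cfg)
{
	/* One header per dword at most, so this bounds a looping chain. */
	int budget = EXT_CFG_SPACE_SIZE / 4;
	uint16_t off = CFG_SPACE_SIZE;

	while (budget-- > 0) {
		uint32_t hdr = get_dword(cfg, off);

		/* All-zeros or all-ones means no capability or a dead device. */
		if (hdr == 0 || hdr == 0xffffffffu)
			return 0;
		if ((hdr & 0xffff) == EXT_CAP_ID_AER)
			return off;

		/* Masking with 0xffc keeps the pointer dword aligned and < 4K. */
		off = (hdr >> 20) & 0xffc;
		if (off < CFG_SPACE_SIZE)
			return 0;	/* end of list, or a bogus pointer */
	}
	return 0;
}
```

The budget is what turns a self-referencing chain into a clean "not
found" instead of a spin; the alignment mask is what makes a separate
upper-bound check unnecessary in this model.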
* Re: [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
2026-06-18 17:07 ` [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery Dave Jiang
@ 2026-06-18 17:20 ` sashiko-bot
2026-06-18 21:26 ` Bowman, Terry
1 sibling, 0 replies; 13+ messages in thread
From: sashiko-bot @ 2026-06-18 17:20 UTC (permalink / raw)
To: Dave Jiang; +Cc: linux-pci, linux-cxl

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Incorrect fall-through logic in `cxl_rch_get_err_src_id()` extracts and returns the wrong error source ID for correctable errors.
- [High] The RDPAS list is built concurrently without locking and traversed locklessly in the AER error handler, exposing readers to partially built lists or list corruption.
- [Low] The function `cxl_forward_error` is invoked but is neither declared nor defined anywhere in the kernel.
--

commit e9dabb60b82f98b2295a3d242546827da2279bbf
Author: Dave Jiang <dave.jiang@intel.com>

PCI/CXL: Enable usage of RDPAS to shortcut error device discovery

This commit changes the CXL RCH error handler to probe the RCRB first to
see if the error device can be discovered quickly, falling back to
iterating through RCiEPs if needed. It introduces RDPAS parsing to
directly locate the RCRB and determine the BDF of the error device.

diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
--- a/drivers/pci/pcie/aer_cxl_rch.c
+++ b/drivers/pci/pcie/aer_cxl_rch.c

[ ... ]

> +static u16 cxl_rch_get_err_src_id(u64 rcrb_base, struct aer_err_info *info)
> +{
> +	u32 root_status, err_src;

[ ... ]

> +	if (info->severity == AER_CORRECTABLE &&
> +	    root_status & PCI_ERR_ROOT_COR_RCV)
> +		return FIELD_GET(GENMASK(15, 0), err_src);
> +
> +	/* Assume at this point the info->severity points to UNCOR */
> +	if (root_status & PCI_ERR_ROOT_UNCOR_RCV)
> +		return FIELD_GET(GENMASK(31, 16), err_src);
> +
> +	return 0;
> +}

[Severity: High]
If this is handling a correctable error but PCI_ERR_ROOT_COR_RCV is not
set, is it intentional to fall through to the uncorrectable error check?

If an uncorrectable error is concurrently logged in the RCRB status, this
would return the source ID of the uncorrectable error instead of 0, which
might misroute the correctable error notification. Should these severity
checks be mutually exclusive using an else block?

> +static bool cxl_rch_forward_error_by_dsp(struct pci_dev *rcec, u64 rcrb_base,
> +					 struct aer_err_info *info)
> +{

[ ... ]

> +	if (!is_cxl_mem_dev(pdev) || !cxl_error_is_native(pdev))
> +		return false;
> +
> +	cxl_forward_error(pdev, info);

[Severity: Low]
Does cxl_forward_error() exist in the kernel tree? I'm having trouble
finding its declaration or definition in the codebase, which seems like it
might cause an implicit function declaration build failure.

> +	return true;
> +}
> +
> +static bool cxl_rch_handled_error_by_rdpas(struct pci_dev *rcec,
> +					   struct aer_err_info *info)
> +{
> +	struct cxl_rdpas_rcec *rdpas_rcec;
> +	struct cxl_rdpas_entry *entry;
> +	bool handled = false;
> +
> +	rdpas_rcec = cxl_get_rdpas_by_rcec(rcec);
> +	if (!rdpas_rcec)
> +		return false;
> +
> +	/*
> +	 * The RCEC aggregates multiple downstream ports. Each CXL.io
> +	 * downstream port associated with this RCEC exposes the RCRB at its
> +	 * base address; walk them all and forward the error from every port
> +	 * that reports a valid error source.
> +	 */
> +	list_for_each_entry(entry, &rdpas_rcec->ports, list) {

[Severity: High]
Could this lockless list traversal race with the initialization in
__cxl_parse_rdpas()?

If an asynchronous AER hardware event triggers cxl_rch_handle_error() while
__cxl_parse_rdpas() is still running, it seems we might walk
rdpas_rcec->ports using list_for_each_entry() while it is concurrently
being mutated by list_add_tail().

Since the rdpas_rcec container is published to the cxl_rdpas xarray before
the list is fully populated, would this expose readers to partially built
lists or use-after-free conditions?

Additionally, would the lockless check of rdpas_parsed in cxl_rdpas_init()
allow concurrent probes to race and corrupt the list?

> +		if (entry->protocol != ACPI_CEDT_RDPAS_PROTOCOL_IO)
> +			continue;
> +
> +		if (cxl_rch_forward_error_by_dsp(rcec, entry->address, info))
> +			handled = true;
> +	}
> +
> +	return handled;
> +}

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260618170723.2010490-1-dave.jiang@intel.com?part=2

^ permalink raw reply [flat|nested] 13+ messages in thread
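
The first [High] item above asks whether the two severity checks should
be mutually exclusive. One way that could look is sketched below in
plain C. It is an illustration only, not a proposed replacement for the
patch: pick_err_src() and enum err_severity are hypothetical names, and
the two status bits are local copies of the Root Error Status values
that the kernel calls PCI_ERR_ROOT_COR_RCV (0x1) and
PCI_ERR_ROOT_UNCOR_RCV (0x4). The Error Source Identification register
is taken to hold the ERR_COR source in bits 15:0 and the
ERR_FATAL/NONFATAL source in bits 31:16.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROOT_COR_RCV   0x00000001u  /* ERR_COR received */
#define ROOT_UNCOR_RCV 0x00000004u  /* ERR_FATAL/NONFATAL received */

/* Hypothetical two-way severity, standing in for info->severity. */
enum err_severity { ERR_CORRECTABLE, ERR_UNCORRECTABLE };

/*
 * Select the source ID that matches the severity being handled.
 * Returns false, leaving *src_id untouched, when the status bit for
 * that severity is clear, so a correctable event never picks up the
 * uncorrectable source field or the other way around.
 */
static bool pick_err_src(enum err_severity sev, uint32_t root_status,
			 uint32_t err_src, uint16_t *src_id)
{
	if (sev == ERR_CORRECTABLE) {
		if (!(root_status & ROOT_COR_RCV))
			return false;
		*src_id = (uint16_t)(err_src & 0xffff);
		return true;
	}

	if (!(root_status & ROOT_UNCOR_RCV))
		return false;
	*src_id = (uint16_t)(err_src >> 16);
	return true;
}
```

With this shape, a correctable event that arrives while only the
uncorrectable status bit is set yields "no source" rather than the
uncorrectable requester ID.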
* Re: [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
2026-06-18 17:07 ` [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery Dave Jiang
2026-06-18 17:20 ` sashiko-bot
@ 2026-06-18 21:26 ` Bowman, Terry
2026-06-18 22:04 ` Dave Jiang
1 sibling, 1 reply; 13+ messages in thread
From: Bowman, Terry @ 2026-06-18 21:26 UTC (permalink / raw)
To: Dave Jiang, linux-cxl, linux-pci; +Cc: bhelgaas, jic23, djbw

On 6/18/2026 12:07 PM, Dave Jiang wrote:
> The RDPAS allows the CXL RCH error handler to find the device directly
> instead of iterating through a set number of RCiEPs in order to discover
> which device triggered an error. For the CXL.io protocol, the base
> address provided from the cxl_rdpas xarray points to the RCRB of the
> device. The RCRB mirrors the configuration space of the device via MMIO.
> The error handler can walk the RCRB to find the AER capability block and
> therefore read the root status as well as the error source in order
> to determine the BDF of the error device.
>
> Entries with the CXL.cachemem protocol are ignored because the base address
> provided by the RDPAS structure points to the Component Base Register Base
> and does not provide a way for the code to identify the device that
> triggered the error.
>

I see. The protocol explanation is here. And cachemem is ignored.

> Change the current RCH error handler behavior so it will probe the
> RCRB first to see if the error device can be discovered quickly
> before falling back to the current method of iterating through RCiEPs.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> ---
> v2:
> - Add boundary checks for MMIO reads (sashiko)
> - Add checks for surprise removal of devices (sashiko)
> - Use aer_info to also check severity. (Ming)
> - Update to iterate list of RPs under a RCEC entry.

[ ... ]

> +static bool cxl_rch_forward_error_by_dsp(struct pci_dev *rcec, u64 rcrb_base,
> +					 struct aer_err_info *info)
> +{
> +	u8 bus, devfn;
> +	u16 segment;
> +	u16 src_id;
> +
> +	src_id = cxl_rch_get_err_src_id(rcrb_base, info);
> +	if (!src_id)
> +		return false;
> +

!src_id (0000:00.0) is valid. May want to use ~0.

> +	/* Try uncorrectable error source first, then correctable */
> +	segment = pci_domain_nr(rcec->bus);
> +	bus = FIELD_GET(GENMASK(15, 8), src_id);
> +	devfn = FIELD_GET(GENMASK(7, 0), src_id);
> +
> +	struct pci_dev *pdev __free(pci_dev_put) =
> +		pci_get_domain_bus_and_slot(segment, bus, devfn);
> +	if (!pdev)
> +		return false;
> +
> +	/*
> +	 * The error source id resolves to whatever BDF the root port logged,
> +	 * which is not guaranteed to be a natively handled CXL.mem device.
> +	 * Apply the same gating as the RCiEP walk fallback before forwarding.
> +	 */
> +	if (!is_cxl_mem_dev(pdev) || !cxl_error_is_native(pdev))
> +		return false;
> +
> +	cxl_forward_error(pdev, info);

Ok, I see the tie to kfifo cxl_forward_error().

> +	return true;
> +}
> +
> +static bool cxl_rch_handled_error_by_rdpas(struct pci_dev *rcec,
> +					   struct aer_err_info *info)
> +{
> +	struct cxl_rdpas_rcec *rdpas_rcec;
> +	struct cxl_rdpas_entry *entry;
> +	bool handled = false;
> +
> +	rdpas_rcec = cxl_get_rdpas_by_rcec(rcec);
> +	if (!rdpas_rcec)
> +		return false;
> +
> +	/*
> +	 * The RCEC aggregates multiple downstream ports. Each CXL.io

Maybe:
'The RCEC aggregates multiple downstream ports' errors. Each CXL.io'

> +	 * downstream port associated with this RCEC exposes the RCRB at its
> +	 * base address; walk them all and forward the error from every port
> +	 * that reports a valid error source.
> +	 */
> +	list_for_each_entry(entry, &rdpas_rcec->ports, list) {
> +		if (entry->protocol != ACPI_CEDT_RDPAS_PROTOCOL_IO)
> +			continue;
> +
> +		if (cxl_rch_forward_error_by_dsp(rcec, entry->address, info))
> +			handled = true;
> +	}
> +

I was surprised we had to do another list traversal. For some reason I thought
RDPAS would gift us an index for direct access to a RCH RCRB. The advantage to
RDPAS looks to be that the EP doesn't have to be accessed, is all. Is that correct?

> +	return handled;
> +}
> +
>  void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>  {
>  	/*
> @@ -173,9 +310,16 @@ void cxl_rch_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>  	 * RCH's downstream port. Check and handle them in the CXL.mem
>  	 * device driver.
>  	 */
> -	if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC &&
> -	    is_aer_internal_error(info))
> -		pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
> +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_RC_EC)
> +		return;
> +
> +	if (!is_aer_internal_error(info))
> +		return;
> +
> +	if (cxl_rch_handled_error_by_rdpas(dev, info))
> +		return;
> +
> +	pcie_walk_rcec(dev, cxl_rch_handle_error_iter, info);
>  }
>
>  static int handles_cxl_error_iter(struct pci_dev *dev, void *data)

Looks good.

-Terry

^ permalink raw reply [flat|nested] 13+ messages in thread
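
On the "!src_id (0000:00.0) is valid" comment above: a 16-bit requester
ID of zero decodes to bus 0, device 0, function 0, so it cannot double
as a "nothing found" return value. The sketch below shows one possible
way to keep "found" separate from the ID itself. It is an assumption
made for illustration, not something proposed in the thread: struct
src_lookup, make_lookup() and src_id_to_bdf() are hypothetical names.
Note that 0xffff is also a well-formed encoding (ff:1f.7), which is why
this sketch uses a flag rather than a reserved value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Requester ID split out as bus (15:8), device (7:3), function (2:0). */
struct bdf {
	uint8_t bus;
	uint8_t dev;
	uint8_t fn;
};

/* Hypothetical lookup result: validity is carried next to the ID. */
struct src_lookup {
	bool found;
	uint16_t src_id;
};

static struct bdf src_id_to_bdf(uint16_t src_id)
{
	struct bdf b = {
		.bus = (uint8_t)(src_id >> 8),
		.dev = (uint8_t)((src_id >> 3) & 0x1f),
		.fn  = (uint8_t)(src_id & 0x7),
	};

	return b;
}

/* Package the raw register field together with whether an error was logged. */
static struct src_lookup make_lookup(bool error_logged, uint16_t raw_id)
{
	struct src_lookup r = {
		.found = error_logged,
		.src_id = error_logged ? raw_id : 0,
	};

	return r;
}
```

A caller would then test r.found instead of !r.src_id before resolving
the BDF, so a device at 00:00.0 is treated like any other source.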
* Re: [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
2026-06-18 21:26 ` Bowman, Terry
@ 2026-06-18 22:04 ` Dave Jiang
0 siblings, 0 replies; 13+ messages in thread
From: Dave Jiang @ 2026-06-18 22:04 UTC (permalink / raw)
To: Bowman, Terry, linux-cxl, linux-pci; +Cc: bhelgaas, jic23, djbw

On 6/18/26 2:26 PM, Bowman, Terry wrote:
> On 6/18/2026 12:07 PM, Dave Jiang wrote:

[ ... ]

>> +	list_for_each_entry(entry, &rdpas_rcec->ports, list) {
>> +		if (entry->protocol != ACPI_CEDT_RDPAS_PROTOCOL_IO)
>> +			continue;
>> +
>> +		if (cxl_rch_forward_error_by_dsp(rcec, entry->address, info))
>> +			handled = true;
>> +	}
>> +
>
> I was surprised we had to do another list traversal. For some reason I thought
> RDPAS would gift us an index for direct access to a RCH RCRB. The advantage to
> RDPAS looks to be that the EP doesn't have to be accessed, is all. Is that correct?

RDPAS entries give us the RCRB for each DSP that reports to the RCEC.
But if multiple devices report to the RCEC, then unfortunately we have to
look at all of them to see which one is actually flagging the error.

DJ

[ ... ]

> Looks good.
>
> -Terry
>

^ permalink raw reply [flat|nested] 13+ messages in thread
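
To make the lookup-then-walk shape described in this reply concrete,
here is a small userspace model. It is a sketch under assumptions, not
the series' data structures: the 32-bit key layout (segment in the
upper 16 bits, bus and devfn below) is one plausible reading of
"segment plus the BDF in a 32bit field", the array stands in for the
per-RCEC list, and error_logged stands in for what reading each port's
RCRB would report. rcec_key(), struct rdpas_port and
forward_from_ports() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the RDPAS protocol field. */
enum rdpas_protocol { RDPAS_PROTO_IO, RDPAS_PROTO_CACHEMEM };

/* One downstream port associated with an RCEC. */
struct rdpas_port {
	uint64_t base;			/* RCRB base for CXL.io entries */
	enum rdpas_protocol protocol;
	bool error_logged;		/* stand-in for the RCRB AER read */
};

/* Assumed key layout: segment[31:16] | bus[15:8] | devfn[7:0]. */
static uint32_t rcec_key(uint16_t segment, uint8_t bus, uint8_t devfn)
{
	return ((uint32_t)segment << 16) | ((uint32_t)bus << 8) | devfn;
}

/*
 * Visit every port under one RCEC. CXL.cachemem entries are skipped,
 * and every CXL.io port is checked because the RCEC alone does not say
 * which port raised the error. Returns how many ports had an error to
 * forward; 0 means the caller falls back to the RCiEP walk.
 */
static size_t forward_from_ports(const struct rdpas_port *ports, size_t n)
{
	size_t forwarded = 0;

	for (size_t i = 0; i < n; i++) {
		if (ports[i].protocol != RDPAS_PROTO_IO)
			continue;
		if (ports[i].error_logged)
			forwarded++;
	}

	return forwarded;
}
```

The key gives direct access to the set of ports for an RCEC; the walk
within that set remains linear in the number of ports that report to it.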
* Re: [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io
2026-06-18 17:07 [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Dave Jiang
2026-06-18 17:07 ` [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support Dave Jiang
2026-06-18 17:07 ` [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery Dave Jiang
@ 2026-06-18 19:05 ` Bowman, Terry
2026-06-18 20:12 ` Dave Jiang
2026-06-18 20:21 ` Dan Williams (nvidia)
3 siblings, 1 reply; 13+ messages in thread
From: Bowman, Terry @ 2026-06-18 19:05 UTC (permalink / raw)
To: Dave Jiang, linux-cxl, linux-pci; +Cc: bhelgaas, jic23, djbw

On 6/18/2026 12:07 PM, Dave Jiang wrote:
> v2:
> - Added multiple DSP per RCEC support.
> - Added boundary checks for reading MMIO
> - Addressed issues raised by sashiko
> - See individual patches for detailed changes
>
> The series adds RCEC Downstream Port Association Structure (RDPAS) parsing
> support to CXL.io. RDPAS is an ACPI table that is part of the CXL Early
> Discovery Table (CEDT) defined in CXL specification r4.0 9.18.1.5. It
> provides the mapping between RCEC and downstream ports. With RDPAS, the
> error device can be directly found when an error is reported on RCEC,
> without walking a number of RCiEP in order to determine which one reported
> the error. While CXL.cachemem is supported by RDPAS, there is no easy way
> to discover the source id of the error and therefore finding the Linux PCI
> object for the RCiEP. The intention here is to accelerate the discovery
> of the error by directly locating the error device with the given
> information.
>
> This series is based on top of Terry's CXL error protocol series [1].
> Looking for comments on the series WRT if it makes sense to add on top
> of Terry's error handling for RCH/RCD devices.
>
> [1]: https://lore.kernel.org/linux-cxl/20260505173029.2718246-1-terry.bowman@amd.com/T/#t
>

I was considering whether this could possibly be reused for other RCEC–RCRB use
cases and lifted into the AER core as a generalized solution. However, as far as I
know, only CXL aggregates RCiEP reporting using the RCEC SBDF and leveraging RDPAS.

I think this will provide a useful alternative to walking the RCEC’s associated RCiEPs and
should improve performance with scale.

It may be worth clarifying whether this supports both RCH and VH modes. Also
consider updating the problem statement to note that the current RCEC RCiEP
walk processes the RCH downstream ports of *all* RCiEPs associated with the RCEC.

-Terry

>
> Dave Jiang (2):
> PCI/CXL: Add RDPAS parsing support
> PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
>
> drivers/cxl/acpi.c | 5 +
> drivers/pci/pcie/aer_cxl_rch.c | 271 ++++++++++++++++++++++++++++++++-
> include/cxl/ras.h | 18 +++
> 3 files changed, 291 insertions(+), 3 deletions(-)
> create mode 100644 include/cxl/ras.h
>
>
> base-commit: a558d1571c0b3bb6b4a830cb2cd8f128cc5ef3e1

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io
2026-06-18 19:05 ` [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Bowman, Terry
@ 2026-06-18 20:12 ` Dave Jiang
0 siblings, 0 replies; 13+ messages in thread
From: Dave Jiang @ 2026-06-18 20:12 UTC (permalink / raw)
To: Bowman, Terry, linux-cxl, linux-pci; +Cc: bhelgaas, jic23, djbw

On 6/18/26 12:05 PM, Bowman, Terry wrote:
> On 6/18/2026 12:07 PM, Dave Jiang wrote:

[ ... ]

> I was considering whether this could possibly be reused for other RCEC–RCRB use
> cases and lifted into the AER core as a generalized solution. However, as far as I
> know, only CXL aggregates RCiEP reporting using the RCEC SBDF and leveraging RDPAS.
>
> I think this will provide a useful alternative to walking the RCEC’s associated RCiEPs and
> should improve performance with scale.
>
> It may be worth clarifying whether this supports both RCH and VH modes. Also
> consider updating the problem statement to note that the current RCEC RCiEP
> walk processes the RCH downstream ports of *all* RCiEPs associated with the RCEC.

This currently covers only the RCH path. Does it really do anything for
VH, unless platform providers ship RCiEP devices that are CXL in the
future?

>
> -Terry

[ ... ]

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io
2026-06-18 17:07 [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Dave Jiang
` (2 preceding siblings ...)
2026-06-18 19:05 ` [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Bowman, Terry
@ 2026-06-18 20:21 ` Dan Williams (nvidia)
2026-06-18 22:36 ` Dave Jiang
3 siblings, 1 reply; 13+ messages in thread
From: Dan Williams (nvidia) @ 2026-06-18 20:21 UTC (permalink / raw)
To: Dave Jiang, linux-cxl, linux-pci; +Cc: terry.bowman, bhelgaas, jic23, djbw

Dave Jiang wrote:
> v2:
> - Added multiple DSP per RCEC support.
> - Added boundary checks for reading MMIO
> - Addressed issues raised by sashiko
> - See individual patches for detailed changes
>
> The series adds RCEC Downstream Port Association Structure (RDPAS) parsing
> support to CXL.io. RDPAS is an ACPI table that is part of the CXL Early
> Discovery Table (CEDT) defined in CXL specification r4.0 9.18.1.5. It
> provides the mapping between RCEC and downstream ports. With RDPAS, the
> error device can be directly found when an error is reported on RCEC,
> without walking a number of RCiEP in order to determine which one reported
> the error. While CXL.cachemem is supported by RDPAS, there is no easy way
> to discover the source id of the error and therefore finding the Linux PCI
> object for the RCiEP. The intention here is to accelerate the discovery
> of the error by directly locating the error device with the given
> information.

Can you clarify why this "acceleration" matters in practice, especially
when Linux needs to be prepared for the RDPAS to not be present? Did
RDPAS ever ship on a CXL platform?

> This series is based on top of Terry's CXL error protocol series [1].
> Looking for comments on the series WRT if it makes sense to add on top
> of Terry's error handling for RCH/RCD devices.

What is the correlation with Terry's CXL.cachemem protocol series when
this implementation claims to only support CXL.io and Terry's series is
about CXL.cachemem protocol errors?

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io
2026-06-18 20:21 ` Dan Williams (nvidia)
@ 2026-06-18 22:36 ` Dave Jiang
0 siblings, 0 replies; 13+ messages in thread
From: Dave Jiang @ 2026-06-18 22:36 UTC (permalink / raw)
To: Dan Williams (nvidia), linux-cxl, linux-pci; +Cc: terry.bowman, bhelgaas, jic23

On 6/18/26 1:21 PM, Dan Williams (nvidia) wrote:
> Dave Jiang wrote:
>> v2:
>> - Added multiple DSP per RCEC support.
>> - Added boundary checks for reading MMIO
>> - Addressed issues raised by shashiko
>> - See individual patches for detailed changes
>>
>> The series add RCEC Downstream Port Assocation Structure (RDPAS) parsing
>> support to CXL.io. RDPAS is an ACPI table that is part of the CXL Early
>> Discovery Table (CEDT) defined in CXL specification r4.0 9.18.1.5. It
>> provides the mapping between RCEC and downstream ports. With RDPAS, the
>> error device can be directly found when an error is reported on RCEC,
>> without walking a number of RCiEP in order to determine which one reported
>> the error. While CXL.cachemem is supported by RDPAS, there is no easy way
>> to discover the source id of the error and therefore finding the Linux PCI
>> object for the RCiEP. The intention here is to accelerate the discovery
>> of the error by directly locating the error device with the given
>> information.
>
> Can you clarify why this "acceleration" matters in practice, especially
> when Linux needs to be prepared for the RDPAS to not be present? Did
> RDPAS ever ship on a CXL platform?
>
>> This series is based on top of Terry's CXL error protocol series [1].
>> Looking for comments on the series WRT if it makes sense to add on top
>> of Terry's error handling for RCH/RCD devices.
>
> What is the correlation with Terry's CXL.cachemem protocol series when
> this implementation claims to only support CXL.io and Terry's series is
> about CXL.cachemem protocol errors?

That's my misunderstanding. For some reason I was thinking all CXL errors
are routed through the new mechanism. Given that it really doesn't help
.cachemem anyways, I think we can just drop the series as it doesn't
really buy us anything.

I did discover that the RDPAS structure in ACPI needs to be updated as
there was a spec error in r3.0 that was corrected in r4.0 but the linux
version is still based on the incorrect one. I've sent a PR [1] to ACPICA
to correct that.

[1]: https://github.com/acpica/acpica/pull/1201

^ permalink raw reply [flat|nested] 13+ messages in thread