From: sashiko-bot@kernel.org
To: "Dave Jiang" <dave.jiang@intel.com>
Cc: linux-pci@vger.kernel.org, linux-cxl@vger.kernel.org
Subject: Re: [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
Date: Thu, 18 Jun 2026 17:20:46 +0000 [thread overview]
Message-ID: <20260618172047.535241F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260618170723.2010490-3-dave.jiang@intel.com>
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Incorrect fall-through logic in `cxl_rch_get_err_src_id()` extracts and returns the wrong error source ID for correctable errors.
- [High] The RDPAS list is built concurrently without locking and traversed locklessly in the AER error handler, exposing readers to partially built lists or list corruption.
- [Low] The function `cxl_forward_error` is invoked but is neither declared nor defined anywhere in the kernel.
--
commit e9dabb60b82f98b2295a3d242546827da2279bbf
Author: Dave Jiang <dave.jiang@intel.com>
PCI/CXL: Enable usage of RDPAS to shortcut error device discovery
This commit changes the CXL RCH error handler to probe the RCRB first to
see if the error device can be discovered quickly, falling back to iterating
through RCiEPs if needed. It introduces RDPAS parsing to directly locate the
RCRB and determine the BDF of the error device.
diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
--- a/drivers/pci/pcie/aer_cxl_rch.c
+++ b/drivers/pci/pcie/aer_cxl_rch.c
[ ... ]
> +static u16 cxl_rch_get_err_src_id(u64 rcrb_base, struct aer_err_info *info)
> +{
> + u32 root_status, err_src;
[ ... ]
> + if (info->severity == AER_CORRECTABLE &&
> + root_status & PCI_ERR_ROOT_COR_RCV)
> + return FIELD_GET(GENMASK(15, 0), err_src);
> +
> + /* Assume at this point the info->severity points to UNCOR */
> + if (root_status & PCI_ERR_ROOT_UNCOR_RCV)
> + return FIELD_GET(GENMASK(31, 16), err_src);
> +
> + return 0;
> +}
[Severity: High]
If this is handling a correctable error but PCI_ERR_ROOT_COR_RCV is not set,
is it intentional to fall through to the uncorrectable error check?
If an uncorrectable error is concurrently logged in the RCRB status, this
would return the source ID of the uncorrectable error instead of 0, which
might misroute the correctable error notification.
Should these severity checks be mutually exclusive using an else block?
> +static bool cxl_rch_forward_error_by_dsp(struct pci_dev *rcec, u64 rcrb_base,
> + struct aer_err_info *info)
> +{
[ ... ]
> + if (!is_cxl_mem_dev(pdev) || !cxl_error_is_native(pdev))
> + return false;
> +
> + cxl_forward_error(pdev, info);
[Severity: Low]
Does cxl_forward_error() exist in the kernel tree?
I'm having trouble finding its declaration or definition in the codebase,
which seems like it might cause an implicit function declaration build failure.
> + return true;
> +}
> +
> +static bool cxl_rch_handled_error_by_rdpas(struct pci_dev *rcec,
> + struct aer_err_info *info)
> +{
> + struct cxl_rdpas_rcec *rdpas_rcec;
> + struct cxl_rdpas_entry *entry;
> + bool handled = false;
> +
> + rdpas_rcec = cxl_get_rdpas_by_rcec(rcec);
> + if (!rdpas_rcec)
> + return false;
> +
> + /*
> + * The RCEC aggregates multiple downstream ports. Each CXL.io
> + * downstream port associated with this RCEC exposes the RCRB at its
> + * base address; walk them all and forward the error from every port
> + * that reports a valid error source.
> + */
> + list_for_each_entry(entry, &rdpas_rcec->ports, list) {
[Severity: High]
Could this lockless list traversal race with the initialization in
__cxl_parse_rdpas()?
If an asynchronous AER hardware event triggers cxl_rch_handle_error() while
__cxl_parse_rdpas() is still running, it seems we might walk rdpas_rcec->ports
using list_for_each_entry() while it is concurrently being mutated by
list_add_tail().
Since the rdpas_rcec container is published to the cxl_rdpas xarray before
the list is fully populated, would this expose readers to partially built
lists or use-after-free conditions? Additionally, would the lockless check
of rdpas_parsed in cxl_rdpas_init() allow concurrent probes to race and
corrupt the list?
> + if (entry->protocol != ACPI_CEDT_RDPAS_PROTOCOL_IO)
> + continue;
> +
> + if (cxl_rch_forward_error_by_dsp(rcec, entry->address, info))
> + handled = true;
> + }
> +
> + return handled;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260618170723.2010490-1-dave.jiang@intel.com?part=2
next prev parent reply other threads:[~2026-06-18 17:20 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-18 17:07 [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Dave Jiang
2026-06-18 17:07 ` [RFC PATCH v2 1/2] PCI/CXL: Add RDPAS parsing support Dave Jiang
2026-06-18 17:19 ` sashiko-bot
2026-06-18 21:26 ` Bowman, Terry
2026-06-18 21:57 ` Dave Jiang
2026-06-18 17:07 ` [RFC PATCH v2 2/2] PCI/CXL: Enable usage of RDPAS to shortcut error device discovery Dave Jiang
2026-06-18 17:20 ` sashiko-bot [this message]
2026-06-18 21:26 ` Bowman, Terry
2026-06-18 22:04 ` Dave Jiang
2026-06-18 19:05 ` [RFC PATCH v2 0/2] PCI/CXL: Add RDPAS support for CXL.io Bowman, Terry
2026-06-18 20:12 ` Dave Jiang
2026-06-18 20:21 ` Dan Williams (nvidia)
2026-06-18 22:36 ` Dave Jiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260618172047.535241F000E9@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=dave.jiang@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.