Linux PCI subsystem development
From: sashiko-bot@kernel.org
To: "Srirangan Madhavan" <smadhavan@nvidia.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [PATCH v6 5/9] cxl/pci: Add CXL DVSEC reset helper
Date: Thu, 28 May 2026 11:05:47 +0000
Message-ID: <20260528110547.C194E1F00A3A@smtp.kernel.org>
In-Reply-To: <20260528083154.137979-6-smadhavan@nvidia.com>

Thank you for your contribution! Sashiko AI review found 4 potential issues to consider:
- [Critical] Device control bits modified during CXL reset are not properly restored on error or success, risking unintended memory clears.
- [High] `cxl_reset_wait_done()` unconditionally sleeps for the maximum reset timeout duration instead of actively polling.
- [High] `cxl_reset_wait_cache_inv()` incorrectly returns success if the device is disconnected and the config space read returns `0xFFFF`.
- [Medium] Missing rollback of `PCI_DVSEC_CXL_INIT_CACHE_WBI` if cache invalidation times out.
--

commit 79504f256cc3acc66c73c075be3fa3a60d62e4c6
Author: Srirangan Madhavan <smadhavan@nvidia.com>

cxl/pci: Add CXL DVSEC reset helper

This patch adds a helper function to execute CXL Reset via the CXL Device DVSEC.
It verifies reset capabilities, manages IOMMU traffic, disables caching, and
polls for cache invalidation and reset completion.

> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 01effbb4e7cd1..1dd880f5a333f 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c

[ ... ]

> +static int cxl_reset_wait_cache_inv(struct pci_dev *pdev, int dvsec)
> +{
> +	int remaining_us = CXL_RESET_CACHE_WBI_TIMEOUT_US;
> +	u16 status2;
> +	int rc;
> +
> +	do {
> +		usleep_range(CXL_RESET_CACHE_WBI_POLL_US,
> +			     CXL_RESET_CACHE_WBI_POLL_US + 1);
> +		remaining_us -= CXL_RESET_CACHE_WBI_POLL_US;
> +
> +		rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_STATUS2,
> +					  &status2);
> +		if (rc)
> +			return rc;
> +
> +		if (status2 & PCI_DVSEC_CXL_CACHE_INV)
> +			return 0;

[Severity: High]
Does this code incorrectly return success if the device is disconnected?

If the device is suddenly removed, pci_read_config_word() can return 0 with
status2 set to 0xFFFF. Since PCI_DVSEC_CXL_CACHE_INV is bit 0, the test
(status2 & PCI_DVSEC_CXL_CACHE_INV) evaluates as true, causing the loop to
exit prematurely with success.

Furthermore, proceeding with the reset causes the subsequent
cxl_reset_wait_done() function to unconditionally sleep for up to 100 seconds
before realizing the device is gone, leading to an extreme system stall.
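
One possible shape for the check, as a userspace sketch rather than a
drop-in fix: compare the value against all-ones before testing any bit.
In the kernel this comparison is usually spelled PCI_POSSIBLE_ERROR().
The names below (wait_cache_inv, read_status_fn, fake_dev, fake_read) and
the bit value are hypothetical stand-ins chosen for illustration, and the
sleep between polls is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins, not the kernel's definitions. */
#define CACHE_INV  0x0001u	/* models PCI_DVSEC_CXL_CACHE_INV (bit 0) */
#define ALL_ONES   0xFFFFu	/* value a read from a removed device yields */

/* Hypothetical accessor: returns 0 on success and stores the register. */
typedef int (*read_status_fn)(void *ctx, uint16_t *val);

/* Poll for CACHE_INV, but report an all-ones read as a missing device. */
static int wait_cache_inv(read_status_fn read_status, void *ctx, int max_polls)
{
	uint16_t status2;
	int rc;

	assert(read_status != NULL);

	while (max_polls-- > 0) {
		rc = read_status(ctx, &status2);
		if (rc)
			return rc;
		if (status2 == ALL_ONES)	/* check before testing any bit */
			return -ENODEV;
		if (status2 & CACHE_INV)
			return 0;
	}
	return -ETIMEDOUT;
}

/* Fake device that replays a fixed sequence, repeating the last value. */
struct fake_dev {
	const uint16_t *values;
	int count;
	int next;
};

static int fake_read(void *ctx, uint16_t *val)
{
	struct fake_dev *dev = ctx;

	*val = dev->values[dev->next];
	if (dev->next < dev->count - 1)
		dev->next++;
	return 0;
}
```

With this ordering, a device that starts returning 0xFFFF mid-poll yields
-ENODEV instead of a false success, so the caller can skip the reset wait.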

> +	} while (remaining_us > 0);
> +
> +	pci_err(pdev, "CXL cache WB+I timed out\n");
> +	return -ETIMEDOUT;
> +}

[ ... ]

> +static int cxl_reset_disable_cache(struct pci_dev *pdev, int dvsec, u16 cap)
> +{
> +	int rc;
> +
> +	if (!(cap & PCI_DVSEC_CXL_CACHE_CAPABLE))
> +		return 0;
> +
> +	rc = cxl_reset_update_ctrl2(pdev, dvsec,
> +				    PCI_DVSEC_CXL_DISABLE_CACHING, 0);
> +	if (rc)
> +		return rc;
> +
> +	if (!(cap & PCI_DVSEC_CXL_CACHE_WBI_CAPABLE))
> +		return 0;
> +
> +	rc = cxl_reset_update_ctrl2(pdev, dvsec,
> +				    PCI_DVSEC_CXL_INIT_CACHE_WBI, 0);
> +	if (rc)
> +		goto err_enable_cache;
> +
> +	rc = cxl_reset_wait_cache_inv(pdev, dvsec);
> +	if (rc)
> +		goto err_enable_cache;
> +
> +	return 0;
> +
> +err_enable_cache:
> +	/*
> +	 * Best effort rollback: preserve the original WB+I failure even if
> +	 * re-enabling CXL.cache also fails.
> +	 */
> +	cxl_reset_enable_cache(pdev, dvsec, cap);
> +	return rc;
> +}

[Severity: Medium]
Does this error path leave PCI_DVSEC_CXL_INIT_CACHE_WBI set?

cxl_reset_enable_cache() only clears PCI_DVSEC_CXL_DISABLE_CACHING. If
initiating a cache writeback and invalidation requires a 0-to-1 transition
of PCI_DVSEC_CXL_INIT_CACHE_WBI, leaving the bit stuck at 1 might prevent
future invalidation operations from succeeding.
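
A minimal model of a rollback that undoes both steps, assuming the bit
reads back as written (if the device treats it as self-clearing, the
extra clear is harmless but unnecessary). The bit positions and the
helper rollback_disable_cache() are illustrative assumptions; only the
(set, clear) argument order is taken from the calls in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions; the real masks come from the kernel headers. */
#define DISABLE_CACHING  0x0001u
#define INIT_CACHE_WBI   0x0002u

/* Pure model of a read-modify-write: clear bits, then set bits. */
static uint16_t update_ctrl2(uint16_t ctrl2, uint16_t set, uint16_t clear)
{
	assert((set & clear) == 0);
	return (uint16_t)((ctrl2 & ~clear) | set);
}

/* Rollback that undoes both steps of the disable sequence. */
static uint16_t rollback_disable_cache(uint16_t ctrl2)
{
	return update_ctrl2(ctrl2, 0, DISABLE_CACHING | INIT_CACHE_WBI);
}
```

Clearing both bits in one write keeps the rollback to a single config
access and leaves unrelated bits in the register untouched.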

> +static int cxl_reset_wait_done(struct pci_dev *pdev, int dvsec, u16 cap)
> +{
> +	static const u32 reset_timeout_ms[] = { 10, 100, 1000, 10000, 100000 };
> +	u32 timeout_ms;
> +	u16 status2;
> +	int rc, idx;
> +
> +	idx = FIELD_GET(PCI_DVSEC_CXL_RST_TIMEOUT, cap);
> +	if (idx >= ARRAY_SIZE(reset_timeout_ms))
> +		idx = ARRAY_SIZE(reset_timeout_ms) - 1;
> +	timeout_ms = reset_timeout_ms[idx];
> +
> +	msleep(max_t(u32, timeout_ms, CXL_RESET_MIN_QUIET_MS));

[Severity: High]
Does this code unconditionally sleep for the maximum reset timeout duration
instead of actively polling?

The reset_timeout_ms array maxes out at 100,000 ms. Sleeping for the full
advertised timeout forces the kernel thread (and any locks held across the
reset) to block for the device's worst-case time on every reset, even when
the reset completes early. This causes massive latency and can trigger the
kernel's hung task watchdog.
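
For comparison, a bounded polling loop could look roughly like the sketch
below. It is a userspace model with injected read and sleep callbacks so
it can run without hardware; poll_ops, wait_reset_done(), fake_dev and
the status bit positions are all assumptions made for the example. A real
version may still need to honor a minimum quiet period (the patch uses
CXL_RESET_MIN_QUIET_MS) before the first read.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative bit positions, not the kernel's definitions. */
#define RST_COMPLETE  0x0002u
#define RST_ERROR     0x0004u
#define ALL_ONES      0xFFFFu

/* Hypothetical callbacks so the loop can be exercised without hardware. */
struct poll_ops {
	int (*read_status)(void *ctx, uint16_t *val);
	void (*sleep_ms)(void *ctx, uint32_t ms);
};

/* Poll every poll_ms until done, error, device loss, or timeout_ms. */
static int wait_reset_done(const struct poll_ops *ops, void *ctx,
			   uint32_t timeout_ms, uint32_t poll_ms)
{
	uint32_t elapsed = 0;
	uint16_t status2;
	int rc;

	assert(poll_ms > 0);	/* a zero interval would never reach the timeout */

	for (;;) {
		rc = ops->read_status(ctx, &status2);
		if (rc)
			return rc;
		if (status2 == ALL_ONES)
			return -ENODEV;
		if (status2 & RST_ERROR)
			return -EIO;
		if (status2 & RST_COMPLETE)
			return 0;
		if (elapsed >= timeout_ms)
			return -ETIMEDOUT;
		ops->sleep_ms(ctx, poll_ms);
		elapsed += poll_ms;
	}
}

/* Fake device whose reset completes once its clock reaches ready_at_ms. */
struct fake_dev {
	uint32_t now_ms;
	uint32_t ready_at_ms;
};

static int fake_read(void *ctx, uint16_t *val)
{
	struct fake_dev *dev = ctx;

	*val = dev->now_ms >= dev->ready_at_ms ? RST_COMPLETE : 0;
	return 0;
}

static void fake_sleep(void *ctx, uint32_t ms)
{
	struct fake_dev *dev = ctx;

	dev->now_ms += ms;
}
```

In this model a device that finishes after 30 ms returns after 30 ms even
with a 100,000 ms budget, while the timeout still bounds the worst case.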

> +	rc = pci_read_config_word(pdev, dvsec + PCI_DVSEC_CXL_STATUS2,
> +				  &status2);

[ ... ]

> +static int __maybe_unused cxl_dev_reset(struct pci_dev *pdev, bool mem_clear)
> +{

[ ... ]

> +	rc = cxl_reset_wait_done(pdev, dvsec, cap);
> +	if (rc)
> +		goto out_iommu;
> +
> +	rc = cxl_reset_update_ctrl2(pdev, dvsec, 0,
> +				    PCI_DVSEC_CXL_DISABLE_CACHING);
> +
> +out_ctrl2:
> +	if (rc && ctrl2_clear)
> +		cxl_reset_update_ctrl2(pdev, dvsec, 0, ctrl2_clear);

[Severity: Critical]
Are the device control bits correctly restored on success and error paths?

On the success path, cxl_reset_update_ctrl2() returns 0, so the condition
(rc && ctrl2_clear) is false and the ctrl2_clear bits are never cleared.
The PCI_DVSEC_CXL_RST_MEM_CLR_EN bit remains set indefinitely.

In subsequent calls to cxl_dev_reset(..., mem_clear = false), the lockless
RMW cycle in cxl_reset_update_ctrl2() to set PCI_DVSEC_CXL_INIT_CXL_RST
preserves the previously leaked PCI_DVSEC_CXL_RST_MEM_CLR_EN bit. This
could trigger an unintended memory-clearing reset, leading to data
destruction.

Additionally, on error paths, such as when cxl_reset_wait_done() fails, the
code jumps to out_iommu:

	rc = cxl_reset_wait_done(pdev, dvsec, cap);
	if (rc)
		goto out_iommu;

This skips the out_ctrl2 block entirely, permanently leaking
PCI_DVSEC_CXL_DISABLE_CACHING and PCI_DVSEC_CXL_RST_MEM_CLR_EN.
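
One way to avoid both leaks is a single restore label that every path
falls through. The sketch below models that on a toy device; fake_dev,
dev_reset(), the canned wait_result and the bit positions are
hypothetical, and it assumes the bits were clear on entry (a real version
might save and restore the original register value instead).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions, not the kernel's definitions. */
#define DISABLE_CACHING  0x0001u
#define RST_MEM_CLR_EN   0x0008u

/* Toy device: one control register plus a canned result for the wait. */
struct fake_dev {
	uint16_t ctrl2;
	int wait_result;	/* what the simulated wait step returns */
	bool mem_cleared;	/* was MEM_CLR_EN set when the reset fired? */
};

static void update_ctrl2(struct fake_dev *dev, uint16_t set, uint16_t clear)
{
	assert((set & clear) == 0);
	dev->ctrl2 = (uint16_t)((dev->ctrl2 & ~clear) | set);
}

static int dev_reset(struct fake_dev *dev, bool mem_clear)
{
	uint16_t restore = DISABLE_CACHING;
	int rc;

	if (mem_clear)
		restore |= RST_MEM_CLR_EN;
	update_ctrl2(dev, restore, 0);

	/* Initiating the reset samples whatever is in the register now. */
	dev->mem_cleared = (dev->ctrl2 & RST_MEM_CLR_EN) != 0;

	rc = dev->wait_result;
	if (rc)
		goto out_restore;

	/* Success-only steps would go here. */

out_restore:
	/* Runs on every path, so no bit set above can leak. */
	update_ctrl2(dev, 0, restore);
	return rc;
}
```

Because the restore is unconditional, a later call with mem_clear = false
starts from a register without the memory-clear enable bit set.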

> +out_iommu:
> +	pci_dev_reset_iommu_done(pdev);
> +	return rc;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260528083154.137979-1-smadhavan@nvidia.com?part=5
