All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Srirangan Madhavan" <smadhavan@nvidia.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [PATCH v6 7/9] cxl/pci: Orchestrate CXL reset for affected memdevs
Date: Thu, 28 May 2026 12:25:08 +0000	[thread overview]
Message-ID: <20260528122509.899271F00A3A@smtp.kernel.org> (raw)
In-Reply-To: <20260528083154.137979-8-smadhavan@nvidia.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] cxl_reset_disable_memdevs() fails to update cxlds->media_ready to false when disabling CXL.mem.
- [High] Missing wait for pending DMA transactions before blocking the IOMMU for sibling PCI functions.
- [High] Early PCI config state restore re-enables bus mastering while the IOMMU is still blocked.
- [Medium] Mixing goto-based error handling and scope-based cleanup helpers in the same function.
--

commit 1e8789977f0aca50924f7c6f37e03c4dc2387cd7
Author: Srirangan Madhavan <smadhavan@nvidia.com>

cxl/pci: Orchestrate CXL reset for affected memdevs

This commit adds a reset flow for CXL devices that coordinates the target
function, sibling functions, and active memdevs. It collects regions,
flushes caches, saves and disables functions, and then executes the CXL
reset before restoring PCI config space and re-enabling CXL.mem.

> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index c755c18c8d84a..486c447e98f3f 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c

[ ... ]

> +static int cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx)
> +{
> +	int rc, i;
> +
> +	ctx->nr_siblings_prepared = 0;
> +
> +	for (i = 0; i < ctx->nr_siblings_locked; i++) {
> +		struct pci_dev *sibling = ctx->siblings[i];
> +
>  		pci_dev_save_and_disable(sibling);
>  		rc = pci_dev_reset_iommu_prepare(sibling);

[Severity: High]
Is it safe to block the IOMMU here without waiting for in-flight DMA
transactions to drain?

pci_dev_save_and_disable() disables bus mastering, but any transactions
already in flight might be dropped by the IOMMU and raise AERs (Advanced
Error Reporting faults) if we don't wait for them.

The target device correctly drains its transactions using
pci_wait_for_pending_transaction(), could we do the same for the sibling
functions here?

[ ... ]

> +static void cxl_pci_functions_restore_state(struct cxl_reset_context *ctx)
> +{
> +	int i;
> +
> +	for (i = ctx->nr_siblings_prepared - 1; i >= 0; i--)
> +		pci_restore_state(ctx->siblings[i]);
> +}

[Severity: High]
Does restoring the PCI config space here re-enable bus mastering while the
IOMMU is still actively blocking requests?

pci_restore_state() restores the entire saved PCI config space, including
the PCI_COMMAND register with the PCI_COMMAND_MASTER bit set.

Since the IOMMU is not unblocked until pci_dev_reset_iommu_done() runs
later during cleanup, could this allow the device to issue DMAs (or
MSI/MSI-X interrupts) that will trigger IOMMU faults and AERs?

[ ... ]

> +static void cxl_reset_disable_memdevs(struct cxl_reset_context *ctx)
> +{
> +	int rc, i;
> +
> +	for (i = ctx->nr_memdevs - 1; i >= 0; i--) {
> +		struct cxl_memdev *cxlmd = ctx->memdevs[i].cxlmd;
> +
> +		if (!ctx->memdevs[i].active)
> +			continue;
> +
> +		rc = cxl_set_mem_enable(cxlmd->cxlds, 0);

[Severity: High]
Could this leave the device state indicating the media is active when
physical memory decode has been disabled?

cxl_set_mem_enable() disables CXL.mem on the device, but
cxlds->media_ready isn't updated to false here. If the system subsequently
attempts to access this memory, could it result in unhandled unsupported
requests (UR) and trigger fatal bus errors?

[ ... ]

> +static int __maybe_unused cxl_do_reset(struct pci_dev *pdev, bool mem_clear)
> +{
> +	struct cxl_reset_context ctx = {
> +		.target = pdev,
> +	};
> +	int rc;
> +
> +	/*
> +	 * Snapshot the CXL r3.2 9.7 device reset scope before taking
> +	 * cxl_rwsem.region. Hot-added functions after this point are not
> +	 * coordinated by this reset operation.
> +	 */
> +	rc = cxl_reset_collect_siblings(&ctx);
> +	if (rc)
> +		goto out;
> +
> +	rc = cxl_reset_collect_memdevs(&ctx);
> +	if (rc)
> +		goto out;
> +
> +	scoped_guard(rwsem_write, &cxl_rwsem.region)
> +		rc = cxl_do_reset_locked(&ctx, mem_clear);
> +
> +out:
> +	cxl_reset_context_destroy(&ctx);
>  	return rc;
>  }

[Severity: Medium]
Is it intended to mix goto-based error handling and scope-based cleanup
helpers in the same function?

This function uses "goto out" for early error paths, but then uses
scoped_guard() below. The cleanup subsystem guidelines in
include/linux/cleanup.h state that goto and cleanup helpers should never be
mixed in the same function, as it can create confusing ownership semantics
and increase the risk of double-free or resource leak bugs.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260528083154.137979-1-smadhavan@nvidia.com?part=7

  reply	other threads:[~2026-05-28 12:25 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-28  8:31 [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Srirangan Madhavan
2026-05-28  8:31 ` [PATCH v6 1/9] cxl/hdm: Add helpers to restore and commit memdev decoders Srirangan Madhavan
2026-05-28  9:12   ` sashiko-bot
2026-05-28 11:06   ` Richard Cheng
2026-06-02 18:12     ` Dave Jiang
2026-06-02 18:31   ` Dave Jiang
2026-06-02 20:34   ` Cheatham, Benjamin
2026-06-03 22:35   ` Dan Williams (nvidia)
2026-05-28  8:31 ` [PATCH v6 2/9] PCI: Export pci_dev_save_and_disable() and pci_dev_restore() Srirangan Madhavan
2026-06-02 20:18   ` Dave Jiang
2026-06-03 22:36   ` Dan Williams (nvidia)
2026-05-28  8:31 ` [PATCH v6 3/9] cxl: Add reset-idle and cache flush helpers Srirangan Madhavan
2026-05-28 10:09   ` sashiko-bot
2026-06-02 20:34   ` Cheatham, Benjamin
2026-06-02 20:36   ` Dave Jiang
2026-06-04  2:49   ` Dan Williams (nvidia)
2026-05-28  8:31 ` [PATCH v6 4/9] PCI/CXL: Add sibling function coordination for reset Srirangan Madhavan
2026-05-28 10:41   ` sashiko-bot
2026-05-28 11:15   ` Richard Cheng
2026-06-02 22:10   ` Dave Jiang
2026-06-04  3:13   ` Dan Williams (nvidia)
2026-05-28  8:31 ` [PATCH v6 5/9] cxl/pci: Add CXL DVSEC reset helper Srirangan Madhavan
2026-05-28 11:05   ` sashiko-bot
2026-06-02 20:34   ` Cheatham, Benjamin
2026-05-28  8:31 ` [PATCH v6 6/9] cxl/pci: Track memdevs affected by CXL reset Srirangan Madhavan
2026-05-28 11:36   ` sashiko-bot
2026-06-02 20:34   ` Cheatham, Benjamin
2026-05-28  8:31 ` [PATCH v6 7/9] cxl/pci: Orchestrate CXL reset for affected memdevs Srirangan Madhavan
2026-05-28 12:25   ` sashiko-bot [this message]
2026-06-02 20:34   ` Cheatham, Benjamin
2026-06-04  3:25   ` Dan Williams (nvidia)
2026-05-28  8:31 ` [PATCH v6 8/9] cxl/memdev: Add cxl_reset sysfs attribute Srirangan Madhavan
2026-05-28 13:03   ` sashiko-bot
2026-06-02 21:35   ` Cheatham, Benjamin
2026-06-02 23:50   ` Dave Jiang
2026-05-28  8:31 ` [PATCH v6 9/9] Documentation/ABI: Document CXL memdev cxl_reset Srirangan Madhavan
2026-06-03  0:11   ` Dave Jiang
2026-06-02 20:34 ` [PATCH v6 0/9] cxl: Add cxl_reset sysfs attribute for memdevs Cheatham, Benjamin
2026-06-02 21:42 ` Dan Williams (nvidia)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260528122509.899271F00A3A@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=smadhavan@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.