From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EF86B3E3C72 for ; Thu, 28 May 2026 12:25:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779971111; cv=none; b=qPgjGlrFD6cP3ZbCXqRA4gYHnCETT1oQT1vrZSK9XvMWTtt3kLsx9QwP47IgAi1965Ej7QgtNInk/3VITtDlOZ+vMblA0mr0ecW2m/pzRIpqN3JpJMBukrtKCA4LAga6G6/6GKSpGTh4KyPud+L1osymzynZW315yX0eCpKRfjg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779971111; c=relaxed/simple; bh=SZzyV6zVpfcAUlErInHgwGuTjTzARo3+txHuiCbZ0AA=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=bMVOkLTSNQXjOHvYjl9PJB32esFMdKrgXscIkBl6UdhSduS3c3/XxuOTbuixi+XDc5D4wGDcKDShxIr9aCx7wsOfWb2kUI9dUmpyhQsydKVkOp0hAJpSg/pH2yeQZbbeVfOY+LFVxkxjSlw39DrxM00IOWczdBKBXaMNcLR4eDE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=iYosvVxd; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="iYosvVxd" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 899271F00A3A; Thu, 28 May 2026 12:25:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1779971109; bh=VlV2+U7Meti+TsbG8mNI6A6wt6Uxrmad1CHHgXpKHfA=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=iYosvVxd99lBrNZe0lrEY+1lzSdKkcae/Gcmn5rjh2gCdlqauYMb9BvuHSEZqBbgX qPTAMDrttHqnQBGoDHmnWnDd1340XuIqHvEF1pIfMTGpg6mR3zTM4diNul/DIGSKjp T6LfwkAUosbcN/y9JwGgpZ2qrEQcA3dyldvkf/kd3FF08BPbfy4EOodkhZB0Dh9/cn JeKCgmxibz82AyN60kyc6dG+yTPWqkOafuIltpd9oMVhd0jfYnIs4an1PLth+Vlxjd xHpcYhkrpYLIYrydAZ8WuuJtkOs7YOhX3+RF/Pp5GBAlV12QaRopHnniLEYq2mExsN uGg4B0lA8O4tQ== From: sashiko-bot@kernel.org Subject: Re: [PATCH v6 7/9] cxl/pci: Orchestrate CXL reset for affected memdevs Reply-To: sashiko-reviews@lists.linux.dev To: "Srirangan Madhavan" Cc: linux-pci@vger.kernel.org In-Reply-To: <20260528083154.137979-8-smadhavan@nvidia.com> References: <20260528083154.137979-8-smadhavan@nvidia.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Thu, 28 May 2026 12:25:08 +0000 Message-Id: <20260528122509.899271F00A3A@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 4 potential issue(= s) to consider: - [High] cxl_reset_disable_memdevs() fails to update cxlds->media_ready to = false when disabling CXL.mem. - [High] Missing wait for pending DMA transactions before blocking the IOMM= U for sibling PCI functions. - [High] Early PCI config state restore re-enables bus mastering while the = IOMMU is still blocked. - [Medium] Mixing goto-based error handling and scope-based cleanup helpers= in the same function. -- commit 1e8789977f0aca50924f7c6f37e03c4dc2387cd7 Author: Srirangan Madhavan cxl/pci: Orchestrate CXL reset for affected memdevs This commit adds a reset flow for CXL devices that coordinates the target function, sibling functions, and active memdevs. It collects regions, flushes caches, saves and disables functions, and then executes the CXL reset before restoring PCI config space and re-enabling CXL.mem. > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c > index c755c18c8d84a..486c447e98f3f 100644 > --- a/drivers/cxl/core/pci.c > +++ b/drivers/cxl/core/pci.c [ ... ] > +static int cxl_pci_functions_reset_prepare(struct cxl_reset_context *ctx) > +{ > + int rc, i; > + > + ctx->nr_siblings_prepared =3D 0; > + > + for (i =3D 0; i < ctx->nr_siblings_locked; i++) { > + struct pci_dev *sibling =3D ctx->siblings[i]; > + > pci_dev_save_and_disable(sibling); > rc =3D pci_dev_reset_iommu_prepare(sibling); [Severity: High] Is it safe to block the IOMMU here without waiting for in-flight DMA transactions to drain? pci_dev_save_and_disable() disables bus mastering, but any transactions already in flight might be dropped by the IOMMU and raise AERs (Advanced Error Reporting faults) if we don't wait for them. The target device correctly drains its transactions using pci_wait_for_pending_transaction(), could we do the same for the sibling functions here? [ ... ] > +static void cxl_pci_functions_restore_state(struct cxl_reset_context *ct= x) > +{ > + int i; > + > + for (i =3D ctx->nr_siblings_prepared - 1; i >=3D 0; i--) > + pci_restore_state(ctx->siblings[i]); > +} [Severity: High] Does restoring the PCI config space here re-enable bus mastering while the IOMMU is still actively blocking requests? pci_restore_state() restores the entire saved PCI config space, including the PCI_COMMAND register with the PCI_COMMAND_MASTER bit set. Since the IOMMU is not unblocked until pci_dev_reset_iommu_done() runs later during cleanup, could this allow the device to issue DMAs (or MSI/MSI-X interrupts) that will trigger IOMMU faults and AERs? [ ... ] > +static void cxl_reset_disable_memdevs(struct cxl_reset_context *ctx) > +{ > + int rc, i; > + > + for (i =3D ctx->nr_memdevs - 1; i >=3D 0; i--) { > + struct cxl_memdev *cxlmd =3D ctx->memdevs[i].cxlmd; > + > + if (!ctx->memdevs[i].active) > + continue; > + > + rc =3D cxl_set_mem_enable(cxlmd->cxlds, 0); [Severity: High] Could this leave the device state indicating the media is active when physical memory decode has been disabled? cxl_set_mem_enable() disables CXL.mem on the device, but cxlds->media_ready isn't updated to false here. If the system subsequently attempts to access this memory, could it result in unhandled unsupported requests (UR) and trigger fatal bus errors? [ ... ] > +static int __maybe_unused cxl_do_reset(struct pci_dev *pdev, bool mem_cl= ear) > +{ > + struct cxl_reset_context ctx =3D { > + .target =3D pdev, > + }; > + int rc; > + > + /* > + * Snapshot the CXL r3.2 9.7 device reset scope before taking > + * cxl_rwsem.region. Hot-added functions after this point are not > + * coordinated by this reset operation. > + */ > + rc =3D cxl_reset_collect_siblings(&ctx); > + if (rc) > + goto out; > + > + rc =3D cxl_reset_collect_memdevs(&ctx); > + if (rc) > + goto out; > + > + scoped_guard(rwsem_write, &cxl_rwsem.region) > + rc =3D cxl_do_reset_locked(&ctx, mem_clear); > + > +out: > + cxl_reset_context_destroy(&ctx); > return rc; > } [Severity: Medium] Is it intended to mix goto-based error handling and scope-based cleanup helpers in the same function? This function uses "goto out" for early error paths, but then uses scoped_guard() below. The cleanup subsystem guidelines in include/linux/cleanup.h state that goto and cleanup helpers should never be mixed in the same function, as it can create confusing ownership semantics and increase the risk of double-free or resource leak bugs. --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260528083154.1379= 79-1-smadhavan@nvidia.com?part=3D7