From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 79B19283FD4 for ; Mon, 18 May 2026 17:38:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779125917; cv=none; b=DXZOVoahrbFpaEfR0u+nIsTFgJ6JCzaPXITc9SUpnPxpiXNGncXG7EhxmuaaZ5hLLxVLZxfFjjLTizGPpMS3cCuW20apr+7doQZAmdj2Qxb1vjhwsUvdtsVq+Y6Zi9uZwdqBNcg6tJfVhSikOqmH9xaydahtFprXo2WTnFjDe7I= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779125917; c=relaxed/simple; bh=ywUrW+cc8wAzj7JWoBL0xhy8ORDq8eQbbNmV1AsmF6Y=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=SvbOXI/HO5ixV/def/q9+w134wY/UJIN+gtctHcCyeEeE8STYnr5Ftr/xsa5vVj0kgcV0Tm/4VMAsgGRZdZAuUh35GJVJvkQNAwXZznLWuz0jayRVjjDUyXNs06y3/0gbVA8d1qhvYi9h3a9+SvbOdaShSHLQRu2Wi6KdcKc3mo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=pN0Vj+g3; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="pN0Vj+g3" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C77C3C2BCB7; Mon, 18 May 2026 17:38:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1779125917; bh=ywUrW+cc8wAzj7JWoBL0xhy8ORDq8eQbbNmV1AsmF6Y=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date:From; b=pN0Vj+g3HQ1EV9rZU6VhweNXRLQTAXz776+0p7O4HHn4e3zqp309WvULpeRMaXIjD JYAMIdM0hifSG2FtRta2MwmWmXUVDzZO/QOP/JnkUOubCtlG0VpVs+OQ1ZoPp5QUzK efaxjpRYb4zhWUACLw9rzDx5N5pBtGOFgpCUARFxK5+46gVKDWsL62p9a18u32K5Mp RPThdlbhtrthti2sq8+8S73fwaFV4CQZGnThOl9jbXeklC1FmjbaJnkGXAwsPEqFXs TjHljHuRs9XLak6q9gZO8SnG1trrnUskBhSIwDuKWP3hf2QQtQZ5tF6reZrQ1WccWB +0VKoUqEbr5pA== From: sashiko-bot@kernel.org Subject: Re: [PATCH v8 4/5] PCI: qcom: Add support for resetting the Root Port due to link down event Reply-To: sashiko-reviews@lists.linux.dev To: "Manivannan Sadhasivam via B4 Relay" Cc: linux-pci@vger.kernel.org In-Reply-To: <20260518-pci-port-reset-v8-4-eb5a7d331dfc@oss.qualcomm.com> References: <20260518-pci-port-reset-v8-4-eb5a7d331dfc@oss.qualcomm.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Mon, 18 May 2026 17:38:36 +0000 Message-Id: <20260518173836.C77C3C2BCB7@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 4 potential issue(= s) to consider: - [Critical] System crash on error path due to unclocked controller. - [High] Unintended MSI interception by the global IRQ handler causes sever= e performance regressions or silently drops events. - [High] Data race and use-after-free via lockless iteration of PCI bus dev= ices in IRQ thread. - [Medium] Ignored return value masks link retraining failures. -- commit f4a6523af4f70d1a2c5951a7190c4516cab3c8fc Author: Manivannan Sadhasivam PCI: qcom: Add support for resetting the Root Port due to link down event This commit detects the PCIe link down event and resets the Root Port by first resetting the PCIe core, reinitializing resources, and restarting the link to make it operational again. > diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/control= ler/dwc/pcie-qcom.c > index af6bf5cce65be..feda8abf5f851 100644 > --- a/drivers/pci/controller/dwc/pcie-qcom.c > +++ b/drivers/pci/controller/dwc/pcie-qcom.c [ ... ] > @@ -1613,6 +1633,78 @@ static void qcom_pcie_icc_opp_update(struct qcom_p= cie *pcie) > +static int qcom_pcie_reset_root_port(struct pci_host_bridge *bridge, > + struct pci_dev *pdev) > +{ [ ... ] > + ret =3D dw_pcie_setup_rc(pp); > + if (ret) > + goto err_host_deinit; > + > + /* > + * Re-enable global IRQ events as the PARF_INT_ALL_MASK register is > + * non-sticky. > + */ > + if (pcie->global_irq) > + writel_relaxed(PARF_INT_ALL_LINK_DOWN | PARF_INT_MSI_DEV_0_7, > + pcie->parf + PARF_INT_ALL_MASK); > + > + qcom_pcie_start_link(pci); > + dw_pcie_wait_for_link(pci); > + > + dev_dbg(dev, "Root Port reset completed\n"); > + > + return 0; Does dw_pcie_wait_for_link() return an error code (such as -ETIMEDOUT) if t= he link fails to retrain? Since this return value is ignored and the function unconditionally returns= 0, the PCI core is falsely informed that the recovery was successful. Could this prevent the PCI subsystem from properly disconnecting the downst= ream devices if the link fails to come up, leaving zombie device structures that cannot communicate? > + > +err_host_deinit: > + qcom_pcie_host_deinit(pp); > + > + return ret; > +} If an error occurs earlier in this function and the code jumps to err_host_deinit, qcom_pcie_host_deinit() will be called, which turns off the PHY power, regulators, and PCIe clocks. When this function returns the error code back to the PCI core, the PCI core handles the reset failure by returning PCI_ERS_RESULT_DISCONNECT and triggering device removal. Since the PCI core and userspace tools might attempt to read the devices' configuration space during unbind, and the clocks are now disabled, will th= ese MMIO accesses cause a synchronous external abort and a kernel panic? Should the controller remain clocked on the error path so it can safely rej= ect config accesses with 0xFFFFFFFF? > @@ -1650,6 +1742,27 @@ static void qcom_pcie_init_debugfs(struct qcom_pci= e *pcie) > +static irqreturn_t qcom_pcie_global_irq_thread(int irq, void *data) > +{ > + struct qcom_pcie *pcie =3D data; > + struct dw_pcie_rp *pp =3D &pcie->pci->pp; > + struct device *dev =3D pcie->pci->dev; > + struct pci_dev *port; > + unsigned long status =3D readl_relaxed(pcie->parf + PARF_INT_ALL_STATUS= ); > + > + writel_relaxed(status, pcie->parf + PARF_INT_ALL_CLEAR); > + In qcom_pcie_reset_root_port() and qcom_pcie_probe(), PARF_INT_MSI_DEV_0_7 is unmasked in PARF_INT_ALL_MASK. By explicitly unmasking this bit alongside PARF_INT_ALL_LINK_DOWN, does this cause MSI interrupts to be incorrectly routed to this global IRQ handler? The threaded handler reads the status and writes it to PARF_INT_ALL_CLEAR, but ignores the MSI bits and unconditionally returns IRQ_HANDLED. Could this lead to silent dropping of MSI interrupts if the Synopsys Design= Ware core relies on this status being preserved? Also, for high-frequency interrupt sources, would every MSI unnecessarily w= ake up qcom_pcie_global_irq_thread, causing excessive CPU overhead? > + if (test_and_clear_bit(INT_ALL_LINK_DOWN, &status)) { > + dev_dbg(dev, "Received Link down event\n"); > + for_each_pci_bridge(port, pp->bridge->bus) { > + if (pci_pcie_type(port) =3D=3D PCI_EXP_TYPE_ROOT_PORT) > + pci_host_handle_link_down(port); > + } > + } Does iterating over the root bus devices using for_each_pci_bridge() in an asynchronous threaded IRQ context require holding the pci_bus_sem read-write semaphore? Without locking, could this iteration race with concurrent PCI topology changes like hotplug events, AER recoveries, or driver unbinds, potentially leading to list corruption or a use-after-free if a device structure is freed while the loop is executing? Would it be safer to use pci_walk_bus() or explicitly acquire down_read(&pci_bus_sem)? > + > + return IRQ_HANDLED; > +} --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260518-pci-port-r= eset-v8-0-eb5a7d331dfc@oss.qualcomm.com?part=3D4