From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 4870ACD8CB2 for ; Wed, 10 Jun 2026 07:24:58 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9525710E787; Wed, 10 Jun 2026 07:24:57 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=kernel.org header.i=@kernel.org header.b="aEbg9u4s"; dkim-atps=neutral Received: from tor.source.kernel.org (tor.source.kernel.org [172.105.4.254]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3C2F210E786 for ; Wed, 10 Jun 2026 07:24:56 +0000 (UTC) Received: from smtp.kernel.org (quasi.space.kernel.org [100.103.45.18]) by tor.source.kernel.org (Postfix) with ESMTP id 6ED0D602CC; Wed, 10 Jun 2026 07:24:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id F2DEB1F00893; Wed, 10 Jun 2026 07:24:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1781076295; bh=OR9YiHFTMqfz/UlYE227FUGVYjeC94KEIOJ0d+Lo5LI=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=aEbg9u4srZKl3zNrwGPv0WQrP3XA4+mNMLZDPlkEOaRyWI8oxkASUc9pJc2x3/Vl/ LTyQDPQGJydL1LNGFb8pYAPYr8faBwOFXngkK7mN5ZSCWI6tAwP3+F0i2bBXKCWGj+ kOpQ+KNeAohARYA/Dm0Zsalmrpk2f95iwcK2ixqW9PoF4sMI0pjQdnx60VT1EBx393 NKCO11r4rYTBk0RNwdNKGvWPUBBpwvCi/ZUNoc3knKVDbZW/qJ0OpnKAQZ0gShV+G1 PFpshYQa04kYfpAYHaqI2Og137JWLN04d4bXzT+/M0KldlP5W+zhkfjav8ScZTq2sC dHbjAN73nb1aQ== From: sashiko-bot@kernel.org Subject: Re: [PATCH v6 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access To: ZhaoJinming Cc: dri-devel@lists.freedesktop.org In-Reply-To: <20260610071045.3414828-2-zhaojinming@uniontech.com> References: <20260610071045.3414828-2-zhaojinming@uniontech.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Wed, 10 Jun 2026 07:24:54 +0000 Message-Id: <20260610072454.F2DEB1F00893@smtp.kernel.org> X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: sashiko-reviews@lists.linux.dev Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Thank you for your contribution! Sashiko AI review found 4 potential issue(= s) to consider: New issues: - [High] Mismatched and strict return value checks for `pm_runtime_get_if_a= ctive()` break IRQ handling when Runtime PM is disabled, causing job hangs. Pre-existing issues: - [High] The runtime suspend callback powers down hardware without calling = `synchronize_irq()`, violating shared IRQ rules. - [High] Unhandled hardware DMA errors result in an unacknowledged interrup= t, causing an immediate IRQ storm. - [High] Calling `iommu_detach_group()` and `iommu_attach_group()` with a N= ULL group pointer causes a kernel panic if the device is not bound to an IO= MMU. -- commit 1a0ceb9d138ab6f44b71efac6f2c376438bd8d22 Author: ZhaoJinming accel/rocket: Fix iommu_group leak and unsafe IRQ register access This commit fixes a reference leak of the iommu_group in the IRQ handler by reusing the cached core->iommu_group. It also fixes unsafe hardware regi= ster accesses in the shared IRQ handler by adding runtime PM checks to verify the device is active before reading registers. > diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/roc= ket_job.c > index cf9f61682dd8b..81b222198dac9 100644 > --- a/drivers/accel/rocket/rocket_job.c > +++ b/drivers/accel/rocket/rocket_job.c > @@ -349,7 +349,7 @@ static void rocket_job_handle_irq(struct rocket_core = *core) > return; > } > =20 > - iommu_detach_group(NULL, iommu_group_get(core->dev)); > + iommu_detach_group(NULL, core->iommu_group); [Severity: High] This is a pre-existing issue, but calling iommu_detach_group() and iommu_attach_group() with a NULL group pointer causes a kernel panic if the device is not bound to an IOMMU. If the device is not behind an IOMMU, iommu_group_get() returns NULL during initialization. If CONFIG_IOMMU_API is enabled, both functions unconditionally dereference the group pointer to acquire its lock, which triggers a NULL pointer dereference crash. Does this path also need to ensure core->iommu_group is not NULL before detaching, and similarly during job submission? > dma_fence_signal(core->in_flight_job->done_fence); > pm_runtime_put_autosuspend(core->dev); > core->in_flight_job =3D NULL; [ ... ] > @@ -420,7 +420,10 @@ static irqreturn_t rocket_job_irq_handler_thread(int= irq, void *data) > { > struct rocket_core *core =3D data; > =20 > - rocket_job_handle_irq(core); > + if (pm_runtime_get_if_active(core->dev) =3D=3D 1) { > + rocket_job_handle_irq(core); > + pm_runtime_put(core->dev); > + } [Severity: High] When CONFIG_PM is disabled or runtime PM is globally disabled for the device, pm_runtime_get_if_active() returns -EINVAL. Because -EINVAL !=3D 1, the condition evaluates to false, silently skipping rocket_job_handle_irq() entirely. Since the threaded handler skips execution, the completed job is never processed, the IOMMU group is not detached, the DMA fence is never signaled, and waiting user processes hang permanently. Could this strict check cause a regression in IRQ handling when Runtime PM is disabled? > =20 > return IRQ_HANDLED; > } > @@ -428,16 +431,28 @@ static irqreturn_t rocket_job_irq_handler_thread(in= t irq, void *data) > static irqreturn_t rocket_job_irq_handler(int irq, void *data) > { > struct rocket_core *core =3D data; > - u32 raw_status =3D rocket_pc_readl(core, INTERRUPT_RAW_STATUS); > + u32 raw_status; > + int ret; > + > + ret =3D pm_runtime_get_if_active(core->dev); > + if (ret =3D=3D 0) > + return IRQ_NONE; > + > + raw_status =3D rocket_pc_readl(core, INTERRUPT_RAW_STATUS); > =20 > WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR); > WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_WRITE_ERROR); > =20 > if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 || > - raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1)) > + raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1)) { > + if (ret > 0) > + pm_runtime_put(core->dev); > return IRQ_NONE; > + } [Severity: High] This is a pre-existing issue, but unhandled hardware DMA errors result in an unacknowledged interrupt, causing an immediate IRQ storm. If the hardware triggers an interrupt with PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR set, but the DPU completion bits are not set, the condition !(raw_status & DPU) evaluates as true. It executes the WARN_ON() and immediately returns IRQ_NONE without writing to INTERRUPT_CLEAR or INTERRUPT_MASK. Because the shared IRQ line is level-triggered and the hardware is still asserting the unacknowledged error, the interrupt controller will immediately re-invoke the handler as soon as it returns. Will this create an infinite IRQ storm that permanently hogs the CPU core until the kernel's spurious interrupt detector forcibly disables the shared IRQ line? > =20 > rocket_pc_writel(core, INTERRUPT_MASK, 0x0); > + if (ret > 0) > + pm_runtime_put(core->dev); > =20 > return IRQ_WAKE_THREAD; > } [Severity: High] This is a pre-existing issue, but the runtime suspend callback powers down hardware without calling synchronize_irq(), violating shared IRQ rules. The global subsystem guide explicitly states: "When devices use shared IRQs (IRQF_SHARED), the runtime suspend callback must call synchronize_irq() before powering down hardware to ensure no IRQ handler is executing mid-flight." The device requests its interrupt with IRQF_SHARED, but its runtime suspend callback disables the device's clocks without waiting for executing IRQ handlers to finish: drivers/accel/rocket/rocket_drv.c:rocket_device_runtime_suspend() { ... clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks); ... } Might this lead to mid-flight handlers reading powered-down hardware registers, causing bus faults? --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260610061915.1CA2= 81F00893@smtp.kernel.org?part=3D2