* [PATCH] accel/rocket: Fix dma_fence reference leak on error paths in rocket_job_run()
@ 2026-06-08 8:38 ZhaoJinming
2026-06-08 8:50 ` sashiko-bot
0 siblings, 1 reply; 8+ messages in thread
From: ZhaoJinming @ 2026-06-08 8:38 UTC (permalink / raw)
To: Tomeu Vizoso, Oded Gabbay
Cc: Sumit Semwal, Christian König, Jeff Hugo, dri-devel,
linux-kernel, linux-media, linaro-mm-sig, ZhaoJinming, stable
In rocket_job_run(), after creating a fence at line 305 and taking an
extra reference for job->done_fence at line 311 via dma_fence_get(),
two error paths fail to release that extra reference before returning:
- Line 314: When pm_runtime_get_sync() fails, returns fence without
putting job->done_fence.
- Line 318: When iommu_attach_group() fails, returns fence without
putting job->done_fence, and also omits pm_runtime_put() to balance
the successful pm_runtime_get_sync() at line 313.
The leaked fence reference prevents the fence and its underlying rocket
device from being freed while job->done_fence still points to it.
Repeated failures will accumulate leaked fences, consuming kernel memory
and holding device resources indefinitely.
Fix both error paths by calling dma_fence_put(job->done_fence) and
setting job->done_fence to NULL before returning. Also add the missing
pm_runtime_put() on the iommu_attach_group error path.
Cc: stable@vger.kernel.org
Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
---
drivers/accel/rocket/rocket_job.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index ac51bff39833..66e4a1d57842 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -311,12 +311,19 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
job->done_fence = dma_fence_get(fence);
ret = pm_runtime_get_sync(core->dev);
- if (ret < 0)
+ if (ret < 0) {
+ dma_fence_put(job->done_fence);
+ job->done_fence = NULL;
return fence;
+ }
ret = iommu_attach_group(job->domain->domain, core->iommu_group);
- if (ret < 0)
+ if (ret < 0) {
+ pm_runtime_put(core->dev);
+ dma_fence_put(job->done_fence);
+ job->done_fence = NULL;
return fence;
+ }
scoped_guard(mutex, &core->job_lock) {
core->in_flight_job = job;
--
2.20.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] accel/rocket: Fix dma_fence reference leak on error paths in rocket_job_run()
2026-06-08 8:38 [PATCH] accel/rocket: Fix dma_fence reference leak on error paths in rocket_job_run() ZhaoJinming
@ 2026-06-08 8:50 ` sashiko-bot
2026-06-08 10:00 ` [PATCH v2] accel/rocket: Fix fence leak, runtime PM leak and WARN in rocket_job_run() error paths ZhaoJinming
0 siblings, 1 reply; 8+ messages in thread
From: sashiko-bot @ 2026-06-08 8:50 UTC (permalink / raw)
To: ZhaoJinming; +Cc: dri-devel, linux-media
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] The `pm_runtime_get_sync()` error path fails to decrement the runtime PM usage counter, permanently leaking the reference.
- [High] Returning an unsignaled dma_fence instead of an error pointer on job submission failure triggers a kernel WARN.
--
commit f0df2d483139ad216e9d43923cbf9543a8f4e81c
Author: ZhaoJinming <zhaojinming@uniontech.com>
accel/rocket: Fix dma_fence reference leak on error paths in rocket_job_run()
In rocket_job_run(), after creating a fence at line 305 and taking an
extra reference for job->done_fence at line 311 via dma_fence_get(),
two error paths fail to release that extra reference before returning.
> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> index 2f1861f960ccb..fc10703534e37 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
> @@ -311,12 +311,19 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
> job->done_fence = dma_fence_get(fence);
>
> ret = pm_runtime_get_sync(core->dev);
> - if (ret < 0)
> + if (ret < 0) {
> + dma_fence_put(job->done_fence);
> + job->done_fence = NULL;
[Severity: High]
This is a pre-existing issue, but does this error path permanently leak the
runtime PM usage counter?
pm_runtime_get_sync() increments the device's usage counter before attempting
to resume the device. If the resume fails, it returns a negative error code
but the usage counter remains incremented.
Without explicitly dropping this usage counter (e.g., using
pm_runtime_put_noidle()), could repeated failures prevent the NPU from ever
suspending again?
> return fence;
[Severity: High]
This is also a pre-existing issue, but will returning the valid, unsignaled
fence instead of ERR_PTR(ret) trigger a kernel WARN?
Since the hardware wasn't submitted, the fence refcount immediately hits 0
after the DRM scheduler drops its initial reference. dma_fence_release()
detects pending callbacks on an unsignaled fence and triggers a loud kernel
WARN(1, "Fence ... released with pending signals!\n"), forcibly signaling it
with -EDEADLK.
Should this path destroy the fence and return ERR_PTR(ret) so the scheduler
cleanly aborts the job?
> + }
>
> ret = iommu_attach_group(job->domain->domain, core->iommu_group);
> - if (ret < 0)
> + if (ret < 0) {
> + pm_runtime_put(core->dev);
> + dma_fence_put(job->done_fence);
> + job->done_fence = NULL;
> return fence;
[Severity: High]
This is a pre-existing issue, but similar to the above, does returning the
unsignaled fence here instead of ERR_PTR(ret) cause the same DRM scheduler
WARN when the fence is released?
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260608083824.775261-1-zhaojinming@uniontech.com?part=1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2] accel/rocket: Fix fence leak, runtime PM leak and WARN in rocket_job_run() error paths
2026-06-08 8:50 ` sashiko-bot
@ 2026-06-08 10:00 ` ZhaoJinming
2026-06-08 10:38 ` sashiko-bot
0 siblings, 1 reply; 8+ messages in thread
From: ZhaoJinming @ 2026-06-08 10:00 UTC (permalink / raw)
To: Tomeu Vizoso, Oded Gabbay
Cc: Sumit Semwal, Christian König, Jeff Hugo, dri-devel,
linux-kernel, linux-media, linaro-mm-sig, ZhaoJinming, stable
Three issues exist in the error paths of rocket_job_run():
1) dma_fence reference leak: After creating a fence and taking an extra
reference for job->done_fence via dma_fence_get(), the error paths
return without releasing the extra reference held by job->done_fence.
The leaked reference prevents the fence from being freed, causing
resource accumulation on repeated failures.
2) pm_runtime_get_sync() usage counter leak: pm_runtime_get_sync()
increments the runtime PM usage counter before attempting to resume
the device. If the resume fails and returns an error, the usage
counter remains incremented. The original error path does not call
pm_runtime_put_noidle() to balance it. Repeated failures will
permanently prevent the NPU from entering suspend.
3) Unsignaled fence returned on failure: The error paths return a valid
but unsignaled dma_fence to the DRM scheduler. Since the hardware
was never submitted, the fence is never signaled. When the scheduler
eventually drops its reference, dma_fence_release() detects the
unsignaled fence and triggers:
WARN(1, "Fence ... released with pending signals!")
and forcibly signals it with -EDEADLK.
Fix all three issues by:
- Replace pm_runtime_get_sync() with pm_runtime_resume_and_get(), which
automatically decrements the usage counter on failure, eliminating
the need for a manual pm_runtime_put_noidle() call and avoiding the
usage counter leak. The pm_runtime_get_sync() documentation itself
recommends pm_runtime_resume_and_get() as the preferred alternative
when the return value is checked by the caller.
- Release both fence references (job->done_fence and the local fence)
before returning ERR_PTR(ret) so the DRM scheduler cleanly aborts
the job without triggering the unsignaled fence WARN.
- Add pm_runtime_put() on the iommu_attach_group error path to release
the runtime PM reference that was successfully acquired.
Cc: stable@vger.kernel.org
Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
---
drivers/accel/rocket/rocket_job.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index ac51bff39833..e8a073e22ac2 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -310,13 +310,22 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
dma_fence_put(job->done_fence);
job->done_fence = dma_fence_get(fence);
- ret = pm_runtime_get_sync(core->dev);
- if (ret < 0)
- return fence;
+ ret = pm_runtime_resume_and_get(core->dev);
+ if (ret < 0) {
+ dma_fence_put(job->done_fence);
+ job->done_fence = NULL;
+ dma_fence_put(fence);
+ return ERR_PTR(ret);
+ }
ret = iommu_attach_group(job->domain->domain, core->iommu_group);
- if (ret < 0)
- return fence;
+ if (ret < 0) {
+ pm_runtime_put(core->dev);
+ dma_fence_put(job->done_fence);
+ job->done_fence = NULL;
+ dma_fence_put(fence);
+ return ERR_PTR(ret);
+ }
scoped_guard(mutex, &core->job_lock) {
core->in_flight_job = job;
--
2.20.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2] accel/rocket: Fix fence leak, runtime PM leak and WARN in rocket_job_run() error paths
2026-06-08 10:00 ` [PATCH v2] accel/rocket: Fix fence leak, runtime PM leak and WARN in rocket_job_run() error paths ZhaoJinming
@ 2026-06-08 10:38 ` sashiko-bot
2026-06-09 9:33 ` [PATCH v3 1/2] accel/rocket: Fix error path handling in rocket_job_run() ZhaoJinming
0 siblings, 1 reply; 8+ messages in thread
From: sashiko-bot @ 2026-06-08 10:38 UTC (permalink / raw)
To: ZhaoJinming; +Cc: linux-media, dri-devel
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] IOMMU group reference leak in rocket_job_handle_irq()
- [High] Unsafe hardware register access in shared IRQ handler while runtime suspended
--
commit fca53bbe333680ce9d4e0368b39e684506994081
Author: ZhaoJinming <zhaojinming@uniontech.com>
Subject: accel/rocket: Fix fence leak, runtime PM leak and WARN in rocket_job_run() error paths
This commit fixes multiple resource leaks in the error paths of rocket_job_run().
It replaces pm_runtime_get_sync() with pm_runtime_resume_and_get() to handle
the usage counter, correctly releases dma_fence references on failure, and
adds pm_runtime_put() during IOMMU attachment failures.
> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> index 2f1861f960ccb..cf9f61682dd8b 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
[ ... ]
> @@ -310,13 +310,22 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
> dma_fence_put(job->done_fence);
> job->done_fence = dma_fence_get(fence);
>
> - ret = pm_runtime_get_sync(core->dev);
> - if (ret < 0)
> - return fence;
> + ret = pm_runtime_resume_and_get(core->dev);
> + if (ret < 0) {
> + dma_fence_put(job->done_fence);
> + job->done_fence = NULL;
> + dma_fence_put(fence);
> + return ERR_PTR(ret);
> + }
>
> ret = iommu_attach_group(job->domain->domain, core->iommu_group);
> - if (ret < 0)
> - return fence;
> + if (ret < 0) {
> + pm_runtime_put(core->dev);
> + dma_fence_put(job->done_fence);
> + job->done_fence = NULL;
> + dma_fence_put(fence);
> + return ERR_PTR(ret);
> + }
>
> scoped_guard(mutex, &core->job_lock) {
> core->in_flight_job = job;
[ ... ]
> static void rocket_job_handle_irq(struct rocket_core *core)
> {
> pm_runtime_mark_last_busy(core->dev);
>
> rocket_pc_writel(core, OPERATION_ENABLE, 0x0);
> rocket_pc_writel(core, INTERRUPT_CLEAR, 0x1ffff);
>
> scoped_guard(mutex, &core->job_lock)
> if (core->in_flight_job) {
> if (core->in_flight_job->next_task_idx < core->in_flight_job->task_count) {
> rocket_job_hw_submit(core, core->in_flight_job);
> return;
> }
>
> iommu_detach_group(NULL, iommu_group_get(core->dev));
[Severity: High]
This isn't a bug introduced by this patch, but does this leak an IOMMU
group reference?
It looks like iommu_group_get() increments the kobject reference count for the
group. However, iommu_detach_group() doesn't consume this reference, and there
is no subsequent iommu_group_put() call.
Since this runs on every completed job, could this eventually overflow the
reference counter and prevent the group from being freed?
[ ... ]
> static irqreturn_t rocket_job_irq_handler(int irq, void *data)
> {
> struct rocket_core *core = data;
> u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
[Severity: High]
This is a pre-existing issue, but is it safe to access hardware registers
here without first checking if the device is active?
Since the driver registers this as a shared interrupt handler, it could be
invoked if another device on the same IRQ line triggers an interrupt while the
NPU device is runtime suspended.
Reading from a powered-down device typically returns 0xffffffff, which would
spuriously trigger the WARN_ON macros below and falsely return
IRQ_WAKE_THREAD.
Should this use pm_runtime_get_if_active() to verify the device is powered
before reading INTERRUPT_RAW_STATUS?
>
> WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
> WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_WRITE_ERROR);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260608100040.985526-1-zhaojinming@uniontech.com?part=1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v3 1/2] accel/rocket: Fix error path handling in rocket_job_run()
2026-06-08 10:38 ` sashiko-bot
@ 2026-06-09 9:33 ` ZhaoJinming
2026-06-09 9:33 ` [PATCH v3 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access ZhaoJinming
2026-06-09 9:50 ` [PATCH v3 1/2] accel/rocket: Fix error path handling in rocket_job_run() sashiko-bot
0 siblings, 2 replies; 8+ messages in thread
From: ZhaoJinming @ 2026-06-09 9:33 UTC (permalink / raw)
To: Tomeu Vizoso, Oded Gabbay
Cc: Sumit Semwal, Christian König, Jeff Hugo, dri-devel,
linux-kernel, linux-media, linaro-mm-sig, ZhaoJinming, stable
In rocket_job_run(), after taking an extra fence reference for
job->done_fence via dma_fence_get(), the error paths have three bugs:
- The dma_fence reference held by job->done_fence is never released,
causing a reference leak.
- pm_runtime_get_sync() increments the usage counter even on failure,
but the error path does not decrement it, leaking the runtime PM
reference and preventing the NPU from suspending.
- A valid but unsignaled fence is returned to the DRM scheduler,
which triggers WARN("Fence ... released with pending signals!")
when the scheduler drops its reference.
Fix by replacing pm_runtime_get_sync() with pm_runtime_resume_and_get()
which auto-balances the usage counter on failure, releasing both fence
references on error, and returning ERR_PTR(ret) instead of the
unsignaled fence.
Cc: stable@vger.kernel.org
Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
---
drivers/accel/rocket/rocket_job.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index ac51bff39833..e8a073e22ac2 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -310,13 +310,22 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
dma_fence_put(job->done_fence);
job->done_fence = dma_fence_get(fence);
- ret = pm_runtime_get_sync(core->dev);
- if (ret < 0)
- return fence;
+ ret = pm_runtime_resume_and_get(core->dev);
+ if (ret < 0) {
+ dma_fence_put(job->done_fence);
+ job->done_fence = NULL;
+ dma_fence_put(fence);
+ return ERR_PTR(ret);
+ }
ret = iommu_attach_group(job->domain->domain, core->iommu_group);
- if (ret < 0)
- return fence;
+ if (ret < 0) {
+ pm_runtime_put(core->dev);
+ dma_fence_put(job->done_fence);
+ job->done_fence = NULL;
+ dma_fence_put(fence);
+ return ERR_PTR(ret);
+ }
scoped_guard(mutex, &core->job_lock) {
core->in_flight_job = job;
--
2.20.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access
2026-06-09 9:33 ` [PATCH v3 1/2] accel/rocket: Fix error path handling in rocket_job_run() ZhaoJinming
@ 2026-06-09 9:33 ` ZhaoJinming
2026-06-09 9:47 ` sashiko-bot
2026-06-09 9:50 ` [PATCH v3 1/2] accel/rocket: Fix error path handling in rocket_job_run() sashiko-bot
1 sibling, 1 reply; 8+ messages in thread
From: ZhaoJinming @ 2026-06-09 9:33 UTC (permalink / raw)
To: Tomeu Vizoso, Oded Gabbay
Cc: Sumit Semwal, Christian König, Jeff Hugo, dri-devel,
linux-kernel, linux-media, linaro-mm-sig, ZhaoJinming, stable
Two bugs in the IRQ handling path:
1) iommu_group reference leak in rocket_job_handle_irq():
iommu_group_get() increments the reference count but the returned
pointer is passed directly to iommu_detach_group() which does not
consume it. Since this runs on every completed job, the reference
count accumulates and prevents the group from being freed. Use
core->iommu_group instead, consistent with rocket_reset().
2) Unsafe hardware register access in shared IRQ handler:
rocket_job_irq_handler() is registered with IRQF_SHARED but accesses
hardware registers without checking runtime PM status. If another
device on the same IRQ line triggers an interrupt while the NPU is
suspended, register reads return 0xffffffff, spuriously triggering
WARN_ON macros and falsely returning IRQ_WAKE_THREAD.
Replace with pm_runtime_get_if_active(), which atomically checks the
device status under spinlock and increments the usage count to keep
the device active during register access. Balance the extra reference:
- If the interrupt is not for this device (raw_status does not match
DPU_0/DPU_1), put in the hardirq handler before returning IRQ_NONE.
- If the interrupt is for this device, the thread handler puts after
completing register writes.
Cc: stable@vger.kernel.org
Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
---
drivers/accel/rocket/rocket_job.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index e8a073e22ac2..30c677952b64 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -349,7 +349,7 @@ static void rocket_job_handle_irq(struct rocket_core *core)
return;
}
- iommu_detach_group(NULL, iommu_group_get(core->dev));
+ iommu_detach_group(NULL, core->iommu_group);
dma_fence_signal(core->in_flight_job->done_fence);
pm_runtime_put_autosuspend(core->dev);
core->in_flight_job = NULL;
@@ -421,6 +421,7 @@ static irqreturn_t rocket_job_irq_handler_thread(int irq, void *data)
struct rocket_core *core = data;
rocket_job_handle_irq(core);
+ pm_runtime_put(core->dev);
return IRQ_HANDLED;
}
@@ -428,14 +429,20 @@ static irqreturn_t rocket_job_irq_handler_thread(int irq, void *data)
static irqreturn_t rocket_job_irq_handler(int irq, void *data)
{
struct rocket_core *core = data;
+
+ if (!pm_runtime_get_if_active(core->dev))
+ return IRQ_NONE;
+
u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_WRITE_ERROR);
if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 ||
- raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1))
+ raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1)) {
+ pm_runtime_put(core->dev);
return IRQ_NONE;
+ }
rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
--
2.20.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v3 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access
2026-06-09 9:33 ` [PATCH v3 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access ZhaoJinming
@ 2026-06-09 9:47 ` sashiko-bot
0 siblings, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-09 9:47 UTC (permalink / raw)
To: ZhaoJinming; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
New issues:
- [High] Incorrect error handling of pm_runtime_get_if_active() return value causes unbalanced pm_runtime_put() and unsafe hardware access.
- [High] PM usage count leak due to threaded IRQ coalescing on shared interrupts.
Pre-existing issues:
- [High] Missing synchronize_irq() in the runtime suspend callback for a device with a shared IRQ handler.
- [High] Attacker-controlled DMA faults leave the device interrupt asserted without clearing it, causing an interrupt storm and kernel log spam.
--
commit 57f30fe728d7f765520f0d3ccb419368e6f945d0
Author: ZhaoJinming <zhaojinming@uniontech.com>
accel/rocket: Fix iommu_group leak and unsafe IRQ register access
This commit fixes an iommu_group reference leak in rocket_job_handle_irq()
by using the stored iommu_group instead of getting a new reference. It also
attempts to fix unsafe hardware register accesses in the shared IRQ handler
by adding pm_runtime_get_if_active() checks to ensure the device is awake.
> 2) Unsafe hardware register access in shared IRQ handler:
> rocket_job_irq_handler() is registered with IRQF_SHARED but accesses
> hardware registers without checking runtime PM status.
[Severity: High]
This is a pre-existing issue, but since the device uses a shared IRQ handler,
does the runtime suspend callback (rocket_device_runtime_suspend() in
rocket_drv.c) need to call synchronize_irq() before disabling clocks?
Without it, the IRQ handler could execute mid-flight while the hardware is
turned off, leading to invalid register accesses and spurious interrupts.
> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> index cf9f61682dd8b..8eaa3d427a51d 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
[ ... ]
> @@ -428,14 +429,20 @@ static irqreturn_t rocket_job_irq_handler_thread(int irq, void *data)
> static irqreturn_t rocket_job_irq_handler(int irq, void *data)
> {
> struct rocket_core *core = data;
> +
> + if (!pm_runtime_get_if_active(core->dev))
> + return IRQ_NONE;
> +
[Severity: High]
Does this correctly handle negative error codes?
pm_runtime_get_if_active() returns negative error codes (e.g., -EINVAL) when
runtime PM is disabled. The condition here only traps a return value of 0.
When it returns a negative error code, the condition evaluates to false,
allowing execution to proceed to read hardware registers on a potentially
suspended device.
Because the PM usage counter is not incremented on error, wouldn't the
subsequent unconditional pm_runtime_put() calls cause a PM usage count
underflow?
> u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
>
> WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
> WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_WRITE_ERROR);
>
[Severity: High]
This is a pre-existing issue, but if a DMA error occurs and the DPU_0 and
DPU_1 bits are not set, the handler returns IRQ_NONE without clearing the
interrupt.
Since this is a shared level-triggered IRQ line, won't this leave the
interrupt asserted and cause an interrupt storm, permanently disabling the
shared IRQ line?
> if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 ||
> - raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1))
> + raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1)) {
> + pm_runtime_put(core->dev);
> return IRQ_NONE;
> + }
>
> rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
>
[Severity: High]
Does this leak the PM usage count if the IRQ subsystem coalesces the threaded
interrupt?
If another device triggers an interrupt on this shared line while
rocket_job_irq_handler_thread() is already pending, this hardirq handler
will run again. It will increment the PM usage count via
pm_runtime_get_if_active() and eventually return IRQ_WAKE_THREAD.
Because the thread is already pending, the IRQ core coalesces the request
and the thread only runs once. Doesn't this mean pm_runtime_put() is only
called once, permanently leaking the extra PM reference acquired here?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260609093346.380396-1-zhaojinming@uniontech.com?part=2
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 1/2] accel/rocket: Fix error path handling in rocket_job_run()
2026-06-09 9:33 ` [PATCH v3 1/2] accel/rocket: Fix error path handling in rocket_job_run() ZhaoJinming
2026-06-09 9:33 ` [PATCH v3 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access ZhaoJinming
@ 2026-06-09 9:50 ` sashiko-bot
1 sibling, 0 replies; 8+ messages in thread
From: sashiko-bot @ 2026-06-09 9:50 UTC (permalink / raw)
To: ZhaoJinming; +Cc: dri-devel
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
Pre-existing issues:
- [High] Calling `drm_sched_job_cleanup()` after `drm_sched_job_arm()` violates the DRM scheduler API contract.
- [High] The runtime suspend callback fails to synchronize the shared IRQ before powering down.
- [High] Potential NULL pointer dereference in `iommu_attach_group()` if the device lacks an IOMMU group.
--
commit a91812825621124e6ae3a091adc6205cbeefed80
Author: ZhaoJinming <zhaojinming@uniontech.com>
accel/rocket: Fix error path handling in rocket_job_run()
This commit fixes multiple issues in the error paths of rocket_job_run(). It
resolves a dma_fence reference leak, correctly balances the runtime PM
usage counter on failure by switching to pm_runtime_resume_and_get(), and
returns ERR_PTR(ret) instead of an unsignaled fence to avoid DRM scheduler
warnings.
> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> index 2f1861f960ccb..cf9f61682dd8b 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
> @@ -310,13 +310,22 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
[ ... ]
> ret = iommu_attach_group(job->domain->domain, core->iommu_group);
> - if (ret < 0)
> - return fence;
> + if (ret < 0) {
> + pm_runtime_put(core->dev);
> + dma_fence_put(job->done_fence);
> + job->done_fence = NULL;
> + dma_fence_put(fence);
> + return ERR_PTR(ret);
> + }
[Severity: High]
This is a pre-existing issue, but can core->iommu_group be NULL here?
If the device does not have an IOMMU group (e.g., IOMMU is disabled or not
present in the device tree), iommu_group_get() returns NULL. Calling
iommu_attach_group() unconditionally with a NULL group parameter will cause a
kernel crash when it attempts to dereference the pointer.
Should there be a check for a valid iommu_group before attaching?
[ ... ]
I also noticed a couple of pre-existing issues in this driver:
[Severity: High]
This isn't a bug introduced by this patch, but is it safe to call
drm_sched_job_cleanup() after drm_sched_job_arm() has already been called?
In drivers/accel/rocket/rocket_job.c:rocket_job_push(), if
rocket_acquire_object_fences() fails:
drm_sched_job_arm(&job->base);
...
ret = rocket_acquire_object_fences(job->in_bos, job->in_bo_count,
&job->base, false);
if (ret)
goto err_unlock;
The error path returns to rocket_ioctl_submit_job(), which then calls:
out_cleanup_job:
if (ret)
drm_sched_job_cleanup(&rjob->base);
The DRM scheduler API contract states that drm_sched_job_arm() is a point of
no return that initializes fences and sequence numbers. Calling
drm_sched_job_cleanup() after this breaks the dma_fence sequence numbers,
potentially causing any context waiting on the aborted sequence number to hang
indefinitely.
[Severity: High]
This is a pre-existing issue, but does the runtime suspend callback need to
synchronize the shared IRQ before powering down?
In drivers/accel/rocket/rocket_drv.c:rocket_device_runtime_suspend():
clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks),
rdev->cores[core].clks);
The Rocket driver registers its IRQ handler with IRQF_SHARED. According to
the power management subsystem rules, drivers using shared IRQs must call
synchronize_irq() in their runtime suspend callback before disabling clocks.
Without this, could the hardware be powered off while the IRQ handler is
mid-flight and reading registers, potentially leading to system hangs or
spurious wakeups?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260609093346.380396-1-zhaojinming@uniontech.com?part=1
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-06-09 9:51 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-08 8:38 [PATCH] accel/rocket: Fix dma_fence reference leak on error paths in rocket_job_run() ZhaoJinming
2026-06-08 8:50 ` sashiko-bot
2026-06-08 10:00 ` [PATCH v2] accel/rocket: Fix fence leak, runtime PM leak and WARN in rocket_job_run() error paths ZhaoJinming
2026-06-08 10:38 ` sashiko-bot
2026-06-09 9:33 ` [PATCH v3 1/2] accel/rocket: Fix error path handling in rocket_job_run() ZhaoJinming
2026-06-09 9:33 ` [PATCH v3 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access ZhaoJinming
2026-06-09 9:47 ` sashiko-bot
2026-06-09 9:50 ` [PATCH v3 1/2] accel/rocket: Fix error path handling in rocket_job_run() sashiko-bot
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.