Re: [PATCH v6 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: ZhaoJinming <zhaojinming@uniontech.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v6 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access
Date: Wed, 10 Jun 2026 07:24:54 +0000	[thread overview]
Message-ID: <20260610072454.F2DEB1F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260610071045.3414828-2-zhaojinming@uniontech.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] Mismatched and strict return value checks for `pm_runtime_get_if_active()` break IRQ handling when Runtime PM is disabled, causing job hangs.

Pre-existing issues:
- [High] The runtime suspend callback powers down hardware without calling `synchronize_irq()`, violating shared IRQ rules.
- [High] Unhandled hardware DMA errors result in an unacknowledged interrupt, causing an immediate IRQ storm.
- [High] Calling `iommu_detach_group()` and `iommu_attach_group()` with a NULL group pointer causes a kernel panic if the device is not bound to an IOMMU.
--

commit 1a0ceb9d138ab6f44b71efac6f2c376438bd8d22
Author: ZhaoJinming <zhaojinming@uniontech.com>

accel/rocket: Fix iommu_group leak and unsafe IRQ register access

This commit fixes a reference leak of the iommu_group in the IRQ handler
by reusing the cached core->iommu_group. It also fixes unsafe hardware register
accesses in the shared IRQ handler by adding runtime PM checks to verify the
device is active before reading registers.

> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> index cf9f61682dd8b..81b222198dac9 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
> @@ -349,7 +349,7 @@ static void rocket_job_handle_irq(struct rocket_core *core)
>  				return;
>  			}
>  
> -			iommu_detach_group(NULL, iommu_group_get(core->dev));
> +			iommu_detach_group(NULL, core->iommu_group);

[Severity: High]
This is a pre-existing issue, but calling iommu_detach_group() and
iommu_attach_group() with a NULL group pointer causes a kernel panic if
the device is not bound to an IOMMU.

If the device is not behind an IOMMU, iommu_group_get() returns NULL
during initialization.

If CONFIG_IOMMU_API is enabled, both functions unconditionally
dereference the group pointer to acquire its lock, which triggers a NULL
pointer dereference crash.

Does this path also need to ensure core->iommu_group is not NULL before
detaching, and similarly during job submission?

>  			dma_fence_signal(core->in_flight_job->done_fence);
>  			pm_runtime_put_autosuspend(core->dev);
>  			core->in_flight_job = NULL;

[ ... ]

> @@ -420,7 +420,10 @@ static irqreturn_t rocket_job_irq_handler_thread(int irq, void *data)
>  {
>  	struct rocket_core *core = data;
>  
> -	rocket_job_handle_irq(core);
> +	if (pm_runtime_get_if_active(core->dev) == 1) {
> +		rocket_job_handle_irq(core);
> +		pm_runtime_put(core->dev);
> +	}

[Severity: High]
When CONFIG_PM is disabled or runtime PM is globally disabled for the
device, pm_runtime_get_if_active() returns -EINVAL.

Because -EINVAL != 1, the condition evaluates to false, silently
skipping rocket_job_handle_irq() entirely.

Since the threaded handler skips execution, the completed job is never
processed, the IOMMU group is not detached, the DMA fence is never
signaled, and waiting user processes hang permanently.

Could this strict check cause a regression in IRQ handling when Runtime
PM is disabled?

>  
>  	return IRQ_HANDLED;
>  }
> @@ -428,16 +431,28 @@ static irqreturn_t rocket_job_irq_handler_thread(int irq, void *data)
>  static irqreturn_t rocket_job_irq_handler(int irq, void *data)
>  {
>  	struct rocket_core *core = data;
> -	u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
> +	u32 raw_status;
> +	int ret;
> +
> +	ret = pm_runtime_get_if_active(core->dev);
> +	if (ret == 0)
> +		return IRQ_NONE;
> +
> +	raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
>  
>  	WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
>  	WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_WRITE_ERROR);
>  
>  	if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 ||
> -	      raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1))
> +	      raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1)) {
> +		if (ret > 0)
> +			pm_runtime_put(core->dev);
>  		return IRQ_NONE;
> +	}

[Severity: High]
This is a pre-existing issue, but unhandled hardware DMA errors result
in an unacknowledged interrupt, causing an immediate IRQ storm.

If the hardware triggers an interrupt with
PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR set, but the DPU completion bits
are not set, the condition !(raw_status & DPU) evaluates as true.

It executes the WARN_ON() and immediately returns IRQ_NONE without
writing to INTERRUPT_CLEAR or INTERRUPT_MASK.

Because the shared IRQ line is level-triggered and the hardware is still
asserting the unacknowledged error, the interrupt controller will
immediately re-invoke the handler as soon as it returns.

Will this create an infinite IRQ storm that permanently hogs the CPU
core until the kernel's spurious interrupt detector forcibly disables
the shared IRQ line?

>  
>  	rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
> +	if (ret > 0)
> +		pm_runtime_put(core->dev);
>  
>  	return IRQ_WAKE_THREAD;
>  }

[Severity: High]
This is a pre-existing issue, but the runtime suspend callback powers
down hardware without calling synchronize_irq(), violating shared IRQ
rules.

The global subsystem guide explicitly states: "When devices use shared
IRQs (IRQF_SHARED), the runtime suspend callback must call
synchronize_irq() before powering down hardware to ensure no IRQ handler
is executing mid-flight."

The device requests its interrupt with IRQF_SHARED, but its runtime
suspend callback disables the device's clocks without waiting for
executing IRQ handlers to finish:

drivers/accel/rocket/rocket_drv.c:rocket_device_runtime_suspend() {
    ...
    clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks),
                               rdev->cores[core].clks);
    ...
}

Might this lead to mid-flight handlers reading powered-down hardware
registers, causing bus faults?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260610061915.1CA281F00893@smtp.kernel.org?part=2

next prev parent reply	other threads:[~2026-06-10  7:24 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-08  8:38 [PATCH] accel/rocket: Fix dma_fence reference leak on error paths in rocket_job_run() ZhaoJinming
2026-06-08  8:50 ` sashiko-bot
2026-06-08 10:00   ` [PATCH v2] accel/rocket: Fix fence leak, runtime PM leak and WARN in rocket_job_run() error paths ZhaoJinming
2026-06-08 10:38     ` sashiko-bot
2026-06-09  9:33       ` [PATCH v3 1/2] accel/rocket: Fix error path handling in rocket_job_run() ZhaoJinming
2026-06-09  9:33         ` [PATCH v3 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access ZhaoJinming
2026-06-09  9:47           ` sashiko-bot
2026-06-10  3:03             ` [PATCH v4 1/2] accel/rocket: Fix error path handling in rocket_job_run() ZhaoJinming
2026-06-10  3:03               ` [PATCH v4 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access ZhaoJinming
2026-06-10  3:14                 ` sashiko-bot
2026-06-10  3:25                   ` 赵金明
2026-06-10  6:01                   ` [PATCH v5 1/2] accel/rocket: Fix error path handling in rocket_job_run() ZhaoJinming
2026-06-10  6:01                     ` [PATCH v5 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access ZhaoJinming
2026-06-10  6:19                       ` sashiko-bot
2026-06-10  7:08                         ` 赵金明
2026-06-10  7:10                         ` [PATCH v6 1/2] accel/rocket: Fix error path handling in rocket_job_run() ZhaoJinming
2026-06-10  7:10                           ` [PATCH v6 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access ZhaoJinming
2026-06-10  7:24                             ` sashiko-bot [this message]
2026-06-10  7:24                           ` [PATCH v6 1/2] accel/rocket: Fix error path handling in rocket_job_run() sashiko-bot
2026-06-10  3:19               ` [PATCH v4 " sashiko-bot
2026-06-09  9:50         ` [PATCH v3 " sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260610072454.F2DEB1F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=zhaojinming@uniontech.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.