Linux Media Controller development
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Sven Püschel" <s.pueschel@pengutronix.de>
Cc: linux-media@vger.kernel.org
Subject: Re: [PATCH 15/17] media: rockchip: rga: schedule jobs to multiple cores
Date: Fri, 05 Jun 2026 22:25:49 +0000	[thread overview]
Message-ID: <20260605222549.80D521F00898@smtp.kernel.org> (raw)
In-Reply-To: <20260606-spu-rga3multicore-v1-15-3ec2b15675f7@pengutronix.de>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Race condition in device_run() error path and unhandled early return permanently deadlock the mem2mem queue.
- [Medium] Data race on core->curr pointer when clearing job assignments.

Pre-existing issues:
- [High] Out-of-bounds array write on unbind/rebind of components because rga->num_cores is never reset.
- [High] Missing pm_runtime_get_if_active() before hardware access in shared IRQ handler.
- [High] Missing synchronize_irq() in runtime suspend callback for shared IRQ.
--

commit f03021dff03be4350cbe7b0ab4ce4a9a54a7602f
Author: Sven Püschel <s.pueschel@pengutronix.de>

media: rockchip: rga: schedule jobs to multiple cores

This commit modifies the Rockchip RGA driver to schedule jobs across
multiple cores to better utilize hardware. A dedicated spinlock is added
to protect the selection of the next free core from race conditions.

> diff --git a/drivers/media/platform/rockchip/rga/rga.c b/drivers/media/platform/rockchip/rga/rga.c
> index 9cebb461b3fd2..f00b7f99f2521 100644
> --- a/drivers/media/platform/rockchip/rga/rga.c
> +++ b/drivers/media/platform/rockchip/rga/rga.c
> @@ -38,15 +38,31 @@ static void device_run(void *prv)
>  {
>  	struct rga_ctx *ctx = prv;
>  	struct rockchip_rga *rga = ctx->rga;
> -	struct rga_core *core = rga->cores[0];
> +	struct rga_core *core = NULL;
>  	struct vb2_v4l2_buffer *src, *dst;
>  	unsigned long flags;
>  	int ret;
> +	unsigned int i;
> +
> +	spin_lock_irqsave(&rga->cores_lock, flags);
> +	for (i = 0; i < rga->num_cores; i++) {
> +		if (!rga->cores[i]->curr) {
> +			core = rga->cores[i];
> +			core->curr = ctx;
> +			break;
> +		}
> +	}
> +	spin_unlock_irqrestore(&rga->cores_lock, flags);
> +
> +	WARN_ONCE(!core, "No free core although max parallel jobs matches the core count!\n");
> +	if (!core)
> +		return;
>  
>  	ret = pm_runtime_resume_and_get(core->dev);
>  	if (ret < 0) {
>  		v4l2_m2m_buf_done_and_job_finish(rga->m2m_dev, ctx->fh.m2m_ctx,
>  						 VB2_BUF_STATE_ERROR);
> +		core->curr = NULL;

[Severity: High]
Could this error path introduce a race condition that permanently deadlocks the
mem2mem queue?

v4l2_m2m_buf_done_and_job_finish() is called before core->curr is cleared,
which asynchronously schedules the next job via a workqueue.

If the worker executes device_run() on another CPU before the current CPU marks
the core as free, the worker will see the core as busy. If no other core is
free, device_run() will trigger the WARN_ONCE and return early without
notifying the v4l2 mem2mem framework:

    if (!core)
        return;

Because the new job is already marked as TRANS_RUNNING, returning early without
signaling job completion permanently hangs the job queue, and can cause a
D-state deadlock if userspace attempts to close the device file descriptor.

[Severity: Medium]
Additionally, could there be a data race when clearing core->curr here?

While cores_lock is acquired when scanning and assigning a core earlier in
device_run(), it is not held when releasing the core in this error path.
Concurrent unprotected reads and writes to core->curr across CPUs could lead to
undefined behavior.

>  		return;
>  	}

[ ... ]

[Severity: Medium]
This isn't a bug introduced by this patch, but does modifying core->curr in
rga_isr() without holding cores_lock introduce a data race?

Since cores_lock was introduced to protect against concurrent selections,
shouldn't it also be held when releasing the core in the interrupt handler?

drivers/media/platform/rockchip/rga/rga.c:rga_isr() {
    ...
    core->curr = NULL;
    ...
}

[Severity: High]
This isn't a bug introduced by this patch, but does the unbind/rebind path
cause an out-of-bounds array write in rga->cores[]?

During component bind, rga_core_bind() appends to the rga->cores array:

drivers/media/platform/rockchip/rga/rga.c:rga_core_bind() {
    ...
    rga->cores[rga->num_cores++] = core;
    ...
}

However, the master unbind callback and the component system do not reset
rga->num_cores to 0:

drivers/media/platform/rockchip/rga/rga.c:rga_unbind() {
    ...
    component_unbind_all(dev, NULL);
}

The struct rockchip_rga structure is not freed when a component is unbound. If
a component is unbound and subsequently re-bound, rga_core_bind() will continue
incrementing rga->num_cores past the initially allocated length, corrupting
memory.

[Severity: High]
This isn't a bug introduced by this patch, but is the driver missing a
pm_runtime_get_if_active() check before accessing hardware in the shared IRQ
handler?

The driver registers rga_isr() with IRQF_SHARED if there is no internal IOMMU.
When an interrupt is triggered by another device on the shared line, rga_isr()
blindly calls handle_irq(), which reads hardware registers:

drivers/media/platform/rockchip/rga/rga.c:rga_isr() {
    ...
    if (rga->hw->handle_irq(core)) {
    ...
}

If the RGA core is runtime-suspended and its clocks are disabled, accessing its
registers will cause an external abort / SError.

[Severity: High]
This isn't a bug introduced by this patch, but is there a missing
synchronize_irq() in the runtime suspend callback for shared IRQs?

In rga_runtime_suspend(), the driver disables clocks unconditionally:

drivers/media/platform/rockchip/rga/rga.c:rga_runtime_suspend() {
    ...
    clk_bulk_disable_unprepare(core->num_clks, core->clks);
    ...
}

Because the driver can use a shared IRQ, an interrupt handler could be
mid-flight on another CPU when the clocks are disabled, causing an external
abort when the handler accesses the now-unclocked hardware.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260606-spu-rga3multicore-v1-0-3ec2b15675f7@pengutronix.de?part=15

  reply	other threads:[~2026-06-05 22:25 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-05 22:06 [PATCH 00/17] media: rockchip: rga: Add multi-core support Sven Püschel
2026-06-05 22:06 ` [PATCH 01/17] media: rockchip: rga: zero cmdbuf in shared code Sven Püschel
2026-06-05 22:20   ` sashiko-bot
2026-06-05 22:06 ` [PATCH 02/17] media: rockchip: rga: add comment about pixel alignment for YUV formats Sven Püschel
2026-06-05 22:06 ` [PATCH 03/17] media: rockchip: rga: move early return into if condition in vidioc_enum_fmt Sven Püschel
2026-06-05 22:06 ` [PATCH 04/17] media: rockchip: rga: removed unused regmap member Sven Püschel
2026-06-05 22:06 ` [PATCH 05/17] media: v4l2-mem2mem: support running multiple jobs in parallel Sven Püschel
2026-06-05 22:18   ` sashiko-bot
2026-06-05 22:06 ` [PATCH 06/17] media: rockchip: rga: move power handling to device_run Sven Püschel
2026-06-05 22:22   ` sashiko-bot
2026-06-05 22:06 ` [PATCH 07/17] media: rockchip: rga: adjust get_version to return the version Sven Püschel
2026-06-05 22:06 ` [PATCH 08/17] media: rockchip: rga: add rga_core structure Sven Püschel
2026-06-05 22:22   ` sashiko-bot
2026-06-05 22:06 ` [PATCH 09/17] media: rockchip: rga: use components to manage multiple cores Sven Püschel
2026-06-05 22:24   ` sashiko-bot
2026-06-05 22:06 ` [PATCH 10/17] media: rockchip: rga: move rockchip_rga allocation to master probe Sven Püschel
2026-06-05 22:23   ` sashiko-bot
2026-06-05 22:06 ` [PATCH 11/17] media: rockchip: rga: move video device to the master Sven Püschel
2026-06-05 22:21   ` sashiko-bot
2026-06-05 22:06 ` [PATCH 12/17] media: rockchip: rga: move core initialization from bind to probe Sven Püschel
2026-06-05 22:20   ` sashiko-bot
2026-06-05 22:06 ` [PATCH 13/17] media: rockchip: rga: bind all cores to the master Sven Püschel
2026-06-05 22:23   ` sashiko-bot
2026-06-05 22:07 ` [PATCH 14/17] media: rockchip: rga: put all cores into first core iommu domain Sven Püschel
2026-06-05 22:23   ` sashiko-bot
2026-06-05 22:07 ` [PATCH 15/17] media: rockchip: rga: schedule jobs to multiple cores Sven Püschel
2026-06-05 22:25   ` sashiko-bot [this message]
2026-06-05 22:07 ` [PATCH 16/17] arm64: dts: rockchip: add rga3 dt nodes to rk3588 Sven Püschel
2026-06-05 22:07 ` [PATCH 17/17] iommu/rockchip: disable fetch dte time limit Sven Püschel
2026-06-05 22:26   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260605222549.80D521F00898@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=s.pueschel@pengutronix.de \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox