From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EB3543B71BA for ; Fri, 5 Jun 2026 22:25:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780698351; cv=none; b=H8SvRAPNBxtaheFa2DnMJVWkHVtfjamo4paLgrR60dk184/gkbLAYFo09kPthFvajMQXLWruy569yniL/AFbowcP0BZ3wdRLTXquKZUDetjjv80wCnSflQeYOYY7Wmxi9OkesslBKt7WF3Mmsp0VnpqcSEGE6WdQELyVMRbn8IM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780698351; c=relaxed/simple; bh=0qshCgHiS1JLJOEweaJkpW57BZ02Bh8K16IQuMDiXb0=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=iq1ItK0dOB/Ad4jCLK3wJzkXtyQQV9AzDXDpcsq8yyrXjvBOrMCSRXt/qn81ZePELA70ommQVDZzsiOHLcSaG7OLU8Pn8NqVwjbi5Jlbt29gbxcwAnTGxh6uORd+LBtrQoVE+cRvGAY3yH/MYF9Sj55KhobbArkFf6uBjDtk9mg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Ab+iLkrM; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Ab+iLkrM" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 80D521F00898; Fri, 5 Jun 2026 22:25:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780698349; bh=a5hpfllqBjrTsTtCL3N6Wr1RzM9H3bZZXWc+LYBmDx0=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=Ab+iLkrMZkdN39CnlwCEJPnycnfwdB/aN87YyEEyL35yaeRuuqRu3vRQ1gMumUZUo 7HfW0vQG4OMk1woM5Hx/tMr6WnnPdRGpuRDhmRgsrAwEseOQVB06Ce5Oxr23zdiN0v rZhWnSHujJGmBzChvwi1g2KC1OcsU2jJIU2iuEgykSN4fc68j3vJMw3N7aAfLNZMCZ DKxBXgWQ3zI8w++ifHBEojFyBkHt4vbR0Yki1qAWM/MmsSCumFi9b2/maffqUneU2o QaV/z040syJNXG+fWkQ4JnQ/kYT+ZFgSo6ffkcP9r1frozyzrIWL2zLjXkIH7J3ou3 KECaFt/owCuoQ== From: sashiko-bot@kernel.org Subject: Re: [PATCH 15/17] media: rockchip: rga: schedule jobs to multiple cores Reply-To: sashiko-reviews@lists.linux.dev To: =?utf-8?b?U3ZlbiBQw7xzY2hlbA==?= Cc: linux-media@vger.kernel.org In-Reply-To: <20260606-spu-rga3multicore-v1-15-3ec2b15675f7@pengutronix.de> References: <20260606-spu-rga3multicore-v1-15-3ec2b15675f7@pengutronix.de> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Fri, 05 Jun 2026 22:25:49 +0000 Message-Id: <20260605222549.80D521F00898@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-media@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 5 potential issue(= s) to consider: New issues: - [High] Race condition in device_run() error path and unhandled early retu= rn permanently deadlock the mem2mem queue. - [Medium] Data race on core->curr pointer when clearing job assignments. Pre-existing issues: - [High] Out-of-bounds array write on unbind/rebind of components because r= ga->num_cores is never reset. - [High] Missing pm_runtime_get_if_active() before hardware access in share= d IRQ handler. - [High] Missing synchronize_irq() in runtime suspend callback for shared I= RQ. -- commit f03021dff03be4350cbe7b0ab4ce4a9a54a7602f Author: Sven P=C3=BCschel media: rockchip: rga: schedule jobs to multiple cores This commit modifies the Rockchip RGA driver to schedule jobs across multiple cores to better utilize hardware. A dedicated spinlock is added to protect the selection of the next free core from race conditions. > diff --git a/drivers/media/platform/rockchip/rga/rga.c b/drivers/media/pl= atform/rockchip/rga/rga.c > index 9cebb461b3fd2..f00b7f99f2521 100644 > --- a/drivers/media/platform/rockchip/rga/rga.c > +++ b/drivers/media/platform/rockchip/rga/rga.c > @@ -38,15 +38,31 @@ static void device_run(void *prv) > { > struct rga_ctx *ctx =3D prv; > struct rockchip_rga *rga =3D ctx->rga; > - struct rga_core *core =3D rga->cores[0]; > + struct rga_core *core =3D NULL; > struct vb2_v4l2_buffer *src, *dst; > unsigned long flags; > int ret; > + unsigned int i; > + > + spin_lock_irqsave(&rga->cores_lock, flags); > + for (i =3D 0; i < rga->num_cores; i++) { > + if (!rga->cores[i]->curr) { > + core =3D rga->cores[i]; > + core->curr =3D ctx; > + break; > + } > + } > + spin_unlock_irqrestore(&rga->cores_lock, flags); > + > + WARN_ONCE(!core, "No free core although max parallel jobs matches the c= ore count!\n"); > + if (!core) > + return; > =20 > ret =3D pm_runtime_resume_and_get(core->dev); > if (ret < 0) { > v4l2_m2m_buf_done_and_job_finish(rga->m2m_dev, ctx->fh.m2m_ctx, > VB2_BUF_STATE_ERROR); > + core->curr =3D NULL; [Severity: High] Could this error path introduce a race condition that permanently deadlocks= the mem2mem queue? v4l2_m2m_buf_done_and_job_finish() is called before core->curr is cleared, which asynchronously schedules the next job via a workqueue. If the worker executes device_run() on another CPU before the current CPU m= arks the core as free, the worker will see the core as busy. If no other core is free, device_run() will trigger the WARN_ONCE and return early without notifying the v4l2 mem2mem framework: if (!core) return; Because the new job is already marked as TRANS_RUNNING, returning early wit= hout signaling job completion permanently hangs the job queue, and can cause a D-state deadlock if userspace attempts to close the device file descriptor. [Severity: Medium] Additionally, could there be a data race when clearing core->curr here? While cores_lock is acquired when scanning and assigning a core earlier in device_run(), it is not held when releasing the core in this error path. Concurrent unprotected reads and writes to core->curr across CPUs could lea= d to undefined behavior. > return; > } [ ... ] [Severity: Medium] This isn't a bug introduced by this patch, but does modifying core->curr in rga_isr() without holding cores_lock introduce a data race? Since cores_lock was introduced to protect against concurrent selections, shouldn't it also be held when releasing the core in the interrupt handler? drivers/media/platform/rockchip/rga/rga.c:rga_isr() { ... core->curr =3D NULL; ... } [Severity: High] This isn't a bug introduced by this patch, but does the unbind/rebind path cause an out-of-bounds array write in rga->cores[]? During component bind, rga_core_bind() appends to the rga->cores array: drivers/media/platform/rockchip/rga/rga.c:rga_core_bind() { ... rga->cores[rga->num_cores++] =3D core; ... } However, the master unbind callback and the component system do not reset rga->num_cores to 0: drivers/media/platform/rockchip/rga/rga.c:rga_unbind() { ... component_unbind_all(dev, NULL); } The struct rockchip_rga structure is not freed when a component is unbound.= If a component is unbound and subsequently re-bound, rga_core_bind() will cont= inue incrementing rga->num_cores past the initially allocated length, corrupting memory. [Severity: High] This isn't a bug introduced by this patch, but is the driver missing a pm_runtime_get_if_active() check before accessing hardware in the shared IRQ handler? The driver registers rga_isr() with IRQF_SHARED if there is no internal IOM= MU. When an interrupt is triggered by another device on the shared line, rga_is= r() blindly calls handle_irq(), which reads hardware registers: drivers/media/platform/rockchip/rga/rga.c:rga_isr() { ... if (rga->hw->handle_irq(core)) { ... } If the RGA core is runtime-suspended and its clocks are disabled, accessing= its registers will cause an external abort / SError. [Severity: High] This isn't a bug introduced by this patch, but is there a missing synchronize_irq() in the runtime suspend callback for shared IRQs? In rga_runtime_suspend(), the driver disables clocks unconditionally: drivers/media/platform/rockchip/rga/rga.c:rga_runtime_suspend() { ... clk_bulk_disable_unprepare(core->num_clks, core->clks); ... } Because the driver can use a shared IRQ, an interrupt handler could be mid-flight on another CPU when the clocks are disabled, causing an external abort when the handler accesses the now-unclocked hardware. --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260606-spu-rga3mu= lticore-v1-0-3ec2b15675f7@pengutronix.de?part=3D15