* [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring
@ 2026-04-13 15:03 Maíra Canal
2026-04-13 15:03 ` [PATCH 01/10] drm/v3d: Drop unused drm_encoder.h include from v3d_drv.h Maíra Canal
` (9 more replies)
0 siblings, 10 replies; 24+ messages in thread
From: Maíra Canal @ 2026-04-13 15:03 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel, Maíra Canal
This series addresses several issues in the v3d scheduler and submission
code. Most of the fixes were motivated by feedback on the vc4 scheduler
series [1], which inherited these problems from v3d; this series fixes
their counterparts in the v3d driver as well.
This series contains cleanups, fixes, and finally a refactoring of the
submission code, which made it possible to fix the atomicity of a
submission.
- Cleanups and small improvements:
- PATCH 1/10: "drm/v3d: Drop unused drm_encoder.h include from v3d_drv.h"
- PATCH 3/10: "drm/v3d: Use inline lock for dma fence initialization"
- PATCH 4/10: "drm/v3d: Replace spin_lock_irqsave() with spin_lock_irq()"
- Fixes:
- PATCH 2/10: "drm/v3d: Clear queue->active_job when v3d_fence_create() fails"
- PATCH 5/10: "drm/v3d: Extract v3d_job_add_syncobjs() helper"
- PATCH 6/10: "drm/v3d: Reject invalid syncobj handles in submit ioctls"
- Submission refactoring:
- PATCH 7/10: "drm/v3d: Migrate BO reservation locking to DRM exec"
- PATCH 8/10: "drm/v3d: Introduce struct v3d_submit and convert CL/TFU/CSD ioctls"
- PATCH 9/10: "drm/v3d: Refactor CPU ioctl into unified submission chain"
- PATCH 10/10: "drm/v3d: Ensure atomic submissions in v3d_submit_jobs()" (Also a fix)
[1] https://lore.kernel.org/dri-devel/20260205-vc4-drm-scheduler-v1-0-c6174fd7bbc1@igalia.com/T/
Best regards,
- Maíra
---
Maíra Canal (10):
drm/v3d: Drop unused drm_encoder.h include from v3d_drv.h
drm/v3d: Clear queue->active_job when v3d_fence_create() fails
drm/v3d: Use inline lock for dma fence initialization
drm/v3d: Replace spin_lock_irqsave() with spin_lock_irq()
drm/v3d: Extract v3d_job_add_syncobjs() helper
drm/v3d: Reject invalid syncobj handles in submit ioctls
drm/v3d: Migrate BO reservation locking to DRM exec
drm/v3d: Introduce struct v3d_submit and convert CL/TFU/CSD ioctls
drm/v3d: Refactor CPU ioctl into unified submission chain
drm/v3d: Ensure atomic submissions in v3d_submit_jobs()
drivers/gpu/drm/v3d/Kconfig | 1 +
drivers/gpu/drm/v3d/v3d_drv.h | 32 +-
drivers/gpu/drm/v3d/v3d_fence.c | 2 +-
drivers/gpu/drm/v3d/v3d_irq.c | 7 +-
drivers/gpu/drm/v3d/v3d_sched.c | 63 ++--
drivers/gpu/drm/v3d/v3d_submit.c | 674 +++++++++++++++++----------------------
6 files changed, 369 insertions(+), 410 deletions(-)
---
base-commit: ac6cba38729f63b2b9c947a973e3a45ba8ec448a
change-id: 20260407-v3d-sched-misc-fixes-623739017e53
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 01/10] drm/v3d: Drop unused drm_encoder.h include from v3d_drv.h
2026-04-13 15:03 [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring Maíra Canal
@ 2026-04-13 15:03 ` Maíra Canal
2026-04-13 15:03 ` [PATCH 02/10] drm/v3d: Clear queue->active_job when v3d_fence_create() fails Maíra Canal
` (8 subsequent siblings)
9 siblings, 0 replies; 24+ messages in thread
From: Maíra Canal @ 2026-04-13 15:03 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel, Maíra Canal
The V3D driver has no display pipeline, so nothing in the driver requires
drm_encoder.h. Remove the stale include.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_drv.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 4ebe175a8c6b..5a0b9da2c3aa 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -7,7 +7,6 @@
#include <linux/spinlock_types.h>
#include <linux/workqueue.h>
-#include <drm/drm_encoder.h>
#include <drm/drm_gem.h>
#include <drm/drm_gem_shmem_helper.h>
#include <drm/gpu_scheduler.h>
--
2.53.0
* [PATCH 02/10] drm/v3d: Clear queue->active_job when v3d_fence_create() fails
2026-04-13 15:03 [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring Maíra Canal
2026-04-13 15:03 ` [PATCH 01/10] drm/v3d: Drop unused drm_encoder.h include from v3d_drv.h Maíra Canal
@ 2026-04-13 15:03 ` Maíra Canal
2026-04-16 11:39 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 03/10] drm/v3d: Use inline lock for dma fence initialization Maíra Canal
` (7 subsequent siblings)
9 siblings, 1 reply; 24+ messages in thread
From: Maíra Canal @ 2026-04-13 15:03 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel, Maíra Canal
The run_job() callbacks for BIN, RENDER, TFU and CSD assign the incoming
job to queue->active_job before calling v3d_fence_create(). If
v3d_fence_create() fails, the callback returns NULL without clearing
active_job, leaving a dangling pointer.
Create a failure path in all run_job() callbacks that clears the active
job before returning NULL. The BIN path takes queue->queue_lock around the
clear as it races against v3d_overflow_mem_work(); RENDER, TFU and CSD
paths have no concurrent reader, so the clear is lock-free.
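The fixed control flow can be sketched in plain userspace C. All types and
names below (queue, active_job, fence_create()) are stand-ins modelled on
the driver, not the real structures: active_job is assigned up front and
cleared again on every early-exit path.

```c
#include <assert.h>
#include <stddef.h>

struct job { int id; };
struct fence { int id; };

struct queue {
	struct job *active_job;
};

static struct fence the_fence = { 1 };

/* Simulated v3d_fence_create(): fails when 'fail' is set. */
static struct fence *fence_create(int fail)
{
	return fail ? NULL : &the_fence;
}

/* Mirrors the fixed run_job() flow: active_job is set before fence
 * creation and cleared again on every failure path. */
static struct fence *run_job(struct queue *q, struct job *job, int fail_fence)
{
	struct fence *fence;

	q->active_job = job;

	fence = fence_create(fail_fence);
	if (!fence)
		goto out_clean_job;

	return fence;

out_clean_job:
	q->active_job = NULL; /* the fix: no dangling pointer left behind */
	return NULL;
}
```

Before the fix, the error path returned NULL without the `out_clean_job`
clear, so a stale job pointer survived in active_job.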
Fixes: a783a09ee76d ("drm/v3d: Refactor job management.")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_sched.c | 52 ++++++++++++++++++++++++-----------------
1 file changed, 30 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 1855ef5b3b5f..d42db75e639b 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -206,12 +206,8 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
struct dma_fence *fence;
unsigned long irqflags;
- if (unlikely(job->base.base.s_fence->finished.error)) {
- spin_lock_irqsave(&queue->queue_lock, irqflags);
- queue->active_job = NULL;
- spin_unlock_irqrestore(&queue->queue_lock, irqflags);
- return NULL;
- }
+ if (unlikely(job->base.base.s_fence->finished.error))
+ goto out_clean_job;
/* Lock required around bin_job update vs
* v3d_overflow_mem_work().
@@ -228,7 +224,7 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
fence = v3d_fence_create(v3d, V3D_BIN);
if (IS_ERR(fence))
- return NULL;
+ goto out_clean_job;
if (job->base.irq_fence)
dma_fence_put(job->base.irq_fence);
@@ -256,6 +252,12 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
V3D_CORE_WRITE(0, V3D_CLE_CT0QEA, job->end);
return fence;
+
+out_clean_job:
+ spin_lock_irqsave(&queue->queue_lock, irqflags);
+ queue->active_job = NULL;
+ spin_unlock_irqrestore(&queue->queue_lock, irqflags);
+ return NULL;
}
static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
@@ -265,10 +267,8 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
struct drm_device *dev = &v3d->drm;
struct dma_fence *fence;
- if (unlikely(job->base.base.s_fence->finished.error)) {
- v3d->queue[V3D_RENDER].active_job = NULL;
- return NULL;
- }
+ if (unlikely(job->base.base.s_fence->finished.error))
+ goto out_clean_job;
v3d->queue[V3D_RENDER].active_job = &job->base;
@@ -282,7 +282,7 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
fence = v3d_fence_create(v3d, V3D_RENDER);
if (IS_ERR(fence))
- return NULL;
+ goto out_clean_job;
if (job->base.irq_fence)
dma_fence_put(job->base.irq_fence);
@@ -303,6 +303,10 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
V3D_CORE_WRITE(0, V3D_CLE_CT1QEA, job->end);
return fence;
+
+out_clean_job:
+ v3d->queue[V3D_RENDER].active_job = NULL;
+ return NULL;
}
static struct dma_fence *
@@ -313,16 +317,14 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
struct drm_device *dev = &v3d->drm;
struct dma_fence *fence;
- if (unlikely(job->base.base.s_fence->finished.error)) {
- v3d->queue[V3D_TFU].active_job = NULL;
- return NULL;
- }
+ if (unlikely(job->base.base.s_fence->finished.error))
+ goto out_clean_job;
v3d->queue[V3D_TFU].active_job = &job->base;
fence = v3d_fence_create(v3d, V3D_TFU);
if (IS_ERR(fence))
- return NULL;
+ goto out_clean_job;
if (job->base.irq_fence)
dma_fence_put(job->base.irq_fence);
@@ -350,6 +352,10 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
V3D_WRITE(V3D_TFU_ICFG(v3d->ver), job->args.icfg | V3D_TFU_ICFG_IOC);
return fence;
+
+out_clean_job:
+ v3d->queue[V3D_TFU].active_job = NULL;
+ return NULL;
}
static struct dma_fence *
@@ -361,10 +367,8 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
struct dma_fence *fence;
int i, csd_cfg0_reg;
- if (unlikely(job->base.base.s_fence->finished.error)) {
- v3d->queue[V3D_CSD].active_job = NULL;
- return NULL;
- }
+ if (unlikely(job->base.base.s_fence->finished.error))
+ goto out_clean_job;
v3d->queue[V3D_CSD].active_job = &job->base;
@@ -372,7 +376,7 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
fence = v3d_fence_create(v3d, V3D_CSD);
if (IS_ERR(fence))
- return NULL;
+ goto out_clean_job;
if (job->base.irq_fence)
dma_fence_put(job->base.irq_fence);
@@ -399,6 +403,10 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
V3D_CORE_WRITE(0, csd_cfg0_reg, job->args.cfg[0]);
return fence;
+
+out_clean_job:
+ v3d->queue[V3D_CSD].active_job = NULL;
+ return NULL;
}
static void
--
2.53.0
* [PATCH 03/10] drm/v3d: Use inline lock for dma fence initialization
2026-04-13 15:03 [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring Maíra Canal
2026-04-13 15:03 ` [PATCH 01/10] drm/v3d: Drop unused drm_encoder.h include from v3d_drv.h Maíra Canal
2026-04-13 15:03 ` [PATCH 02/10] drm/v3d: Clear queue->active_job when v3d_fence_create() fails Maíra Canal
@ 2026-04-13 15:03 ` Maíra Canal
2026-04-16 11:44 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 04/10] drm/v3d: Replace spin_lock_irqsave() with spin_lock_irq() Maíra Canal
` (6 subsequent siblings)
9 siblings, 1 reply; 24+ messages in thread
From: Maíra Canal @ 2026-04-13 15:03 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel, Maíra Canal
Since commit 1f32f310a13c ("dma-buf: inline spinlock for fence protection
v5"), dma_fence_init() accepts a NULL external lock and will fall back to
an inline spinlock embedded in the fence itself. The embedded spinlock
decouples the lifetime of the lock from that of the fence.
Pass NULL so each v3d fence uses its own inline lock. This will allow
every user of queue_lock to switch from spin_(un)lock_irqsave() to
spin_(un)lock_irq().
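A minimal userspace model of the semantics this relies on (stand-in types,
not the real dma-buf API): when the caller passes a NULL external lock, the
fence falls back to a spinlock embedded in the fence itself.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a spinlock; only its address matters here. */
typedef struct { int locked; } spinlock_t;

struct fence {
	spinlock_t *lock;       /* lock actually used by fence ops */
	spinlock_t inline_lock; /* embedded fallback */
	unsigned long seqno;
};

/* Models the dma_fence_init() fallback: NULL lock -> embedded lock,
 * so the lock can never outlive (or die before) the fence. */
static void fence_init(struct fence *f, spinlock_t *lock, unsigned long seqno)
{
	f->lock = lock ? lock : &f->inline_lock;
	f->seqno = seqno;
}
```

With the embedded lock, freeing the queue no longer invalidates the lock of
any fence that is still alive.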
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_fence.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/v3d/v3d_fence.c b/drivers/gpu/drm/v3d/v3d_fence.c
index c500136d0455..9b1a882a4c15 100644
--- a/drivers/gpu/drm/v3d/v3d_fence.c
+++ b/drivers/gpu/drm/v3d/v3d_fence.c
@@ -15,7 +15,7 @@ struct dma_fence *v3d_fence_create(struct v3d_dev *v3d, enum v3d_queue q)
fence->dev = &v3d->drm;
fence->queue = q;
fence->seqno = ++queue->emit_seqno;
- dma_fence_init(&fence->base, &v3d_fence_ops, &queue->queue_lock,
+ dma_fence_init(&fence->base, &v3d_fence_ops, NULL,
queue->fence_context, fence->seqno);
return &fence->base;
--
2.53.0
* [PATCH 04/10] drm/v3d: Replace spin_lock_irqsave() with spin_lock_irq()
2026-04-13 15:03 [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring Maíra Canal
` (2 preceding siblings ...)
2026-04-13 15:03 ` [PATCH 03/10] drm/v3d: Use inline lock for dma fence initialization Maíra Canal
@ 2026-04-13 15:03 ` Maíra Canal
2026-04-16 11:46 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 05/10] drm/v3d: Extract v3d_job_add_syncobjs() helper Maíra Canal
` (5 subsequent siblings)
9 siblings, 1 reply; 24+ messages in thread
From: Maíra Canal @ 2026-04-13 15:03 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel, Maíra Canal
v3d_overflow_mem_work() and v3d_bin_job_run() both run from workqueue
context where IRQs are always enabled, so the flags save/restore of
spin_lock_irqsave() is unnecessary. Use spin_lock_irq() instead.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_irq.c | 7 +++----
drivers/gpu/drm/v3d/v3d_sched.c | 9 ++++-----
2 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index 86efaef2722c..d4d7c39227be 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -47,7 +47,6 @@ v3d_overflow_mem_work(struct work_struct *work)
struct v3d_queue_state *queue = &v3d->queue[V3D_BIN];
struct v3d_bin_job *bin_job;
struct drm_gem_object *obj;
- unsigned long irqflags;
if (IS_ERR(bo)) {
drm_err(dev, "Couldn't allocate binner overflow mem\n");
@@ -64,17 +63,17 @@ v3d_overflow_mem_work(struct work_struct *work)
* bin job got scheduled, that's fine. We'll just give them
* some binner pool anyway.
*/
- spin_lock_irqsave(&queue->queue_lock, irqflags);
+ spin_lock_irq(&queue->queue_lock);
bin_job = (struct v3d_bin_job *)queue->active_job;
if (!bin_job) {
- spin_unlock_irqrestore(&queue->queue_lock, irqflags);
+ spin_unlock_irq(&queue->queue_lock);
goto out;
}
drm_gem_object_get(obj);
list_add_tail(&bo->unref_head, &bin_job->render->unref_list);
- spin_unlock_irqrestore(&queue->queue_lock, irqflags);
+ spin_unlock_irq(&queue->queue_lock);
v3d_mmu_flush_all(v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index d42db75e639b..f3999678e5da 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -204,7 +204,6 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
struct v3d_queue_state *queue = &v3d->queue[V3D_BIN];
struct drm_device *dev = &v3d->drm;
struct dma_fence *fence;
- unsigned long irqflags;
if (unlikely(job->base.base.s_fence->finished.error))
goto out_clean_job;
@@ -212,13 +211,13 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
/* Lock required around bin_job update vs
* v3d_overflow_mem_work().
*/
- spin_lock_irqsave(&queue->queue_lock, irqflags);
+ spin_lock_irq(&queue->queue_lock);
queue->active_job = &job->base;
/* Clear out the overflow allocation, so we don't
* reuse the overflow attached to a previous job.
*/
V3D_CORE_WRITE(0, V3D_PTB_BPOS, 0);
- spin_unlock_irqrestore(&queue->queue_lock, irqflags);
+ spin_unlock_irq(&queue->queue_lock);
v3d_invalidate_caches(v3d);
@@ -254,9 +253,9 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
return fence;
out_clean_job:
- spin_lock_irqsave(&queue->queue_lock, irqflags);
+ spin_lock_irq(&queue->queue_lock);
queue->active_job = NULL;
- spin_unlock_irqrestore(&queue->queue_lock, irqflags);
+ spin_unlock_irq(&queue->queue_lock);
return NULL;
}
--
2.53.0
* [PATCH 05/10] drm/v3d: Extract v3d_job_add_syncobjs() helper
2026-04-13 15:03 [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring Maíra Canal
` (3 preceding siblings ...)
2026-04-13 15:03 ` [PATCH 04/10] drm/v3d: Replace spin_lock_irqsave() with spin_lock_irq() Maíra Canal
@ 2026-04-13 15:03 ` Maíra Canal
2026-04-16 11:53 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 06/10] drm/v3d: Reject invalid syncobj handles in submit ioctls Maíra Canal
` (4 subsequent siblings)
9 siblings, 1 reply; 24+ messages in thread
From: Maíra Canal @ 2026-04-13 15:03 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel, Maíra Canal
Move the syncobj dependency setup out of v3d_job_init() into its own
v3d_job_add_syncobjs() helper. No functional change.
This prepares for the next commit which changes the error handling, and
for a later consolidation that separates job allocation from syncobj
attachment.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_submit.c | 71 ++++++++++++++++++++++++----------------
1 file changed, 43 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index f75da2e3533e..054367ba533d 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -162,14 +162,52 @@ v3d_job_deallocate(void **container)
*container = NULL;
}
+static int
+v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
+ u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
+{
+ bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
+ struct v3d_dev *v3d = job->v3d;
+ int ret = 0;
+
+ if (!has_multisync) {
+ ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv,
+ in_sync, 0);
+ // TODO: Investigate why this was filtered out for the IOCTL.
+ if (ret && ret != -ENOENT)
+ return ret;
+ return 0;
+ }
+
+ if (se->in_sync_count && se->wait_stage == queue) {
+ struct drm_v3d_sem __user *handle = u64_to_user_ptr(se->in_syncs);
+
+ for (int i = 0; i < se->in_sync_count; i++) {
+ struct drm_v3d_sem in;
+
+ if (copy_from_user(&in, handle++, sizeof(in))) {
+ drm_dbg(&v3d->drm, "Failed to copy wait dep handle.\n");
+ return -EFAULT;
+ }
+
+ ret = drm_sched_job_add_syncobj_dependency(&job->base,
+ file_priv, in.handle, 0);
+ // TODO: Investigate why this was filtered out for the IOCTL.
+ if (ret && ret != -ENOENT)
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
static int
v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
struct v3d_job *job, void (*free)(struct kref *ref),
u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
{
struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
- bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
- int ret, i;
+ int ret;
job->v3d = v3d;
job->free = free;
@@ -180,32 +218,9 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
if (ret)
return ret;
- if (has_multisync) {
- if (se->in_sync_count && se->wait_stage == queue) {
- struct drm_v3d_sem __user *handle = u64_to_user_ptr(se->in_syncs);
-
- for (i = 0; i < se->in_sync_count; i++) {
- struct drm_v3d_sem in;
-
- if (copy_from_user(&in, handle++, sizeof(in))) {
- ret = -EFAULT;
- drm_dbg(&v3d->drm, "Failed to copy wait dep handle.\n");
- goto fail_job_init;
- }
- ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv, in.handle, 0);
-
- // TODO: Investigate why this was filtered out for the IOCTL.
- if (ret && ret != -ENOENT)
- goto fail_job_init;
- }
- }
- } else {
- ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv, in_sync, 0);
-
- // TODO: Investigate why this was filtered out for the IOCTL.
- if (ret && ret != -ENOENT)
- goto fail_job_init;
- }
+ ret = v3d_job_add_syncobjs(job, file_priv, in_sync, se, queue);
+ if (ret)
+ goto fail_job_init;
/* CPU jobs don't require hardware resources */
if (queue != V3D_CPU) {
--
2.53.0
* [PATCH 06/10] drm/v3d: Reject invalid syncobj handles in submit ioctls
2026-04-13 15:03 [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring Maíra Canal
` (4 preceding siblings ...)
2026-04-13 15:03 ` [PATCH 05/10] drm/v3d: Extract v3d_job_add_syncobjs() helper Maíra Canal
@ 2026-04-13 15:03 ` Maíra Canal
2026-04-16 11:59 ` Tvrtko Ursulin
2026-04-17 15:05 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 07/10] drm/v3d: Migrate BO reservation locking to DRM exec Maíra Canal
` (3 subsequent siblings)
9 siblings, 2 replies; 24+ messages in thread
From: Maíra Canal @ 2026-04-13 15:03 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel, Maíra Canal
drm_sched_job_add_syncobj_dependency() returns -ENOENT both when the
handle is zero and when a non-zero handle does not resolve to an
existing syncobj (a userspace bug). The driver previously ignored
-ENOENT in both cases, silently accepting broken handles.
Distinguish the two: skip the call entirely when the handle is zero, as
there is no dependency, and let -ENOENT propagate for non-zero handles
that don't resolve, turning the error into a proper return to userspace.
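The new policy can be sketched as follows; lookup_syncobj() and
VALID_HANDLE are stand-ins for the real syncobj lookup, not driver code:

```c
#include <assert.h>
#include <errno.h>

#define VALID_HANDLE 42u

/* Stand-in for the syncobj lookup behind
 * drm_sched_job_add_syncobj_dependency(). */
static int lookup_syncobj(unsigned int handle)
{
	return handle == VALID_HANDLE ? 0 : -ENOENT;
}

/* Handle 0 means "no dependency" and is skipped entirely; any non-zero
 * handle that does not resolve now propagates -ENOENT to userspace. */
static int add_syncobj_dep(unsigned int handle)
{
	if (!handle)
		return 0;
	return lookup_syncobj(handle);
}
```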
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_submit.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 054367ba533d..fe8b5757c3e8 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -171,12 +171,11 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
int ret = 0;
if (!has_multisync) {
- ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv,
- in_sync, 0);
- // TODO: Investigate why this was filtered out for the IOCTL.
- if (ret && ret != -ENOENT)
- return ret;
- return 0;
+ /* Ignore syncobj if its handle is NULL */
+ if (in_sync)
+ ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv,
+ in_sync, 0);
+ return ret;
}
if (se->in_sync_count && se->wait_stage == queue) {
@@ -190,11 +189,13 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
return -EFAULT;
}
- ret = drm_sched_job_add_syncobj_dependency(&job->base,
- file_priv, in.handle, 0);
- // TODO: Investigate why this was filtered out for the IOCTL.
- if (ret && ret != -ENOENT)
- return ret;
+ /* Ignore syncobj if its handle is NULL */
+ if (in.handle) {
+ ret = drm_sched_job_add_syncobj_dependency(&job->base,
+ file_priv, in.handle, 0);
+ if (ret)
+ return ret;
+ }
}
}
--
2.53.0
* [PATCH 07/10] drm/v3d: Migrate BO reservation locking to DRM exec
2026-04-13 15:03 [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring Maíra Canal
` (5 preceding siblings ...)
2026-04-13 15:03 ` [PATCH 06/10] drm/v3d: Reject invalid syncobj handles in submit ioctls Maíra Canal
@ 2026-04-13 15:03 ` Maíra Canal
2026-04-16 12:24 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 08/10] drm/v3d: Introduce struct v3d_submit and convert CL/TFU/CSD ioctls Maíra Canal
` (2 subsequent siblings)
9 siblings, 1 reply; 24+ messages in thread
From: Maíra Canal @ 2026-04-13 15:03 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel, Maíra Canal
Replace the drm_gem_(un)lock_reservations() + ww_acquire_ctx pattern with
DRM exec across all submit ioctls. This is a straightforward conversion
with no functional change.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/Kconfig | 1 +
drivers/gpu/drm/v3d/v3d_drv.h | 3 +-
drivers/gpu/drm/v3d/v3d_submit.c | 68 +++++++++++++++++-----------------------
3 files changed, 32 insertions(+), 40 deletions(-)
diff --git a/drivers/gpu/drm/v3d/Kconfig b/drivers/gpu/drm/v3d/Kconfig
index ce62c5908e1d..6a33e0ab30de 100644
--- a/drivers/gpu/drm/v3d/Kconfig
+++ b/drivers/gpu/drm/v3d/Kconfig
@@ -5,6 +5,7 @@ config DRM_V3D
depends on DRM
depends on COMMON_CLK
depends on MMU
+ select DRM_EXEC
select DRM_SCHED
select DRM_GEM_SHMEM_HELPER
help
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 5a0b9da2c3aa..788a45c60290 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -7,6 +7,7 @@
#include <linux/spinlock_types.h>
#include <linux/workqueue.h>
+#include <drm/drm_exec.h>
#include <drm/drm_gem.h>
#include <drm/drm_gem_shmem_helper.h>
#include <drm/gpu_scheduler.h>
@@ -418,7 +419,7 @@ struct v3d_indirect_csd_info {
struct drm_gem_object *indirect;
/* Context of the Indirect CSD job */
- struct ww_acquire_ctx acquire_ctx;
+ struct drm_exec exec;
};
struct v3d_timestamp_query_info {
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index fe8b5757c3e8..6aa8c65e873c 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -20,20 +20,18 @@
* to v3d, so we don't attach dma-buf fences to them.
*/
static int
-v3d_lock_bo_reservations(struct v3d_job *job,
- struct ww_acquire_ctx *acquire_ctx)
+v3d_lock_bo_reservations(struct v3d_job *job, struct drm_exec *exec)
{
int i, ret;
- ret = drm_gem_lock_reservations(job->bo, job->bo_count, acquire_ctx);
- if (ret)
- return ret;
-
- for (i = 0; i < job->bo_count; i++) {
- ret = dma_resv_reserve_fences(job->bo[i]->resv, 1);
+ drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT, job->bo_count);
+ drm_exec_until_all_locked(exec) {
+ ret = drm_exec_prepare_array(exec, job->bo, job->bo_count, 1);
if (ret)
goto fail;
+ }
+ for (i = 0; i < job->bo_count; i++) {
ret = drm_sched_job_add_implicit_dependencies(&job->base,
job->bo[i], true);
if (ret)
@@ -43,7 +41,7 @@ v3d_lock_bo_reservations(struct v3d_job *job,
return 0;
fail:
- drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
+ drm_exec_fini(exec);
return ret;
}
@@ -259,7 +257,7 @@ v3d_push_job(struct v3d_job *job)
static void
v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
struct v3d_job *job,
- struct ww_acquire_ctx *acquire_ctx,
+ struct drm_exec *exec,
u32 out_sync,
struct v3d_submit_ext *se,
struct dma_fence *done_fence)
@@ -274,7 +272,7 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
DMA_RESV_USAGE_WRITE);
}
- drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
+ drm_exec_fini(exec);
/* Update the return sync object for the job */
/* If it only supports a single signal semaphore*/
@@ -305,7 +303,7 @@ v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
struct v3d_csd_job **job,
struct v3d_job **clean_job,
struct v3d_submit_ext *se,
- struct ww_acquire_ctx *acquire_ctx)
+ struct drm_exec *exec)
{
int ret;
@@ -338,7 +336,7 @@ v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
if (ret)
return ret;
- return v3d_lock_bo_reservations(*clean_job, acquire_ctx);
+ return v3d_lock_bo_reservations(*clean_job, exec);
}
static void
@@ -491,7 +489,7 @@ v3d_get_cpu_indirect_csd_params(struct drm_file *file_priv,
return v3d_setup_csd_jobs_and_bos(file_priv, v3d, &indirect_csd.submit,
&info->job, &info->clean_job,
- NULL, &info->acquire_ctx);
+ NULL, &info->exec);
}
/* Get data for the query timestamp job submission. */
@@ -906,7 +904,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
struct v3d_render_job *render = NULL;
struct v3d_job *clean_job = NULL;
struct v3d_job *last_job;
- struct ww_acquire_ctx acquire_ctx;
+ struct drm_exec exec;
int ret = 0;
trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args->rcl_end);
@@ -986,7 +984,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
if (ret)
goto fail;
- ret = v3d_lock_bo_reservations(last_job, &acquire_ctx);
+ ret = v3d_lock_bo_reservations(last_job, &exec);
if (ret)
goto fail;
@@ -1035,7 +1033,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
v3d_attach_fences_and_unlock_reservation(file_priv,
last_job,
- &acquire_ctx,
+ &exec,
args->out_sync,
&se,
last_job->done_fence);
@@ -1049,8 +1047,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
fail_unreserve:
mutex_unlock(&v3d->sched_lock);
fail_perfmon:
- drm_gem_unlock_reservations(last_job->bo,
- last_job->bo_count, &acquire_ctx);
+ drm_exec_fini(&exec);
fail:
v3d_job_cleanup((void *)bin);
v3d_job_cleanup((void *)render);
@@ -1077,7 +1074,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
struct drm_v3d_submit_tfu *args = data;
struct v3d_submit_ext se = {0};
struct v3d_tfu_job *job = NULL;
- struct ww_acquire_ctx acquire_ctx;
+ struct drm_exec exec;
int ret = 0;
trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
@@ -1133,7 +1130,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
job->base.bo[job->base.bo_count] = bo;
}
- ret = v3d_lock_bo_reservations(&job->base, &acquire_ctx);
+ ret = v3d_lock_bo_reservations(&job->base, &exec);
if (ret)
goto fail;
@@ -1142,7 +1139,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
mutex_unlock(&v3d->sched_lock);
v3d_attach_fences_and_unlock_reservation(file_priv,
- &job->base, &acquire_ctx,
+ &job->base, &exec,
args->out_sync,
&se,
job->base.done_fence);
@@ -1177,7 +1174,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
struct v3d_submit_ext se = {0};
struct v3d_csd_job *job = NULL;
struct v3d_job *clean_job = NULL;
- struct ww_acquire_ctx acquire_ctx;
+ struct drm_exec exec;
int ret;
trace_v3d_submit_csd_ioctl(&v3d->drm, args->cfg[5], args->cfg[6]);
@@ -1204,8 +1201,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
}
ret = v3d_setup_csd_jobs_and_bos(file_priv, v3d, args,
- &job, &clean_job, &se,
- &acquire_ctx);
+ &job, &clean_job, &se, &exec);
if (ret)
goto fail;
@@ -1236,7 +1232,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
v3d_attach_fences_and_unlock_reservation(file_priv,
clean_job,
- &acquire_ctx,
+ &exec,
args->out_sync,
&se,
clean_job->done_fence);
@@ -1249,8 +1245,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
fail_unreserve:
mutex_unlock(&v3d->sched_lock);
fail_perfmon:
- drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
- &acquire_ctx);
+ drm_exec_fini(&exec);
fail:
v3d_job_cleanup((void *)job);
v3d_job_cleanup(clean_job);
@@ -1288,7 +1283,7 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
struct v3d_cpu_job *cpu_job = NULL;
struct v3d_csd_job *csd_job = NULL;
struct v3d_job *clean_job = NULL;
- struct ww_acquire_ctx acquire_ctx;
+ struct drm_exec exec;
int ret;
if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
@@ -1339,7 +1334,7 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
if (ret)
goto fail;
- ret = v3d_lock_bo_reservations(&cpu_job->base, &acquire_ctx);
+ ret = v3d_lock_bo_reservations(&cpu_job->base, &exec);
if (ret)
goto fail;
}
@@ -1373,14 +1368,14 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
v3d_attach_fences_and_unlock_reservation(file_priv,
&cpu_job->base,
- &acquire_ctx, 0,
+ &exec, 0,
out_se, cpu_job->base.done_fence);
switch (cpu_job->job_type) {
case V3D_CPU_JOB_TYPE_INDIRECT_CSD:
v3d_attach_fences_and_unlock_reservation(file_priv,
clean_job,
- &cpu_job->indirect_csd.acquire_ctx,
+ &cpu_job->indirect_csd.exec,
0, &se, clean_job->done_fence);
break;
default:
@@ -1395,13 +1390,8 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
fail_unreserve:
mutex_unlock(&v3d->sched_lock);
-
- drm_gem_unlock_reservations(cpu_job->base.bo, cpu_job->base.bo_count,
- &acquire_ctx);
-
- drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
- &cpu_job->indirect_csd.acquire_ctx);
-
+ drm_exec_fini(&exec);
+ drm_exec_fini(&cpu_job->indirect_csd.exec);
fail:
v3d_job_cleanup((void *)cpu_job);
v3d_job_cleanup((void *)csd_job);
--
2.53.0
* [PATCH 08/10] drm/v3d: Introduce struct v3d_submit and convert CL/TFU/CSD ioctls
2026-04-13 15:03 [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring Maíra Canal
` (6 preceding siblings ...)
2026-04-13 15:03 ` [PATCH 07/10] drm/v3d: Migrate BO reservation locking to DRM exec Maíra Canal
@ 2026-04-13 15:03 ` Maíra Canal
2026-04-16 14:16 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 09/10] drm/v3d: Refactor CPU ioctl into unified submission chain Maíra Canal
2026-04-13 15:03 ` [PATCH 10/10] drm/v3d: Ensure atomic submissions in v3d_submit_jobs() Maíra Canal
9 siblings, 1 reply; 24+ messages in thread
From: Maíra Canal @ 2026-04-13 15:03 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel, Maíra Canal
As the V3D driver grew over time, different types of submission were added
and the submission code became more complex, but the driver kept the
same abstractions.
Nowadays, the submission ioctls don't submit a single job, but a
chain of jobs:
1. v3d_submit_cl_ioctl() submits a BIN job (optional), RENDER job
(mandatory), and a CLEAN_CACHE job (optional).
2. v3d_submit_csd_ioctl() submits a CSD job and a CLEAN_CACHE job.
3. v3d_submit_tfu_ioctl() submits a TFU job.
Therefore, each ioctl submits a chain of jobs in which each job depends on
the previous one. However, this concept is currently not well represented
in the code.
To address this, introduce a new concept: the struct v3d_submit, which
groups the submission state and represents the submission chain formed by
an ordered array of jobs.
Add new helpers to allocate jobs, append them to the chain, and submit the
chain to the scheduler, all based on the new struct. Convert v3d_submit_cl_ioctl(),
v3d_submit_tfu_ioctl() and v3d_submit_csd_ioctl() to the new pattern. Each
ioctl now follows the same flow: add jobs -> lookup BOs -> lock
reservations -> attach perfmon -> submit chain -> attach fences -> put
jobs.
The CPU ioctl is left on the old helpers for now; its indirect CSD path
requires some restructuring that will be addressed in the next commit.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_drv.h | 21 +++
drivers/gpu/drm/v3d/v3d_submit.c | 348 ++++++++++++++++++++++-----------------
2 files changed, 215 insertions(+), 154 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 788a45c60290..fc12e5215fb0 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -288,6 +288,27 @@ to_v3d_fence(struct dma_fence *fence)
#define V3D_CORE_READ(core, offset) readl(v3d->core_regs[core] + offset)
#define V3D_CORE_WRITE(core, offset, val) writel(val, v3d->core_regs[core] + offset)
+#define V3D_MAX_JOBS_PER_SUBMISSION 3
+
+/* Per-ioctl submission context */
+struct v3d_submit {
+ struct v3d_dev *v3d;
+
+ struct drm_file *file_priv;
+
+ /* DRM exec context for this submission. */
+ struct drm_exec exec;
+
+ /* Ordered array of jobs forming the submission chain. Jobs are
+ * appended via v3d_submit_add_job(), then chained and pushed to
+ * the scheduler by v3d_submit_jobs().
+ */
+ struct v3d_job *jobs[V3D_MAX_JOBS_PER_SUBMISSION];
+
+ /* Number of jobs currently in @jobs. */
+ u32 job_count;
+};
+
struct v3d_job {
struct drm_sched_job base;
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 6aa8c65e873c..5b519a892985 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -241,6 +241,81 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
return ret;
}
+static int
+v3d_submit_add_job(struct v3d_submit *submit, void **container, size_t size,
+ void (*free)(struct kref *ref), enum v3d_queue queue)
+{
+ struct v3d_file_priv *v3d_priv = submit->file_priv->driver_priv;
+ struct v3d_dev *v3d = submit->v3d;
+ struct v3d_job *job;
+ int ret;
+
+ *container = kcalloc(1, size, GFP_KERNEL);
+ if (!*container)
+ return -ENOMEM;
+
+ job = *container;
+ job->v3d = v3d;
+ job->free = free;
+ job->file_priv = v3d_priv;
+
+ ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
+ 1, v3d_priv, submit->file_priv->client_id);
+ if (ret)
+ goto fail_free;
+
+ /* CPU jobs don't require hardware resources */
+ if (queue != V3D_CPU) {
+ ret = v3d_pm_runtime_get(v3d);
+ if (ret)
+ goto fail_sched_job;
+ job->has_pm_ref = true;
+ }
+
+ kref_init(&job->refcount);
+
+ job->client_stats = v3d_stats_get(v3d_priv->stats[queue]);
+ job->global_stats = v3d_stats_get(v3d->queue[queue].stats);
+
+ submit->jobs[submit->job_count++] = job;
+
+ return 0;
+
+fail_sched_job:
+ drm_sched_job_cleanup(&job->base);
+fail_free:
+ kfree(*container);
+ *container = NULL;
+ return ret;
+}
+
+static int
+v3d_attach_perfmon_to_jobs(struct v3d_submit *submit, u32 perfmon_id)
+{
+ struct v3d_file_priv *v3d_priv = submit->file_priv->driver_priv;
+ struct v3d_dev *v3d = submit->v3d;
+ struct v3d_perfmon *perfmon;
+
+ if (!perfmon_id)
+ return 0;
+
+ if (v3d->global_perfmon)
+ return -EAGAIN;
+
+ perfmon = v3d_perfmon_find(v3d_priv, perfmon_id);
+ if (!perfmon)
+ return -ENOENT;
+
+ for (int i = 0; i < submit->job_count; i++) {
+ submit->jobs[i]->perfmon = perfmon;
+ v3d_perfmon_get(perfmon);
+ }
+
+ v3d_perfmon_put(perfmon);
+
+ return 0;
+}
+
static void
v3d_push_job(struct v3d_job *job)
{
@@ -254,6 +329,30 @@ v3d_push_job(struct v3d_job *job)
drm_sched_entity_push_job(&job->base);
}
+static int
+v3d_submit_jobs(struct v3d_submit *submit)
+{
+ struct v3d_dev *v3d = submit->v3d;
+ int ret;
+
+ guard(mutex)(&v3d->sched_lock);
+
+ for (int i = 0; i < submit->job_count; i++) {
+ struct v3d_job *job = submit->jobs[i];
+
+ v3d_push_job(job);
+
+ if (i + 1 < submit->job_count) {
+ ret = drm_sched_job_add_dependency(&submit->jobs[i + 1]->base,
+ dma_fence_get(job->done_fence));
+ if (ret)
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
static void
v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
struct v3d_job *job,
@@ -896,18 +995,19 @@ int
v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
{
- struct v3d_dev *v3d = to_v3d_dev(dev);
- struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
+ struct v3d_submit submit = {
+ .v3d = to_v3d_dev(dev),
+ .file_priv = file_priv,
+ .job_count = 0
+ };
struct drm_v3d_submit_cl *args = data;
struct v3d_submit_ext se = {0};
struct v3d_bin_job *bin = NULL;
struct v3d_render_job *render = NULL;
struct v3d_job *clean_job = NULL;
- struct v3d_job *last_job;
- struct drm_exec exec;
- int ret = 0;
+ int ret;
- trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args->rcl_end);
+ trace_v3d_submit_cl_ioctl(dev, args->rcl_start, args->rcl_end);
if (args->pad)
return -EINVAL;
@@ -927,131 +1027,84 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
}
}
- ret = v3d_job_allocate(v3d, (void *)&render, sizeof(*render));
- if (ret)
- return ret;
-
- ret = v3d_job_init(v3d, file_priv, &render->base,
- v3d_render_job_free, args->in_sync_rcl, &se, V3D_RENDER);
- if (ret) {
- v3d_job_deallocate((void *)&render);
- goto fail;
- }
-
- render->start = args->rcl_start;
- render->end = args->rcl_end;
- INIT_LIST_HEAD(&render->unref_list);
-
if (args->bcl_start != args->bcl_end) {
- ret = v3d_job_allocate(v3d, (void *)&bin, sizeof(*bin));
+ ret = v3d_submit_add_job(&submit, (void **)&bin, sizeof(*bin),
+ v3d_job_free, V3D_BIN);
if (ret)
goto fail;
- ret = v3d_job_init(v3d, file_priv, &bin->base,
- v3d_job_free, args->in_sync_bcl, &se, V3D_BIN);
- if (ret) {
- v3d_job_deallocate((void *)&bin);
- goto fail;
- }
-
bin->start = args->bcl_start;
bin->end = args->bcl_end;
bin->qma = args->qma;
bin->qms = args->qms;
bin->qts = args->qts;
- bin->render = render;
- }
- if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
- ret = v3d_job_allocate(v3d, (void *)&clean_job, sizeof(*clean_job));
+ ret = v3d_job_add_syncobjs(&bin->base, file_priv,
+ args->in_sync_bcl, &se, V3D_BIN);
if (ret)
goto fail;
-
- ret = v3d_job_init(v3d, file_priv, clean_job,
- v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
- if (ret) {
- v3d_job_deallocate((void *)&clean_job);
- goto fail;
- }
-
- last_job = clean_job;
- } else {
- last_job = &render->base;
}
- ret = v3d_lookup_bos(dev, file_priv, last_job,
+ ret = v3d_submit_add_job(&submit, (void **)&render, sizeof(*render),
+ v3d_render_job_free, V3D_RENDER);
+ if (ret)
+ goto fail;
+
+ INIT_LIST_HEAD(&render->unref_list);
+ render->start = args->rcl_start;
+ render->end = args->rcl_end;
+
+ if (bin)
+ bin->render = render;
+
+ ret = v3d_job_add_syncobjs(&render->base, file_priv, args->in_sync_rcl,
+ &se, V3D_RENDER);
+ if (ret)
+ goto fail;
+
+ if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
+ ret = v3d_submit_add_job(&submit, (void **)&clean_job,
+ sizeof(*clean_job), v3d_job_free,
+ V3D_CACHE_CLEAN);
+ if (ret)
+ goto fail;
+ }
+
+ ret = v3d_lookup_bos(dev, file_priv,
+ submit.jobs[submit.job_count - 1],
args->bo_handles, args->bo_handle_count);
if (ret)
goto fail;
- ret = v3d_lock_bo_reservations(last_job, &exec);
+ ret = v3d_lock_bo_reservations(submit.jobs[submit.job_count - 1],
+ &submit.exec);
if (ret)
goto fail;
- if (args->perfmon_id) {
- if (v3d->global_perfmon) {
- ret = -EAGAIN;
- goto fail_perfmon;
- }
+ ret = v3d_attach_perfmon_to_jobs(&submit, args->perfmon_id);
+ if (ret)
+ goto fail_perfmon;
- render->base.perfmon = v3d_perfmon_find(v3d_priv,
- args->perfmon_id);
-
- if (!render->base.perfmon) {
- ret = -ENOENT;
- goto fail_perfmon;
- }
- }
-
- mutex_lock(&v3d->sched_lock);
- if (bin) {
- bin->base.perfmon = render->base.perfmon;
- v3d_perfmon_get(bin->base.perfmon);
- v3d_push_job(&bin->base);
-
- ret = drm_sched_job_add_dependency(&render->base.base,
- dma_fence_get(bin->base.done_fence));
- if (ret)
- goto fail_unreserve;
- }
-
- v3d_push_job(&render->base);
-
- if (clean_job) {
- struct dma_fence *render_fence =
- dma_fence_get(render->base.done_fence);
- ret = drm_sched_job_add_dependency(&clean_job->base,
- render_fence);
- if (ret)
- goto fail_unreserve;
- clean_job->perfmon = render->base.perfmon;
- v3d_perfmon_get(clean_job->perfmon);
- v3d_push_job(clean_job);
- }
-
- mutex_unlock(&v3d->sched_lock);
+ ret = v3d_submit_jobs(&submit);
+ if (ret)
+ goto fail_perfmon;
v3d_attach_fences_and_unlock_reservation(file_priv,
- last_job,
- &exec,
- args->out_sync,
- &se,
- last_job->done_fence);
+ submit.jobs[submit.job_count - 1],
+ &submit.exec,
+ args->out_sync, &se,
+ submit.jobs[submit.job_count - 1]->done_fence);
- v3d_job_put(&bin->base);
- v3d_job_put(&render->base);
- v3d_job_put(clean_job);
+ for (int i = 0; i < submit.job_count; i++)
+ v3d_job_put(submit.jobs[i]);
return 0;
-fail_unreserve:
- mutex_unlock(&v3d->sched_lock);
fail_perfmon:
- drm_exec_fini(&exec);
+ drm_exec_fini(&submit.exec);
fail:
- v3d_job_cleanup((void *)bin);
- v3d_job_cleanup((void *)render);
- v3d_job_cleanup(clean_job);
+ for (int i = 0; i < submit.job_count; i++)
+ v3d_job_cleanup(submit.jobs[i]);
v3d_put_multisync_post_deps(&se);
return ret;
@@ -1070,14 +1123,17 @@ int
v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
{
- struct v3d_dev *v3d = to_v3d_dev(dev);
+ struct v3d_submit submit = {
+ .v3d = to_v3d_dev(dev),
+ .file_priv = file_priv,
+ .job_count = 0
+ };
struct drm_v3d_submit_tfu *args = data;
struct v3d_submit_ext se = {0};
struct v3d_tfu_job *job = NULL;
- struct drm_exec exec;
int ret = 0;
- trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
+ trace_v3d_submit_tfu_ioctl(dev, args->iia);
if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
drm_dbg(dev, "invalid flags: %d\n", args->flags);
@@ -1092,16 +1148,14 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
}
}
- ret = v3d_job_allocate(v3d, (void *)&job, sizeof(*job));
+ ret = v3d_submit_add_job(&submit, (void **)&job, sizeof(*job),
+ v3d_job_free, V3D_TFU);
+ if (ret)
+ goto fail;
+
+ ret = v3d_job_add_syncobjs(&job->base, file_priv, args->in_sync, &se, V3D_TFU);
if (ret)
- return ret;
-
- ret = v3d_job_init(v3d, file_priv, &job->base,
- v3d_job_free, args->in_sync, &se, V3D_TFU);
- if (ret) {
- v3d_job_deallocate((void *)&job);
goto fail;
- }
job->base.bo = kzalloc_objs(*job->base.bo, ARRAY_SIZE(args->bo_handles));
if (!job->base.bo) {
@@ -1130,24 +1184,25 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
job->base.bo[job->base.bo_count] = bo;
}
- ret = v3d_lock_bo_reservations(&job->base, &exec);
+ ret = v3d_lock_bo_reservations(&job->base, &submit.exec);
if (ret)
goto fail;
- mutex_lock(&v3d->sched_lock);
- v3d_push_job(&job->base);
- mutex_unlock(&v3d->sched_lock);
+ ret = v3d_submit_jobs(&submit);
+ if (ret)
+ goto fail_submit;
v3d_attach_fences_and_unlock_reservation(file_priv,
- &job->base, &exec,
- args->out_sync,
- &se,
+ &job->base, &submit.exec,
+ args->out_sync, &se,
job->base.done_fence);
v3d_job_put(&job->base);
return 0;
+fail_submit:
+ drm_exec_fini(&submit.exec);
fail:
v3d_job_cleanup((void *)job);
v3d_put_multisync_post_deps(&se);
@@ -1168,21 +1223,23 @@ int
v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
{
- struct v3d_dev *v3d = to_v3d_dev(dev);
- struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
+ struct v3d_submit submit = {
+ .v3d = to_v3d_dev(dev),
+ .file_priv = file_priv,
+ .job_count = 0
+ };
struct drm_v3d_submit_csd *args = data;
struct v3d_submit_ext se = {0};
struct v3d_csd_job *job = NULL;
struct v3d_job *clean_job = NULL;
- struct drm_exec exec;
int ret;
- trace_v3d_submit_csd_ioctl(&v3d->drm, args->cfg[5], args->cfg[6]);
+ trace_v3d_submit_csd_ioctl(dev, args->cfg[5], args->cfg[6]);
if (args->pad)
return -EINVAL;
- if (!v3d_has_csd(v3d)) {
+ if (!v3d_has_csd(submit.v3d)) {
drm_warn(dev, "Attempting CSD submit on non-CSD hardware\n");
return -EINVAL;
}
@@ -1201,51 +1258,34 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
}
ret = v3d_setup_csd_jobs_and_bos(file_priv, v3d, args,
- &job, &clean_job, &se, &exec);
+ &job, &clean_job, &se, &submit.exec);
if (ret)
goto fail;
- if (args->perfmon_id) {
- if (v3d->global_perfmon) {
- ret = -EAGAIN;
- goto fail_perfmon;
- }
+ submit.jobs[submit.job_count++] = &job->base;
+ submit.jobs[submit.job_count++] = clean_job;
- job->base.perfmon = v3d_perfmon_find(v3d_priv,
- args->perfmon_id);
- if (!job->base.perfmon) {
- ret = -ENOENT;
- goto fail_perfmon;
- }
- }
-
- mutex_lock(&v3d->sched_lock);
- v3d_push_job(&job->base);
-
- ret = drm_sched_job_add_dependency(&clean_job->base,
- dma_fence_get(job->base.done_fence));
+ ret = v3d_attach_perfmon_to_jobs(&submit, args->perfmon_id);
if (ret)
- goto fail_unreserve;
+ goto fail_perfmon;
- v3d_push_job(clean_job);
- mutex_unlock(&v3d->sched_lock);
+ ret = v3d_submit_jobs(&submit);
+ if (ret)
+ goto fail_perfmon;
v3d_attach_fences_and_unlock_reservation(file_priv,
clean_job,
- &exec,
- args->out_sync,
- &se,
+ &submit.exec,
+ args->out_sync, &se,
clean_job->done_fence);
- v3d_job_put(&job->base);
- v3d_job_put(clean_job);
+ for (int i = 0; i < submit.job_count; i++)
+ v3d_job_put(submit.jobs[i]);
return 0;
-fail_unreserve:
- mutex_unlock(&v3d->sched_lock);
fail_perfmon:
- drm_exec_fini(&exec);
+ drm_exec_fini(&submit.exec);
fail:
v3d_job_cleanup((void *)job);
v3d_job_cleanup(clean_job);
--
2.53.0
* [PATCH 09/10] drm/v3d: Refactor CPU ioctl into unified submission chain
2026-04-13 15:03 [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring Maíra Canal
` (7 preceding siblings ...)
2026-04-13 15:03 ` [PATCH 08/10] drm/v3d: Introduce struct v3d_submit and convert CL/TFU/CSD ioctls Maíra Canal
@ 2026-04-13 15:03 ` Maíra Canal
2026-04-17 14:38 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 10/10] drm/v3d: Ensure atomic submissions in v3d_submit_jobs() Maíra Canal
9 siblings, 1 reply; 24+ messages in thread
From: Maíra Canal @ 2026-04-13 15:03 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel, Maíra Canal
Restructure the CPU ioctl so that all job types, including indirect CSD,
use a single struct v3d_submit chain and a single DRM exec context.
Currently, v3d_get_cpu_indirect_csd_params(), which is the indirect CSD
parser, not only parses the ioctl arguments, but also creates the jobs and
locks the BOs during extension parsing. This breaks the default submission
flow and creates a nested job submission.
This refactoring turns v3d_get_cpu_indirect_csd_params() into a pure
parser. Now, job creation and BO locking happen in the ioctl function,
which appends the indirect CSD and CLEAN_CACHE jobs to the same struct
v3d_submit and locks the union of all jobs' BOs under one drm_exec. This
eliminates the second drm_exec, the nested submission, and the conditional
two-pass fence attachment that the CPU ioctl previously required for the
indirect CSD path.
For the refactoring, change the functions v3d_lock_bo_reservations(),
v3d_lookup_bos(), v3d_setup_csd_jobs_and_bos() and
v3d_attach_fences_and_unlock_reservation() to take struct v3d_submit
instead of individual arguments. The BO locking helper now iterates over
all jobs in the submission and locks the union, instead of only handling
the last job of the chain.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_drv.h | 9 +-
drivers/gpu/drm/v3d/v3d_submit.c | 341 ++++++++++++++-------------------------
2 files changed, 121 insertions(+), 229 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index fc12e5215fb0..071d919fe860 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -422,8 +422,10 @@ struct v3d_indirect_csd_info {
/* Indirect CSD */
struct v3d_csd_job *job;
- /* Clean cache job associated to the Indirect CSD job */
- struct v3d_job *clean_job;
+ /* Indirect CSD args, stashed by the extension parser and later used
+ * to create the CSD job from them.
+ */
+ struct drm_v3d_submit_csd args;
/* Offset within the BO where the workgroup counts are stored */
u32 offset;
@@ -438,9 +440,6 @@ struct v3d_indirect_csd_info {
/* Indirect BO */
struct drm_gem_object *indirect;
-
- /* Context of the Indirect CSD job */
- struct drm_exec exec;
};
struct v3d_timestamp_query_info {
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 5b519a892985..36402fc25c10 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -20,28 +20,42 @@
* to v3d, so we don't attach dma-buf fences to them.
*/
static int
-v3d_lock_bo_reservations(struct v3d_job *job, struct drm_exec *exec)
+v3d_lock_bo_reservations(struct v3d_submit *submit)
{
- int i, ret;
+ int i, j, ret;
- drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT, job->bo_count);
- drm_exec_until_all_locked(exec) {
- ret = drm_exec_prepare_array(exec, job->bo, job->bo_count, 1);
+ drm_exec_init(&submit->exec,
+ DRM_EXEC_INTERRUPTIBLE_WAIT | DRM_EXEC_IGNORE_DUPLICATES, 0);
+ drm_exec_until_all_locked(&submit->exec) {
+ for (i = 0; i < submit->job_count; i++) {
+ struct v3d_job *job = submit->jobs[i];
+
+ ret = drm_exec_prepare_array(&submit->exec, job->bo,
+ job->bo_count, 1);
+ if (ret)
+ break;
+ }
+ drm_exec_retry_on_contention(&submit->exec);
if (ret)
goto fail;
}
- for (i = 0; i < job->bo_count; i++) {
- ret = drm_sched_job_add_implicit_dependencies(&job->base,
- job->bo[i], true);
- if (ret)
- goto fail;
+ for (i = 0; i < submit->job_count; i++) {
+ struct v3d_job *job = submit->jobs[i];
+
+ for (j = 0; j < job->bo_count; j++) {
+ ret = drm_sched_job_add_implicit_dependencies(&job->base,
+ job->bo[j],
+ true);
+ if (ret)
+ goto fail;
+ }
}
return 0;
fail:
- drm_exec_fini(exec);
+ drm_exec_fini(&submit->exec);
return ret;
}
@@ -62,25 +76,23 @@ v3d_lock_bo_reservations(struct v3d_job *job, struct drm_exec *exec)
* failure, because that will happen at `v3d_job_free()`.
*/
static int
-v3d_lookup_bos(struct drm_device *dev,
- struct drm_file *file_priv,
- struct v3d_job *job,
- u64 bo_handles,
- u32 bo_count)
+v3d_lookup_bos(struct v3d_submit *submit, u64 bo_handles, u32 bo_count)
{
- job->bo_count = bo_count;
+ struct v3d_job *last_job = submit->jobs[submit->job_count - 1];
- if (!job->bo_count) {
+ last_job->bo_count = bo_count;
+
+ if (!last_job->bo_count) {
/* See comment on bo_index for why we have to check
* this.
*/
- drm_warn(dev, "Rendering requires BOs\n");
+ drm_warn(&submit->v3d->drm, "Rendering requires BOs\n");
return -EINVAL;
}
- return drm_gem_objects_lookup(file_priv,
+ return drm_gem_objects_lookup(submit->file_priv,
(void __user *)(uintptr_t)bo_handles,
- job->bo_count, &job->bo);
+ last_job->bo_count, &last_job->bo);
}
static void
@@ -141,25 +153,6 @@ void v3d_job_put(struct v3d_job *job)
kref_put(&job->refcount, job->free);
}
-static int
-v3d_job_allocate(struct v3d_dev *v3d, void **container, size_t size)
-{
- *container = kcalloc(1, size, GFP_KERNEL);
- if (!*container) {
- drm_err(&v3d->drm, "Cannot allocate memory for V3D job.\n");
- return -ENOMEM;
- }
-
- return 0;
-}
-
-static void
-v3d_job_deallocate(void **container)
-{
- kfree(*container);
- *container = NULL;
-}
-
static int
v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
@@ -200,47 +193,6 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
return 0;
}
-static int
-v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
- struct v3d_job *job, void (*free)(struct kref *ref),
- u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
-{
- struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
- int ret;
-
- job->v3d = v3d;
- job->free = free;
- job->file_priv = v3d_priv;
-
- ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
- 1, v3d_priv, file_priv->client_id);
- if (ret)
- return ret;
-
- ret = v3d_job_add_syncobjs(job, file_priv, in_sync, se, queue);
- if (ret)
- goto fail_job_init;
-
- /* CPU jobs don't require hardware resources */
- if (queue != V3D_CPU) {
- ret = v3d_pm_runtime_get(v3d);
- if (ret)
- goto fail_job_init;
- job->has_pm_ref = true;
- }
-
- kref_init(&job->refcount);
-
- job->client_stats = v3d_stats_get(v3d_priv->stats[queue]);
- job->global_stats = v3d_stats_get(v3d->queue[queue].stats);
-
- return 0;
-
-fail_job_init:
- drm_sched_job_cleanup(&job->base);
- return ret;
-}
-
static int
v3d_submit_add_job(struct v3d_submit *submit, void **container, size_t size,
void (*free)(struct kref *ref), enum v3d_queue queue)
@@ -354,31 +306,35 @@ v3d_submit_jobs(struct v3d_submit *submit)
}
static void
-v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
- struct v3d_job *job,
- struct drm_exec *exec,
- u32 out_sync,
- struct v3d_submit_ext *se,
- struct dma_fence *done_fence)
+v3d_attach_fences_and_unlock_reservation(struct v3d_submit *submit,
+ u32 out_sync, struct v3d_submit_ext *se)
{
- struct drm_syncobj *sync_out;
bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
- int i;
+ struct v3d_job *last_job = submit->jobs[submit->job_count - 1];
+ struct drm_syncobj *sync_out;
+ int i, j;
- for (i = 0; i < job->bo_count; i++) {
- /* XXX: Use shared fences for read-only objects. */
- dma_resv_add_fence(job->bo[i]->resv, job->done_fence,
- DMA_RESV_USAGE_WRITE);
+ /* The submission's last fence covers the entire submission. Attach it
+ * to every BO touched by any job in the submission.
+ */
+ for (i = 0; i < submit->job_count; i++) {
+ struct v3d_job *job = submit->jobs[i];
+
+ for (j = 0; j < job->bo_count; j++) {
+ /* XXX: Use shared fences for read-only objects. */
+ dma_resv_add_fence(job->bo[j]->resv, last_job->done_fence,
+ DMA_RESV_USAGE_WRITE);
+ }
}
- drm_exec_fini(exec);
+ drm_exec_fini(&submit->exec);
/* Update the return sync object for the job */
/* If it only supports a single signal semaphore*/
if (!has_multisync) {
- sync_out = drm_syncobj_find(file_priv, out_sync);
+ sync_out = drm_syncobj_find(submit->file_priv, out_sync);
if (sync_out) {
- drm_syncobj_replace_fence(sync_out, done_fence);
+ drm_syncobj_replace_fence(sync_out, last_job->done_fence);
drm_syncobj_put(sync_out);
}
return;
@@ -388,7 +344,7 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
if (se->out_sync_count) {
for (i = 0; i < se->out_sync_count; i++) {
drm_syncobj_replace_fence(se->out_syncs[i].syncobj,
- done_fence);
+ last_job->done_fence);
drm_syncobj_put(se->out_syncs[i].syncobj);
}
kvfree(se->out_syncs);
@@ -396,46 +352,36 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
}
static int
-v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
- struct v3d_dev *v3d,
+v3d_setup_csd_jobs_and_bos(struct v3d_submit *submit,
struct drm_v3d_submit_csd *args,
- struct v3d_csd_job **job,
- struct v3d_job **clean_job,
- struct v3d_submit_ext *se,
- struct drm_exec *exec)
+ struct v3d_submit_ext *se)
{
+ struct v3d_csd_job *job = NULL;
+ struct v3d_job *clean_job = NULL;
int ret;
- ret = v3d_job_allocate(v3d, (void *)job, sizeof(**job));
+ ret = v3d_submit_add_job(submit, (void **)&job, sizeof(*job),
+ v3d_job_free, V3D_CSD);
if (ret)
return ret;
- ret = v3d_job_init(v3d, file_priv, &(*job)->base,
- v3d_job_free, args->in_sync, se, V3D_CSD);
- if (ret) {
- v3d_job_deallocate((void *)job);
- return ret;
- }
-
- ret = v3d_job_allocate(v3d, (void *)clean_job, sizeof(**clean_job));
+ ret = v3d_job_add_syncobjs(&job->base, submit->file_priv, args->in_sync,
+ se, V3D_CSD);
if (ret)
return ret;
- ret = v3d_job_init(v3d, file_priv, *clean_job,
- v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
- if (ret) {
- v3d_job_deallocate((void *)clean_job);
- return ret;
- }
+ job->args = *args;
- (*job)->args = *args;
-
- ret = v3d_lookup_bos(&v3d->drm, file_priv, *clean_job,
- args->bo_handles, args->bo_handle_count);
+ ret = v3d_submit_add_job(submit, (void **)&clean_job, sizeof(*clean_job),
+ v3d_job_free, V3D_CACHE_CLEAN);
if (ret)
return ret;
- return v3d_lock_bo_reservations(*clean_job, exec);
+ ret = v3d_lookup_bos(submit, args->bo_handles, args->bo_handle_count);
+ if (ret)
+ return ret;
+
+ return v3d_lock_bo_reservations(submit);
}
static void
@@ -579,6 +525,7 @@ v3d_get_cpu_indirect_csd_params(struct drm_file *file_priv,
}
job->job_type = V3D_CPU_JOB_TYPE_INDIRECT_CSD;
+ info->args = indirect_csd.submit;
info->offset = indirect_csd.offset;
info->wg_size = indirect_csd.wg_size;
memcpy(&info->wg_uniform_offsets, &indirect_csd.wg_uniform_offsets,
@@ -586,9 +533,7 @@ v3d_get_cpu_indirect_csd_params(struct drm_file *file_priv,
info->indirect = drm_gem_object_lookup(file_priv, indirect_csd.indirect);
- return v3d_setup_csd_jobs_and_bos(file_priv, v3d, &indirect_csd.submit,
- &info->job, &info->clean_job,
- NULL, &info->exec);
+ return 0;
}
/* Get data for the query timestamp job submission. */
@@ -1070,14 +1015,11 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
goto fail;
}
- ret = v3d_lookup_bos(dev, file_priv,
- submit.jobs[submit.job_count - 1],
- args->bo_handles, args->bo_handle_count);
+ ret = v3d_lookup_bos(&submit, args->bo_handles, args->bo_handle_count);
if (ret)
goto fail;
- ret = v3d_lock_bo_reservations(submit.jobs[submit.job_count - 1],
- &submit.exec);
+ ret = v3d_lock_bo_reservations(&submit);
if (ret)
goto fail;
@@ -1089,11 +1031,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
if (ret)
goto fail_perfmon;
- v3d_attach_fences_and_unlock_reservation(file_priv,
- submit.jobs[submit.job_count - 1],
- &submit.exec,
- args->out_sync, &se,
- submit.jobs[submit.job_count - 1]->done_fence);
+ v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
for (int i = 0; i < submit.job_count; i++)
v3d_job_put(submit.jobs[i]);
@@ -1184,7 +1122,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
job->base.bo[job->base.bo_count] = bo;
}
- ret = v3d_lock_bo_reservations(&job->base, &submit.exec);
+ ret = v3d_lock_bo_reservations(&submit);
if (ret)
goto fail;
@@ -1192,10 +1130,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
if (ret)
goto fail_submit;
- v3d_attach_fences_and_unlock_reservation(file_priv,
- &job->base, &submit.exec,
- args->out_sync, &se,
- job->base.done_fence);
+ v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
v3d_job_put(&job->base);
@@ -1230,8 +1165,6 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
};
struct drm_v3d_submit_csd *args = data;
struct v3d_submit_ext se = {0};
- struct v3d_csd_job *job = NULL;
- struct v3d_job *clean_job = NULL;
int ret;
trace_v3d_submit_csd_ioctl(dev, args->cfg[5], args->cfg[6]);
@@ -1257,14 +1190,10 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
}
}
- ret = v3d_setup_csd_jobs_and_bos(file_priv, v3d, args,
- &job, &clean_job, &se, &submit.exec);
+ ret = v3d_setup_csd_jobs_and_bos(&submit, args, &se);
if (ret)
goto fail;
- submit.jobs[submit.job_count++] = &job->base;
- submit.jobs[submit.job_count++] = clean_job;
-
ret = v3d_attach_perfmon_to_jobs(&submit, args->perfmon_id);
if (ret)
goto fail_perfmon;
@@ -1273,11 +1202,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
if (ret)
goto fail_perfmon;
- v3d_attach_fences_and_unlock_reservation(file_priv,
- clean_job,
- &submit.exec,
- args->out_sync, &se,
- clean_job->done_fence);
+ v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
for (int i = 0; i < submit.job_count; i++)
v3d_job_put(submit.jobs[i]);
@@ -1287,8 +1212,8 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
fail_perfmon:
drm_exec_fini(&submit.exec);
fail:
- v3d_job_cleanup((void *)job);
- v3d_job_cleanup(clean_job);
+ for (int i = 0; i < submit.job_count; i++)
+ v3d_job_cleanup(submit.jobs[i]);
v3d_put_multisync_post_deps(&se);
return ret;
@@ -1316,14 +1241,14 @@ int
v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv)
{
- struct v3d_dev *v3d = to_v3d_dev(dev);
+ struct v3d_submit submit = {
+ .v3d = to_v3d_dev(dev),
+ .file_priv = file_priv,
+ .job_count = 0
+ };
struct drm_v3d_submit_cpu *args = data;
struct v3d_submit_ext se = {0};
- struct v3d_submit_ext *out_se = NULL;
struct v3d_cpu_job *cpu_job = NULL;
- struct v3d_csd_job *csd_job = NULL;
- struct v3d_job *clean_job = NULL;
- struct drm_exec exec;
int ret;
if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
@@ -1331,7 +1256,8 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
return -EINVAL;
}
- ret = v3d_job_allocate(v3d, (void *)&cpu_job, sizeof(*cpu_job));
+ ret = v3d_submit_add_job(&submit, (void **)&cpu_job, sizeof(*cpu_job),
+ v3d_job_free, V3D_CPU);
if (ret)
return ret;
@@ -1356,86 +1282,53 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
goto fail;
}
- trace_v3d_submit_cpu_ioctl(&v3d->drm, cpu_job->job_type);
+ trace_v3d_submit_cpu_ioctl(dev, cpu_job->job_type);
- ret = v3d_job_init(v3d, file_priv, &cpu_job->base,
- v3d_job_free, 0, &se, V3D_CPU);
- if (ret) {
- v3d_job_deallocate((void *)&cpu_job);
+ ret = v3d_job_add_syncobjs(&cpu_job->base, file_priv, 0, &se, V3D_CPU);
+ if (ret)
goto fail;
- }
-
- clean_job = cpu_job->indirect_csd.clean_job;
- csd_job = cpu_job->indirect_csd.job;
+ /* Look up the CPU jobs' BOs first, so v3d_setup_csd_jobs_and_bos(), which
+ * locks all jobs' BOs at its end, picks them up too in the case of indirect
+ * CSD.
+ */
if (args->bo_handle_count) {
- ret = v3d_lookup_bos(dev, file_priv, &cpu_job->base,
- args->bo_handles, args->bo_handle_count);
- if (ret)
- goto fail;
-
- ret = v3d_lock_bo_reservations(&cpu_job->base, &exec);
+ ret = v3d_lookup_bos(&submit, args->bo_handles, args->bo_handle_count);
if (ret)
goto fail;
}
- mutex_lock(&v3d->sched_lock);
- v3d_push_job(&cpu_job->base);
-
- switch (cpu_job->job_type) {
- case V3D_CPU_JOB_TYPE_INDIRECT_CSD:
- ret = drm_sched_job_add_dependency(&csd_job->base.base,
- dma_fence_get(cpu_job->base.done_fence));
+ if (cpu_job->job_type == V3D_CPU_JOB_TYPE_INDIRECT_CSD) {
+ ret = v3d_setup_csd_jobs_and_bos(&submit, &cpu_job->indirect_csd.args,
+ NULL);
if (ret)
- goto fail_unreserve;
+ goto fail;
- v3d_push_job(&csd_job->base);
-
- ret = drm_sched_job_add_dependency(&clean_job->base,
- dma_fence_get(csd_job->base.done_fence));
+ /* The CSD job was appended at jobs[1] */
+ cpu_job->indirect_csd.job = container_of(submit.jobs[1], struct v3d_csd_job,
+ base);
+ } else {
+ ret = v3d_lock_bo_reservations(&submit);
if (ret)
- goto fail_unreserve;
-
- v3d_push_job(clean_job);
-
- break;
- default:
- break;
- }
- mutex_unlock(&v3d->sched_lock);
-
- out_se = (cpu_job->job_type == V3D_CPU_JOB_TYPE_INDIRECT_CSD) ? NULL : &se;
-
- v3d_attach_fences_and_unlock_reservation(file_priv,
- &cpu_job->base,
- &exec, 0,
- out_se, cpu_job->base.done_fence);
-
- switch (cpu_job->job_type) {
- case V3D_CPU_JOB_TYPE_INDIRECT_CSD:
- v3d_attach_fences_and_unlock_reservation(file_priv,
- clean_job,
- &cpu_job->indirect_csd.exec,
- 0, &se, clean_job->done_fence);
- break;
- default:
- break;
+ goto fail;
}
- v3d_job_put(&cpu_job->base);
- v3d_job_put(&csd_job->base);
- v3d_job_put(clean_job);
+ ret = v3d_submit_jobs(&submit);
+ if (ret)
+ goto fail_unlock;
+
+ v3d_attach_fences_and_unlock_reservation(&submit, 0, &se);
+
+ for (int i = 0; i < submit.job_count; i++)
+ v3d_job_put(submit.jobs[i]);
return 0;
-fail_unreserve:
- mutex_unlock(&v3d->sched_lock);
- drm_exec_fini(&exec);
- drm_exec_fini(&cpu_job->indirect_csd.exec);
+fail_unlock:
+ drm_exec_fini(&submit.exec);
fail:
- v3d_job_cleanup((void *)cpu_job);
- v3d_job_cleanup((void *)csd_job);
- v3d_job_cleanup(clean_job);
+ for (int i = 0; i < submit.job_count; i++)
+ v3d_job_cleanup(submit.jobs[i]);
v3d_put_multisync_post_deps(&se);
kvfree(cpu_job->timestamp_query.queries);
kvfree(cpu_job->performance_query.queries);
--
2.53.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 10/10] drm/v3d: Ensure atomic submissions in v3d_submit_jobs()
2026-04-13 15:03 [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring Maíra Canal
` (8 preceding siblings ...)
2026-04-13 15:03 ` [PATCH 09/10] drm/v3d: Refactor CPU ioctl into unified submission chain Maíra Canal
@ 2026-04-13 15:03 ` Maíra Canal
2026-04-17 15:02 ` Tvrtko Ursulin
9 siblings, 1 reply; 24+ messages in thread
From: Maíra Canal @ 2026-04-13 15:03 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel, Maíra Canal
Currently, v3d_submit_jobs() arms and pushes each job one at a time,
wiring dependencies between consecutive jobs after each push. If
drm_sched_job_add_dependency() fails midway, the already-pushed jobs are
scheduler-owned and will be submitted to the GPU for execution, even though
the subsequent jobs won't be submitted.
This breaks the atomicity of submissions: only some of the jobs from a
submission would be submitted to the hardware, while the rest would
fail.
Restructure v3d_submit_jobs() into three phases: (1) arm all jobs belonging
to a given submission, (2) wire inter-job dependencies, and (3) push all
jobs to the scheduler unconditionally. Phase (2) can fail; on failure,
it marks every armed job's finished fence with an error, so that run_job()
callbacks skip hardware execution.
This guarantees that every armed job is always pushed, either to run
or to be skipped, and it also ensures the atomicity of a submission.
Adapt all submit ioctls to proceed unconditionally to fence attachment,
even if v3d_submit_jobs() fails, and return the error code to userspace.
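As a rough userspace model of the three phases (all structs and helper
names below are illustrative stand-ins, not the driver's actual types or
functions), the arm/wire/push flow looks like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for a job whose finished fence can carry an error. */
struct job {
	bool armed;
	bool pushed;
	int fence_error;	/* 0 = run normally, negative errno = skip HW */
};

/* Pretend dependency wiring that fails when asked to (models
 * drm_sched_job_add_dependency() failing midway). */
static int add_dependency(struct job *later, struct job *earlier, bool fail)
{
	(void)later;
	(void)earlier;
	return fail ? -22 /* -EINVAL */ : 0;
}

static int submit_jobs(struct job *jobs, int count, bool inject_failure)
{
	int i, ret = 0;

	/* Phase 1: arm every job in the submission. */
	for (i = 0; i < count; i++)
		jobs[i].armed = true;

	/* Phase 2: wire inter-job dependencies; this can fail. */
	for (i = 0; i + 1 < count; i++) {
		ret = add_dependency(&jobs[i + 1], &jobs[i], inject_failure);
		if (ret) {
			/* Mark every armed job so run_job() skips execution. */
			for (int j = 0; j < count; j++)
				jobs[j].fence_error = ret;
			break;
		}
	}

	/* Phase 3: push all armed jobs unconditionally. */
	for (i = 0; i < count; i++)
		jobs[i].pushed = true;

	return ret;
}
```

Either every job runs or every job is pushed with an error-marked fence;
no submission is ever left half-executed.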
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_sched.c | 6 ++++
drivers/gpu/drm/v3d/v3d_submit.c | 63 ++++++++++++++++------------------------
2 files changed, 31 insertions(+), 38 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index f3999678e5da..cf1be78e4d27 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -670,6 +670,9 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
struct v3d_cpu_job *job = to_cpu_job(sched_job);
struct v3d_dev *v3d = job->base.v3d;
+ if (unlikely(job->base.base.s_fence->finished.error))
+ return NULL;
+
if (job->job_type >= ARRAY_SIZE(cpu_job_function)) {
drm_dbg(&v3d->drm, "Unknown CPU job: %d\n", job->job_type);
return NULL;
@@ -693,6 +696,9 @@ v3d_cache_clean_job_run(struct drm_sched_job *sched_job)
struct v3d_job *job = to_v3d_job(sched_job);
struct v3d_dev *v3d = job->v3d;
+ if (unlikely(job->base.s_fence->finished.error))
+ return NULL;
+
v3d_job_start_stats(job);
v3d_clean_caches(v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 36402fc25c10..46d8bedce97c 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -268,41 +268,40 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit *submit, u32 perfmon_id)
return 0;
}
-static void
-v3d_push_job(struct v3d_job *job)
-{
- drm_sched_job_arm(&job->base);
-
- job->done_fence = dma_fence_get(&job->base.s_fence->finished);
-
- /* put by scheduler job completion */
- kref_get(&job->refcount);
-
- drm_sched_entity_push_job(&job->base);
-}
-
static int
v3d_submit_jobs(struct v3d_submit *submit)
{
struct v3d_dev *v3d = submit->v3d;
- int ret;
+ int i, j, ret = 0;
guard(mutex)(&v3d->sched_lock);
- for (int i = 0; i < submit->job_count; i++) {
+ for (i = 0; i < submit->job_count; i++) {
struct v3d_job *job = submit->jobs[i];
- v3d_push_job(job);
+ drm_sched_job_arm(&job->base);
+ job->done_fence = dma_fence_get(&job->base.s_fence->finished);
- if (i + 1 < submit->job_count) {
- ret = drm_sched_job_add_dependency(&submit->jobs[i + 1]->base,
- dma_fence_get(job->done_fence));
- if (ret)
- return ret;
+ /* put by scheduler job completion */
+ kref_get(&job->refcount);
+ }
+
+ for (i = 0; i + 1 < submit->job_count; i++) {
+ ret = drm_sched_job_add_dependency(&submit->jobs[i + 1]->base,
+ dma_fence_get(submit->jobs[i]->done_fence));
+ if (ret) {
+ /* Mark every armed job as failed so run_job() skips execution */
+ for (j = 0; j < submit->job_count; j++)
+ dma_fence_set_error(&submit->jobs[j]->base.s_fence->finished,
+ ret);
+ break;
}
}
- return 0;
+ for (i = 0; i < submit->job_count; i++)
+ drm_sched_entity_push_job(&submit->jobs[i]->base);
+
+ return ret;
}
static void
@@ -1028,15 +1027,13 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
goto fail_perfmon;
ret = v3d_submit_jobs(&submit);
- if (ret)
- goto fail_perfmon;
v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
for (int i = 0; i < submit.job_count; i++)
v3d_job_put(submit.jobs[i]);
- return 0;
+ return ret;
fail_perfmon:
drm_exec_fini(&submit.exec);
@@ -1127,17 +1124,13 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
goto fail;
ret = v3d_submit_jobs(&submit);
- if (ret)
- goto fail_submit;
v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
v3d_job_put(&job->base);
- return 0;
+ return ret;
-fail_submit:
- drm_exec_fini(&submit.exec);
fail:
v3d_job_cleanup((void *)job);
v3d_put_multisync_post_deps(&se);
@@ -1199,15 +1192,13 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
goto fail_perfmon;
ret = v3d_submit_jobs(&submit);
- if (ret)
- goto fail_perfmon;
v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
for (int i = 0; i < submit.job_count; i++)
v3d_job_put(submit.jobs[i]);
- return 0;
+ return ret;
fail_perfmon:
drm_exec_fini(&submit.exec);
@@ -1314,18 +1305,14 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
}
ret = v3d_submit_jobs(&submit);
- if (ret)
- goto fail_unlock;
v3d_attach_fences_and_unlock_reservation(&submit, 0, &se);
for (int i = 0; i < submit.job_count; i++)
v3d_job_put(submit.jobs[i]);
- return 0;
+ return ret;
-fail_unlock:
- drm_exec_fini(&submit.exec);
fail:
for (int i = 0; i < submit.job_count; i++)
v3d_job_cleanup(submit.jobs[i]);
--
2.53.0
* Re: [PATCH 02/10] drm/v3d: Clear queue->active_job when v3d_fence_create() fails
2026-04-13 15:03 ` [PATCH 02/10] drm/v3d: Clear queue->active_job when v3d_fence_create() fails Maíra Canal
@ 2026-04-16 11:39 ` Tvrtko Ursulin
2026-05-08 14:01 ` Maíra Canal
0 siblings, 1 reply; 24+ messages in thread
From: Tvrtko Ursulin @ 2026-04-16 11:39 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
On 13/04/2026 16:03, Maíra Canal wrote:
> The run_job() callbacks for BIN, RENDER, TFU and CSD assign the incoming
> job to queue->active_job before calling v3d_fence_create(). If
> v3d_fence_create() fails, the callback returns NULL without clearing
> active_job, leaving a dangling pointer.
I don't know the driver well enough to know if this can be hit, or under
which circumstances (spurious interrupt?), but the fix looks correct
regardless.
One small doubt below.
> Create a failure path in all run_job() callbacks that clears the active
> job before returning NULL. The BIN path takes queue->queue_lock around the
> clear as it races against v3d_overflow_mem_work(); RENDER, TFU and CSD
> paths have no concurrent reader, so the clear is lock-free.
>
> Fixes: a783a09ee76d ("drm/v3d: Refactor job management.")
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> drivers/gpu/drm/v3d/v3d_sched.c | 52 ++++++++++++++++++++++++-----------------
> 1 file changed, 30 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 1855ef5b3b5f..d42db75e639b 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -206,12 +206,8 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
> struct dma_fence *fence;
> unsigned long irqflags;
>
> - if (unlikely(job->base.base.s_fence->finished.error)) {
> - spin_lock_irqsave(&queue->queue_lock, irqflags);
> - queue->active_job = NULL;
> - spin_unlock_irqrestore(&queue->queue_lock, irqflags);
> - return NULL;
> - }
> + if (unlikely(job->base.base.s_fence->finished.error))
> + goto out_clean_job;
>
> /* Lock required around bin_job update vs
> * v3d_overflow_mem_work().
> @@ -228,7 +224,7 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
>
> fence = v3d_fence_create(v3d, V3D_BIN);
> if (IS_ERR(fence))
> - return NULL;
> + goto out_clean_job;
I wonder whether it would be feasible to simply only assign to
queue->active_job after fence has been created and have a smaller diff
for backporting?
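The smaller-diff alternative suggested here can be sketched as a
self-contained userspace model (the queue, fence, and job types below
are hypothetical stand-ins, and the BIN path's locking against
v3d_overflow_mem_work() is deliberately omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_queue {
	void *active_job;	/* stand-in for queue->active_job */
};

/* Stand-in for v3d_fence_create(): NULL models the failure case. */
static void *fake_fence_create(bool ok)
{
	static int fence_storage;

	return ok ? (void *)&fence_storage : NULL;
}

/*
 * run_job() with the suggested ordering: create the fence first and
 * publish the job pointer only once nothing else can fail, so the
 * failure path never leaves a dangling active_job to clean up.
 */
static void *fake_run_job(struct fake_queue *q, void *job, bool fence_ok)
{
	void *fence = fake_fence_create(fence_ok);

	if (!fence)
		return NULL;	/* active_job was never set */

	q->active_job = job;
	return fence;
}
```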
Either way the patch looks correct to me:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Regards,
Tvrtko
>
> if (job->base.irq_fence)
> dma_fence_put(job->base.irq_fence);
> @@ -256,6 +252,12 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
> V3D_CORE_WRITE(0, V3D_CLE_CT0QEA, job->end);
>
> return fence;
> +
> +out_clean_job:
> + spin_lock_irqsave(&queue->queue_lock, irqflags);
> + queue->active_job = NULL;
> + spin_unlock_irqrestore(&queue->queue_lock, irqflags);
> + return NULL;
> }
>
> static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
> @@ -265,10 +267,8 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
> struct drm_device *dev = &v3d->drm;
> struct dma_fence *fence;
>
> - if (unlikely(job->base.base.s_fence->finished.error)) {
> - v3d->queue[V3D_RENDER].active_job = NULL;
> - return NULL;
> - }
> + if (unlikely(job->base.base.s_fence->finished.error))
> + goto out_clean_job;
>
> v3d->queue[V3D_RENDER].active_job = &job->base;
>
> @@ -282,7 +282,7 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
>
> fence = v3d_fence_create(v3d, V3D_RENDER);
> if (IS_ERR(fence))
> - return NULL;
> + goto out_clean_job;
>
> if (job->base.irq_fence)
> dma_fence_put(job->base.irq_fence);
> @@ -303,6 +303,10 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
> V3D_CORE_WRITE(0, V3D_CLE_CT1QEA, job->end);
>
> return fence;
> +
> +out_clean_job:
> + v3d->queue[V3D_RENDER].active_job = NULL;
> + return NULL;
> }
>
> static struct dma_fence *
> @@ -313,16 +317,14 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
> struct drm_device *dev = &v3d->drm;
> struct dma_fence *fence;
>
> - if (unlikely(job->base.base.s_fence->finished.error)) {
> - v3d->queue[V3D_TFU].active_job = NULL;
> - return NULL;
> - }
> + if (unlikely(job->base.base.s_fence->finished.error))
> + goto out_clean_job;
>
> v3d->queue[V3D_TFU].active_job = &job->base;
>
> fence = v3d_fence_create(v3d, V3D_TFU);
> if (IS_ERR(fence))
> - return NULL;
> + goto out_clean_job;
>
> if (job->base.irq_fence)
> dma_fence_put(job->base.irq_fence);
> @@ -350,6 +352,10 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
> V3D_WRITE(V3D_TFU_ICFG(v3d->ver), job->args.icfg | V3D_TFU_ICFG_IOC);
>
> return fence;
> +
> +out_clean_job:
> + v3d->queue[V3D_TFU].active_job = NULL;
> + return NULL;
> }
>
> static struct dma_fence *
> @@ -361,10 +367,8 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
> struct dma_fence *fence;
> int i, csd_cfg0_reg;
>
> - if (unlikely(job->base.base.s_fence->finished.error)) {
> - v3d->queue[V3D_CSD].active_job = NULL;
> - return NULL;
> - }
> + if (unlikely(job->base.base.s_fence->finished.error))
> + goto out_clean_job;
>
> v3d->queue[V3D_CSD].active_job = &job->base;
>
> @@ -372,7 +376,7 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
>
> fence = v3d_fence_create(v3d, V3D_CSD);
> if (IS_ERR(fence))
> - return NULL;
> + goto out_clean_job;
>
> if (job->base.irq_fence)
> dma_fence_put(job->base.irq_fence);
> @@ -399,6 +403,10 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
> V3D_CORE_WRITE(0, csd_cfg0_reg, job->args.cfg[0]);
>
> return fence;
> +
> +out_clean_job:
> + v3d->queue[V3D_CSD].active_job = NULL;
> + return NULL;
> }
>
> static void
>
* Re: [PATCH 03/10] drm/v3d: Use inline lock for dma fence initialization
2026-04-13 15:03 ` [PATCH 03/10] drm/v3d: Use inline lock for dma fence initialization Maíra Canal
@ 2026-04-16 11:44 ` Tvrtko Ursulin
0 siblings, 0 replies; 24+ messages in thread
From: Tvrtko Ursulin @ 2026-04-16 11:44 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
On 13/04/2026 16:03, Maíra Canal wrote:
> Since commit 1f32f310a13c ("dma-buf: inline spinlock for fence protection
> v5"), dma_fence_init() accepts a NULL external lock and will fall back to
> an inline spinlock embedded in the fence itself. The embedded spinlock
> allows decoupling the lock from the fence lifetime.
>
> Pass NULL so each v3d fence uses its own inline lock. This will allow
> queue_lock to use spin_(un)lock_irq() in all its uses instead of
> spin_(un)lock_irqsave().
>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> drivers/gpu/drm/v3d/v3d_fence.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_fence.c b/drivers/gpu/drm/v3d/v3d_fence.c
> index c500136d0455..9b1a882a4c15 100644
> --- a/drivers/gpu/drm/v3d/v3d_fence.c
> +++ b/drivers/gpu/drm/v3d/v3d_fence.c
> @@ -15,7 +15,7 @@ struct dma_fence *v3d_fence_create(struct v3d_dev *v3d, enum v3d_queue q)
> fence->dev = &v3d->drm;
> fence->queue = q;
> fence->seqno = ++queue->emit_seqno;
> - dma_fence_init(&fence->base, &v3d_fence_ops, &queue->queue_lock,
> + dma_fence_init(&fence->base, &v3d_fence_ops, NULL,
> queue->fence_context, fence->seqno);
>
> return &fence->base;
>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Regards,
Tvrtko
* Re: [PATCH 04/10] drm/v3d: Replace spin_lock_irqsave() with spin_lock_irq()
2026-04-13 15:03 ` [PATCH 04/10] drm/v3d: Replace spin_lock_irqsave() with spin_lock_irq() Maíra Canal
@ 2026-04-16 11:46 ` Tvrtko Ursulin
0 siblings, 0 replies; 24+ messages in thread
From: Tvrtko Ursulin @ 2026-04-16 11:46 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
On 13/04/2026 16:03, Maíra Canal wrote:
> v3d_overflow_mem_work() and v3d_bin_job_run() both run from workqueue
> context where IRQs are always enabled, so the flags save/restore of
> spin_lock_irqsave() is unnecessary. Use spin_lock_irq() instead.
spin_(un)lock looks like it would be enough. Sorry if it was me who said
spin_lock_irq during the previous round?
Regards,
Tvrtko
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> drivers/gpu/drm/v3d/v3d_irq.c | 7 +++----
> drivers/gpu/drm/v3d/v3d_sched.c | 9 ++++-----
> 2 files changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
> index 86efaef2722c..d4d7c39227be 100644
> --- a/drivers/gpu/drm/v3d/v3d_irq.c
> +++ b/drivers/gpu/drm/v3d/v3d_irq.c
> @@ -47,7 +47,6 @@ v3d_overflow_mem_work(struct work_struct *work)
> struct v3d_queue_state *queue = &v3d->queue[V3D_BIN];
> struct v3d_bin_job *bin_job;
> struct drm_gem_object *obj;
> - unsigned long irqflags;
>
> if (IS_ERR(bo)) {
> drm_err(dev, "Couldn't allocate binner overflow mem\n");
> @@ -64,17 +63,17 @@ v3d_overflow_mem_work(struct work_struct *work)
> * bin job got scheduled, that's fine. We'll just give them
> * some binner pool anyway.
> */
> - spin_lock_irqsave(&queue->queue_lock, irqflags);
> + spin_lock_irq(&queue->queue_lock);
> bin_job = (struct v3d_bin_job *)queue->active_job;
>
> if (!bin_job) {
> - spin_unlock_irqrestore(&queue->queue_lock, irqflags);
> + spin_unlock_irq(&queue->queue_lock);
> goto out;
> }
>
> drm_gem_object_get(obj);
> list_add_tail(&bo->unref_head, &bin_job->render->unref_list);
> - spin_unlock_irqrestore(&queue->queue_lock, irqflags);
> + spin_unlock_irq(&queue->queue_lock);
>
> v3d_mmu_flush_all(v3d);
>
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index d42db75e639b..f3999678e5da 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -204,7 +204,6 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
> struct v3d_queue_state *queue = &v3d->queue[V3D_BIN];
> struct drm_device *dev = &v3d->drm;
> struct dma_fence *fence;
> - unsigned long irqflags;
>
> if (unlikely(job->base.base.s_fence->finished.error))
> goto out_clean_job;
> @@ -212,13 +211,13 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
> /* Lock required around bin_job update vs
> * v3d_overflow_mem_work().
> */
> - spin_lock_irqsave(&queue->queue_lock, irqflags);
> + spin_lock_irq(&queue->queue_lock);
> queue->active_job = &job->base;
> /* Clear out the overflow allocation, so we don't
> * reuse the overflow attached to a previous job.
> */
> V3D_CORE_WRITE(0, V3D_PTB_BPOS, 0);
> - spin_unlock_irqrestore(&queue->queue_lock, irqflags);
> + spin_unlock_irq(&queue->queue_lock);
>
> v3d_invalidate_caches(v3d);
>
> @@ -254,9 +253,9 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
> return fence;
>
> out_clean_job:
> - spin_lock_irqsave(&queue->queue_lock, irqflags);
> + spin_lock_irq(&queue->queue_lock);
> queue->active_job = NULL;
> - spin_unlock_irqrestore(&queue->queue_lock, irqflags);
> + spin_unlock_irq(&queue->queue_lock);
> return NULL;
> }
>
>
* Re: [PATCH 05/10] drm/v3d: Extract v3d_job_add_syncobjs() helper
2026-04-13 15:03 ` [PATCH 05/10] drm/v3d: Extract v3d_job_add_syncobjs() helper Maíra Canal
@ 2026-04-16 11:53 ` Tvrtko Ursulin
0 siblings, 0 replies; 24+ messages in thread
From: Tvrtko Ursulin @ 2026-04-16 11:53 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
On 13/04/2026 16:03, Maíra Canal wrote:
> Move the syncobj dependency setup out of v3d_job_init() into its own
> v3d_job_add_syncobjs() helper. No functional change.
>
> This prepares for the next commit which changes the error handling, and
> for a later consolidation that separates job allocation from syncobj
> attachment.
>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> drivers/gpu/drm/v3d/v3d_submit.c | 71 ++++++++++++++++++++++++----------------
> 1 file changed, 43 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
> index f75da2e3533e..054367ba533d 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -162,14 +162,52 @@ v3d_job_deallocate(void **container)
> *container = NULL;
> }
>
> +static int
> +v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
> + u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
> +{
> + bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
> + struct v3d_dev *v3d = job->v3d;
> + int ret = 0;
Nitpick - looks like ret does not need to be initialized.
Otherwise extraction indeed looks like no functional change:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Regards,
Tvrtko
> +
> + if (!has_multisync) {
> + ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv,
> + in_sync, 0);
> + // TODO: Investigate why this was filtered out for the IOCTL.
> + if (ret && ret != -ENOENT)
> + return ret;
> + return 0;
> + }
> +
> + if (se->in_sync_count && se->wait_stage == queue) {
> + struct drm_v3d_sem __user *handle = u64_to_user_ptr(se->in_syncs);
> +
> + for (int i = 0; i < se->in_sync_count; i++) {
> + struct drm_v3d_sem in;
> +
> + if (copy_from_user(&in, handle++, sizeof(in))) {
> + drm_dbg(&v3d->drm, "Failed to copy wait dep handle.\n");
> + return -EFAULT;
> + }
> +
> + ret = drm_sched_job_add_syncobj_dependency(&job->base,
> + file_priv, in.handle, 0);
> + // TODO: Investigate why this was filtered out for the IOCTL.
> + if (ret && ret != -ENOENT)
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +
> static int
> v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
> struct v3d_job *job, void (*free)(struct kref *ref),
> u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
> {
> struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> - bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
> - int ret, i;
> + int ret;
>
> job->v3d = v3d;
> job->free = free;
> @@ -180,32 +218,9 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
> if (ret)
> return ret;
>
> - if (has_multisync) {
> - if (se->in_sync_count && se->wait_stage == queue) {
> - struct drm_v3d_sem __user *handle = u64_to_user_ptr(se->in_syncs);
> -
> - for (i = 0; i < se->in_sync_count; i++) {
> - struct drm_v3d_sem in;
> -
> - if (copy_from_user(&in, handle++, sizeof(in))) {
> - ret = -EFAULT;
> - drm_dbg(&v3d->drm, "Failed to copy wait dep handle.\n");
> - goto fail_job_init;
> - }
> - ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv, in.handle, 0);
> -
> - // TODO: Investigate why this was filtered out for the IOCTL.
> - if (ret && ret != -ENOENT)
> - goto fail_job_init;
> - }
> - }
> - } else {
> - ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv, in_sync, 0);
> -
> - // TODO: Investigate why this was filtered out for the IOCTL.
> - if (ret && ret != -ENOENT)
> - goto fail_job_init;
> - }
> + ret = v3d_job_add_syncobjs(job, file_priv, in_sync, se, queue);
> + if (ret)
> + goto fail_job_init;
>
> /* CPU jobs don't require hardware resources */
> if (queue != V3D_CPU) {
>
* Re: [PATCH 06/10] drm/v3d: Reject invalid syncobj handles in submit ioctls
2026-04-13 15:03 ` [PATCH 06/10] drm/v3d: Reject invalid syncobj handles in submit ioctls Maíra Canal
@ 2026-04-16 11:59 ` Tvrtko Ursulin
2026-05-08 15:28 ` Maíra Canal
2026-04-17 15:05 ` Tvrtko Ursulin
1 sibling, 1 reply; 24+ messages in thread
From: Tvrtko Ursulin @ 2026-04-16 11:59 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
On 13/04/2026 16:03, Maíra Canal wrote:
> drm_sched_job_add_syncobj_dependency() returns -ENOENT both when the
> handle is zero and when the handle is non-zero but does not find a
> corresponding existing syncobj (userspace bug). The driver previously
> ignored -ENOENT in both cases, silently accepting broken handles.
>
> Distinguish the two: skip the call entirely when the handle is zero, as
> there is no dependency, and let -ENOENT propagate for non-zero handles
> that don't resolve, turning the error into a proper return to userspace.
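The handle policy described above condenses to a small model
(`add_syncobj_dep()` below is a hypothetical stand-in for
drm_sched_job_add_syncobj_dependency(), which returns -ENOENT when a
handle does not resolve to an existing syncobj):

```c
#include <assert.h>
#include <errno.h>

/* Stand-in resolver: only one handle is "registered". */
static int add_syncobj_dep(unsigned int handle, unsigned int valid_handle)
{
	return (handle == valid_handle) ? 0 : -ENOENT;
}

/*
 * New policy: a zero handle means "no dependency" and is skipped
 * entirely; a non-zero handle that fails to resolve propagates
 * -ENOENT back to userspace instead of being silently ignored.
 */
static int add_in_sync(unsigned int in_sync, unsigned int valid_handle)
{
	if (!in_sync)
		return 0;

	return add_syncobj_dep(in_sync, valid_handle);
}
```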
>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> drivers/gpu/drm/v3d/v3d_submit.c | 23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
> index 054367ba533d..fe8b5757c3e8 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -171,12 +171,11 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
> int ret = 0;
>
> if (!has_multisync) {
> - ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv,
> - in_sync, 0);
> - // TODO: Investigate why this was filtered out for the IOCTL.
> - if (ret && ret != -ENOENT)
> - return ret;
> - return 0;
> + /* Ignore syncobj if its handle is NULL */
Nitpick s/NULL/zero/ since it is not a pointer.
> + if (in_sync)
> + ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv,
> + in_sync, 0);
Ah now ret needs to be initialized to zero (comment from previous patch)!
> + return ret;
> }
>
> if (se->in_sync_count && se->wait_stage == queue) {
> @@ -190,11 +189,13 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
> return -EFAULT;
> }
>
> - ret = drm_sched_job_add_syncobj_dependency(&job->base,
> - file_priv, in.handle, 0);
> - // TODO: Investigate why this was filtered out for the IOCTL.
> - if (ret && ret != -ENOENT)
> - return ret;
> + /* Ignore syncobj if its handle is NULL */
As above. Or perhaps even NULL makes sense, as a null handle being an
invalid handle reads more obviously than a zero handle?
> + if (in.handle) {
> + ret = drm_sched_job_add_syncobj_dependency(&job->base,
> + file_priv, in.handle, 0);
> + if (ret)
> + return ret;
> + }
> }
> }
>
>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
With the slight danger that there is some userspace out there which relies
on this? Have you looked at how Mesa uses it, and the history?
Regards,
Tvrtko
* Re: [PATCH 07/10] drm/v3d: Migrate BO reservation locking to DRM exec
2026-04-13 15:03 ` [PATCH 07/10] drm/v3d: Migrate BO reservation locking to DRM exec Maíra Canal
@ 2026-04-16 12:24 ` Tvrtko Ursulin
0 siblings, 0 replies; 24+ messages in thread
From: Tvrtko Ursulin @ 2026-04-16 12:24 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
On 13/04/2026 16:03, Maíra Canal wrote:
> Replace the drm_gem_(un)lock_reservations() + ww_acquire_ctx pattern with
> DRM exec across all submit ioctls. Just a straightforward conversion; no
> functional change.
>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> drivers/gpu/drm/v3d/Kconfig | 1 +
> drivers/gpu/drm/v3d/v3d_drv.h | 3 +-
> drivers/gpu/drm/v3d/v3d_submit.c | 68 +++++++++++++++++-----------------------
> 3 files changed, 32 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/Kconfig b/drivers/gpu/drm/v3d/Kconfig
> index ce62c5908e1d..6a33e0ab30de 100644
> --- a/drivers/gpu/drm/v3d/Kconfig
> +++ b/drivers/gpu/drm/v3d/Kconfig
> @@ -5,6 +5,7 @@ config DRM_V3D
> depends on DRM
> depends on COMMON_CLK
> depends on MMU
> + select DRM_EXEC
> select DRM_SCHED
> select DRM_GEM_SHMEM_HELPER
> help
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
> index 5a0b9da2c3aa..788a45c60290 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -7,6 +7,7 @@
> #include <linux/spinlock_types.h>
> #include <linux/workqueue.h>
>
> +#include <drm/drm_exec.h>
> #include <drm/drm_gem.h>
> #include <drm/drm_gem_shmem_helper.h>
> #include <drm/gpu_scheduler.h>
> @@ -418,7 +419,7 @@ struct v3d_indirect_csd_info {
> struct drm_gem_object *indirect;
>
> /* Context of the Indirect CSD job */
> - struct ww_acquire_ctx acquire_ctx;
> + struct drm_exec exec;
> };
>
> struct v3d_timestamp_query_info {
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
> index fe8b5757c3e8..6aa8c65e873c 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -20,20 +20,18 @@
> * to v3d, so we don't attach dma-buf fences to them.
> */
> static int
> -v3d_lock_bo_reservations(struct v3d_job *job,
> - struct ww_acquire_ctx *acquire_ctx)
> +v3d_lock_bo_reservations(struct v3d_job *job, struct drm_exec *exec)
> {
> int i, ret;
>
> - ret = drm_gem_lock_reservations(job->bo, job->bo_count, acquire_ctx);
> - if (ret)
> - return ret;
> -
> - for (i = 0; i < job->bo_count; i++) {
> - ret = dma_resv_reserve_fences(job->bo[i]->resv, 1);
> + drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT, job->bo_count);
> + drm_exec_until_all_locked(exec) {
> + ret = drm_exec_prepare_array(exec, job->bo, job->bo_count, 1);
> if (ret)
> goto fail;
> + }
>
> + for (i = 0; i < job->bo_count; i++) {
> ret = drm_sched_job_add_implicit_dependencies(&job->base,
> job->bo[i], true);
> if (ret)
> @@ -43,7 +41,7 @@ v3d_lock_bo_reservations(struct v3d_job *job,
> return 0;
>
> fail:
> - drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
> + drm_exec_fini(exec);
> return ret;
> }
>
> @@ -259,7 +257,7 @@ v3d_push_job(struct v3d_job *job)
> static void
> v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
> struct v3d_job *job,
> - struct ww_acquire_ctx *acquire_ctx,
> + struct drm_exec *exec,
> u32 out_sync,
> struct v3d_submit_ext *se,
> struct dma_fence *done_fence)
> @@ -274,7 +272,7 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
> DMA_RESV_USAGE_WRITE);
> }
>
> - drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
> + drm_exec_fini(exec);
>
> /* Update the return sync object for the job */
> /* If it only supports a single signal semaphore*/
> @@ -305,7 +303,7 @@ v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
> struct v3d_csd_job **job,
> struct v3d_job **clean_job,
> struct v3d_submit_ext *se,
> - struct ww_acquire_ctx *acquire_ctx)
> + struct drm_exec *exec)
> {
> int ret;
>
> @@ -338,7 +336,7 @@ v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
> if (ret)
> return ret;
>
> - return v3d_lock_bo_reservations(*clean_job, acquire_ctx);
> + return v3d_lock_bo_reservations(*clean_job, exec);
> }
>
> static void
> @@ -491,7 +489,7 @@ v3d_get_cpu_indirect_csd_params(struct drm_file *file_priv,
>
> return v3d_setup_csd_jobs_and_bos(file_priv, v3d, &indirect_csd.submit,
> &info->job, &info->clean_job,
> - NULL, &info->acquire_ctx);
> + NULL, &info->exec);
> }
>
> /* Get data for the query timestamp job submission. */
> @@ -906,7 +904,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> struct v3d_render_job *render = NULL;
> struct v3d_job *clean_job = NULL;
> struct v3d_job *last_job;
> - struct ww_acquire_ctx acquire_ctx;
> + struct drm_exec exec;
> int ret = 0;
>
> trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args->rcl_end);
> @@ -986,7 +984,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> if (ret)
> goto fail;
>
> - ret = v3d_lock_bo_reservations(last_job, &acquire_ctx);
> + ret = v3d_lock_bo_reservations(last_job, &exec);
> if (ret)
> goto fail;
>
> @@ -1035,7 +1033,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
>
> v3d_attach_fences_and_unlock_reservation(file_priv,
> last_job,
> - &acquire_ctx,
> + &exec,
> args->out_sync,
> &se,
> last_job->done_fence);
> @@ -1049,8 +1047,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> fail_unreserve:
> mutex_unlock(&v3d->sched_lock);
> fail_perfmon:
> - drm_gem_unlock_reservations(last_job->bo,
> - last_job->bo_count, &acquire_ctx);
> + drm_exec_fini(&exec);
> fail:
> v3d_job_cleanup((void *)bin);
> v3d_job_cleanup((void *)render);
> @@ -1077,7 +1074,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
> struct drm_v3d_submit_tfu *args = data;
> struct v3d_submit_ext se = {0};
> struct v3d_tfu_job *job = NULL;
> - struct ww_acquire_ctx acquire_ctx;
> + struct drm_exec exec;
> int ret = 0;
>
> trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
> @@ -1133,7 +1130,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
> job->base.bo[job->base.bo_count] = bo;
> }
>
> - ret = v3d_lock_bo_reservations(&job->base, &acquire_ctx);
> + ret = v3d_lock_bo_reservations(&job->base, &exec);
> if (ret)
> goto fail;
>
> @@ -1142,7 +1139,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
> mutex_unlock(&v3d->sched_lock);
>
> v3d_attach_fences_and_unlock_reservation(file_priv,
> - &job->base, &acquire_ctx,
> + &job->base, &exec,
> args->out_sync,
> &se,
> job->base.done_fence);
> @@ -1177,7 +1174,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
> struct v3d_submit_ext se = {0};
> struct v3d_csd_job *job = NULL;
> struct v3d_job *clean_job = NULL;
> - struct ww_acquire_ctx acquire_ctx;
> + struct drm_exec exec;
> int ret;
>
> trace_v3d_submit_csd_ioctl(&v3d->drm, args->cfg[5], args->cfg[6]);
> @@ -1204,8 +1201,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
> }
>
> ret = v3d_setup_csd_jobs_and_bos(file_priv, v3d, args,
> - &job, &clean_job, &se,
> - &acquire_ctx);
> + &job, &clean_job, &se, &exec);
> if (ret)
> goto fail;
>
> @@ -1236,7 +1232,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
>
> v3d_attach_fences_and_unlock_reservation(file_priv,
> clean_job,
> - &acquire_ctx,
> + &exec,
> args->out_sync,
> &se,
> clean_job->done_fence);
> @@ -1249,8 +1245,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
> fail_unreserve:
> mutex_unlock(&v3d->sched_lock);
> fail_perfmon:
> - drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
> - &acquire_ctx);
> + drm_exec_fini(&exec);
> fail:
> v3d_job_cleanup((void *)job);
> v3d_job_cleanup(clean_job);
> @@ -1288,7 +1283,7 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
> struct v3d_cpu_job *cpu_job = NULL;
> struct v3d_csd_job *csd_job = NULL;
> struct v3d_job *clean_job = NULL;
> - struct ww_acquire_ctx acquire_ctx;
> + struct drm_exec exec;
> int ret;
>
> if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
> @@ -1339,7 +1334,7 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
> if (ret)
> goto fail;
>
> - ret = v3d_lock_bo_reservations(&cpu_job->base, &acquire_ctx);
> + ret = v3d_lock_bo_reservations(&cpu_job->base, &exec);
> if (ret)
> goto fail;
> }
> @@ -1373,14 +1368,14 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
>
> v3d_attach_fences_and_unlock_reservation(file_priv,
> &cpu_job->base,
> - &acquire_ctx, 0,
> + &exec, 0,
> out_se, cpu_job->base.done_fence);
>
> switch (cpu_job->job_type) {
> case V3D_CPU_JOB_TYPE_INDIRECT_CSD:
> v3d_attach_fences_and_unlock_reservation(file_priv,
> clean_job,
> - &cpu_job->indirect_csd.acquire_ctx,
> + &cpu_job->indirect_csd.exec,
> 0, &se, clean_job->done_fence);
> break;
> default:
> @@ -1395,13 +1390,8 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
>
> fail_unreserve:
> mutex_unlock(&v3d->sched_lock);
> -
> - drm_gem_unlock_reservations(cpu_job->base.bo, cpu_job->base.bo_count,
> - &acquire_ctx);
> -
> - drm_gem_unlock_reservations(clean_job->bo, clean_job->bo_count,
> - &cpu_job->indirect_csd.acquire_ctx);
> -
> + drm_exec_fini(&exec);
> + drm_exec_fini(&cpu_job->indirect_csd.exec);
I am not familiar with the whole indirect_csd setup but I have a
question: could you not share the same drm_exec context (or
ww_acquire_ctx before it) within this ioctl?
It would need to be passed down from v3d_submit_cpu_ioctl to the
extension chain processing as an external parameter (not in a job),
possibly further, but could that remove the need for the two
drm_exec_fini() calls in here?
I have not yet managed to follow the whole flow, but intuitively one
ioctl should need only one acquire/drm_exec context.
Regards,
Tvrtko
> fail:
> v3d_job_cleanup((void *)cpu_job);
> v3d_job_cleanup((void *)csd_job);
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 08/10] drm/v3d: Introduce struct v3d_submit and convert CL/TFU/CSD ioctls
2026-04-13 15:03 ` [PATCH 08/10] drm/v3d: Introduce struct v3d_submit and convert CL/TFU/CSD ioctls Maíra Canal
@ 2026-04-16 14:16 ` Tvrtko Ursulin
0 siblings, 0 replies; 24+ messages in thread
From: Tvrtko Ursulin @ 2026-04-16 14:16 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
On 13/04/2026 16:03, Maíra Canal wrote:
> As the V3D driver grew with time, different types of submission were added
> and the submission code grew more complex, but the driver stuck with the
> same abstractions.
>
> Nowadays, the submission ioctls don't submit a single job, but a
> chain of jobs:
>
> 1. v3d_submit_cl_ioctl() submits a BIN job (optional), RENDER job
> (mandatory), and a CLEAN_CACHE job (optional).
> 2. v3d_submit_csd_ioctl() submits a CSD, and a CLEAN_CACHE job.
> 3. v3d_submit_tfu_ioctl() submits a TFU job.
>
> Therefore, each ioctl submits a chain of jobs in which each job depends on
> the previous one. However, this concept is not well represented in software
> at the moment.
>
> To address this, introduce a new concept: the struct v3d_submit, which
> groups the submission state and represents the submission chain formed by
> an ordered array of jobs.
>
> Add new helpers to allocate, add jobs to the chain and submit jobs to
> the scheduler, all based on the new struct. Convert v3d_submit_cl_ioctl(),
> v3d_submit_tfu_ioctl() and v3d_submit_csd_ioctl() to the new pattern. Each
> ioctl now follows the same flow: add jobs -> lookup BOs -> lock
> reservations -> attach perfmon -> submit chain -> attach fences -> put
> jobs.
>
> The CPU ioctl is left on the old helpers for now; its indirect CSD path
> requires some restructuring that will be addressed in the next commit.
>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> drivers/gpu/drm/v3d/v3d_drv.h | 21 +++
> drivers/gpu/drm/v3d/v3d_submit.c | 348 ++++++++++++++++++++++-----------------
> 2 files changed, 215 insertions(+), 154 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
> index 788a45c60290..fc12e5215fb0 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -288,6 +288,27 @@ to_v3d_fence(struct dma_fence *fence)
> #define V3D_CORE_READ(core, offset) readl(v3d->core_regs[core] + offset)
> #define V3D_CORE_WRITE(core, offset, val) writel(val, v3d->core_regs[core] + offset)
>
> +#define V3D_MAX_JOBS_PER_SUBMISSION 3
> +
> +/* Per-ioctl submission context */
> +struct v3d_submit {
> + struct v3d_dev *v3d;
> +
> + struct drm_file *file_priv;
> +
> + /* DRM exec context for this submission. */
> + struct drm_exec exec;
I see, the very thing I mentioned while reading 7/10, cool.
> +
> + /* Ordered array of jobs forming the submission chain. Jobs are
> + * appended via v3d_submit_add_job(), then chained and pushed to
> + * the scheduler by v3d_submit_jobs().
> + */
> + struct v3d_job *jobs[V3D_MAX_JOBS_PER_SUBMISSION];
> +
> + /* Number of jobs currently in @jobs. */
> + u32 job_count;
> +};
> +
> struct v3d_job {
> struct drm_sched_job base;
>
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
> index 6aa8c65e873c..5b519a892985 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -241,6 +241,81 @@ v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
> return ret;
> }
>
> +static int
> +v3d_submit_add_job(struct v3d_submit *submit, void **container, size_t size,
> + void (*free)(struct kref *ref), enum v3d_queue queue)
> +{
> + struct v3d_file_priv *v3d_priv = submit->file_priv->driver_priv;
> + struct v3d_dev *v3d = submit->v3d;
> + struct v3d_job *job;
> + int ret;
> +
> + *container = kcalloc(1, size, GFP_KERNEL);
> + if (!*container)
> + return -ENOMEM;
I still think we could do better than void * and I also don't see why
kcalloc(), but okay, it can be revisited later.
> +
> + job = *container;
> + job->v3d = v3d;
> + job->free = free;
> + job->file_priv = v3d_priv;
> +
> + ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
> + 1, v3d_priv, submit->file_priv->client_id);
> + if (ret)
> + goto fail_free;
> +
> + /* CPU jobs don't require hardware resources */
> + if (queue != V3D_CPU) {
> + ret = v3d_pm_runtime_get(v3d);
> + if (ret)
> + goto fail_sched_job;
> + job->has_pm_ref = true;
> + }
> +
> + kref_init(&job->refcount);
> +
> + job->client_stats = v3d_stats_get(v3d_priv->stats[queue]);
> + job->global_stats = v3d_stats_get(v3d->queue[queue].stats);
> +
> + submit->jobs[submit->job_count++] = job;
> +
> + return 0;
> +
> +fail_sched_job:
> + drm_sched_job_cleanup(&job->base);
> +fail_free:
> + kfree(*container);
> + *container = NULL;
> + return ret;
> +}
> +
> +static int
> +v3d_attach_perfmon_to_jobs(struct v3d_submit *submit, u32 perfmon_id)
> +{
> + struct v3d_file_priv *v3d_priv = submit->file_priv->driver_priv;
> + struct v3d_dev *v3d = submit->v3d;
> + struct v3d_perfmon *perfmon;
> +
> + if (!perfmon_id)
> + return 0;
> +
> + if (v3d->global_perfmon)
> + return -EAGAIN;
> +
> + perfmon = v3d_perfmon_find(v3d_priv, perfmon_id);
> + if (!perfmon)
> + return -ENOENT;
> +
> + for (int i = 0; i < submit->job_count; i++) {
> + submit->jobs[i]->perfmon = perfmon;
> + v3d_perfmon_get(perfmon);
> + }
> +
> + v3d_perfmon_put(perfmon);
I suppose you could save one atomic by adding a conditional in the loop
(to skip the get on the first job, for example), but maybe it would be
a bit ugly. Not sure.
> +
> + return 0;
> +}
> +
> static void
> v3d_push_job(struct v3d_job *job)
> {
> @@ -254,6 +329,30 @@ v3d_push_job(struct v3d_job *job)
> drm_sched_entity_push_job(&job->base);
> }
>
> +static int
> +v3d_submit_jobs(struct v3d_submit *submit)
> +{
> + struct v3d_dev *v3d = submit->v3d;
> + int ret;
> +
> + guard(mutex)(&v3d->sched_lock);
> +
> + for (int i = 0; i < submit->job_count; i++) {
> + struct v3d_job *job = submit->jobs[i];
> +
> + v3d_push_job(job);
> +
> + if (i + 1 < submit->job_count) {
> + ret = drm_sched_job_add_dependency(&submit->jobs[i + 1]->base,
> + dma_fence_get(job->done_fence));
An alternative to i + 1 is to track prev. Not sure which is nicer, just
thinking out loud.
> + if (ret)
> + return ret;
> + }
> + }
> +
> + return 0;
> +}
> +
> static void
> v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
> struct v3d_job *job,
> @@ -896,18 +995,19 @@ int
> v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_priv)
> {
> - struct v3d_dev *v3d = to_v3d_dev(dev);
> - struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> + struct v3d_submit submit = {
> + .v3d = to_v3d_dev(dev),
> + .file_priv = file_priv,
> + .job_count = 0
Zero is implied.
> + };
> struct drm_v3d_submit_cl *args = data;
> struct v3d_submit_ext se = {0};
> struct v3d_bin_job *bin = NULL;
> struct v3d_render_job *render = NULL;
> struct v3d_job *clean_job = NULL;
> - struct v3d_job *last_job;
> - struct drm_exec exec;
> - int ret = 0;
> + int ret;
>
> - trace_v3d_submit_cl_ioctl(&v3d->drm, args->rcl_start, args->rcl_end);
> + trace_v3d_submit_cl_ioctl(dev, args->rcl_start, args->rcl_end);
>
> if (args->pad)
> return -EINVAL;
> @@ -927,131 +1027,84 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> }
> }
>
> - ret = v3d_job_allocate(v3d, (void *)&render, sizeof(*render));
> - if (ret)
> - return ret;
> -
> - ret = v3d_job_init(v3d, file_priv, &render->base,
> - v3d_render_job_free, args->in_sync_rcl, &se, V3D_RENDER);
> - if (ret) {
> - v3d_job_deallocate((void *)&render);
> - goto fail;
> - }
> -
> - render->start = args->rcl_start;
> - render->end = args->rcl_end;
> - INIT_LIST_HEAD(&render->unref_list);
> -
> if (args->bcl_start != args->bcl_end) {
> - ret = v3d_job_allocate(v3d, (void *)&bin, sizeof(*bin));
> + ret = v3d_submit_add_job(&submit, (void **)&bin, sizeof(*bin),
> + v3d_job_free, V3D_BIN);
> if (ret)
> goto fail;
>
> - ret = v3d_job_init(v3d, file_priv, &bin->base,
> - v3d_job_free, args->in_sync_bcl, &se, V3D_BIN);
> - if (ret) {
> - v3d_job_deallocate((void *)&bin);
> - goto fail;
> - }
> -
> bin->start = args->bcl_start;
> bin->end = args->bcl_end;
> bin->qma = args->qma;
> bin->qms = args->qms;
> bin->qts = args->qts;
> - bin->render = render;
> - }
>
> - if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
> - ret = v3d_job_allocate(v3d, (void *)&clean_job, sizeof(*clean_job));
> + ret = v3d_job_add_syncobjs(&bin->base, file_priv,
> + args->in_sync_bcl, &se, V3D_BIN);
> if (ret)
> goto fail;
> -
> - ret = v3d_job_init(v3d, file_priv, clean_job,
> - v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
> - if (ret) {
> - v3d_job_deallocate((void *)&clean_job);
> - goto fail;
> - }
> -
> - last_job = clean_job;
> - } else {
> - last_job = &render->base;
> }
>
> - ret = v3d_lookup_bos(dev, file_priv, last_job,
> + ret = v3d_submit_add_job(&submit, (void **)&render, sizeof(*render),
> + v3d_render_job_free, V3D_RENDER);
> + if (ret)
> + goto fail;
> +
> + INIT_LIST_HEAD(&render->unref_list);
> + render->start = args->rcl_start;
> + render->end = args->rcl_end;
> +
> + if (bin)
> + bin->render = render;
> +
> + ret = v3d_job_add_syncobjs(&render->base, file_priv, args->in_sync_rcl,
> + &se, V3D_RENDER);
> + if (ret)
> + goto fail;
> +
> + if (args->flags & DRM_V3D_SUBMIT_CL_FLUSH_CACHE) {
> + ret = v3d_submit_add_job(&submit, (void **)&clean_job,
> + sizeof(*clean_job), v3d_job_free,
> + V3D_CACHE_CLEAN);
> + if (ret)
> + goto fail;
> + }
> +
> + ret = v3d_lookup_bos(dev, file_priv,
> + submit.jobs[submit.job_count - 1],
> args->bo_handles, args->bo_handle_count);
Could the BO lookup happen before job creation? Just because it is the
step much more likely to fail, I think. It is only a question.
> if (ret)
> goto fail;
>
> - ret = v3d_lock_bo_reservations(last_job, &exec);
> + ret = v3d_lock_bo_reservations(submit.jobs[submit.job_count - 1],
job_count - 1 I think ends up repeated around four times in this
function, so perhaps it would be worth having a helper such as
v3d_submit_last_job(&submit)? It wouldn't be any shorter, so not sure.
> + &submit.exec);
> if (ret)
> goto fail;
>
> - if (args->perfmon_id) {
> - if (v3d->global_perfmon) {
> - ret = -EAGAIN;
> - goto fail_perfmon;
> - }
> + ret = v3d_attach_perfmon_to_jobs(&submit, args->perfmon_id);
> + if (ret)
> + goto fail_perfmon;
>
> - render->base.perfmon = v3d_perfmon_find(v3d_priv,
> - args->perfmon_id);
> -
> - if (!render->base.perfmon) {
> - ret = -ENOENT;
> - goto fail_perfmon;
> - }
> - }
> -
> - mutex_lock(&v3d->sched_lock);
> - if (bin) {
> - bin->base.perfmon = render->base.perfmon;
> - v3d_perfmon_get(bin->base.perfmon);
> - v3d_push_job(&bin->base);
> -
> - ret = drm_sched_job_add_dependency(&render->base.base,
> - dma_fence_get(bin->base.done_fence));
> - if (ret)
> - goto fail_unreserve;
> - }
> -
> - v3d_push_job(&render->base);
> -
> - if (clean_job) {
> - struct dma_fence *render_fence =
> - dma_fence_get(render->base.done_fence);
> - ret = drm_sched_job_add_dependency(&clean_job->base,
> - render_fence);
> - if (ret)
> - goto fail_unreserve;
> - clean_job->perfmon = render->base.perfmon;
> - v3d_perfmon_get(clean_job->perfmon);
> - v3d_push_job(clean_job);
> - }
> -
> - mutex_unlock(&v3d->sched_lock);
> + ret = v3d_submit_jobs(&submit);
> + if (ret)
> + goto fail_perfmon;
>
> v3d_attach_fences_and_unlock_reservation(file_priv,
> - last_job,
> - &exec,
> - args->out_sync,
> - &se,
> - last_job->done_fence);
> + submit.jobs[submit.job_count - 1],
> + &submit.exec,
> + args->out_sync, &se,
> + submit.jobs[submit.job_count - 1]->done_fence);
>
> - v3d_job_put(&bin->base);
> - v3d_job_put(&render->base);
> - v3d_job_put(clean_job);
> + for (int i = 0; i < submit.job_count; i++)
> + v3d_job_put(submit.jobs[i]);
Worth having v3d_submit_put_jobs()?
>
> return 0;
>
> -fail_unreserve:
> - mutex_unlock(&v3d->sched_lock);
> fail_perfmon:
> - drm_exec_fini(&exec);
> + drm_exec_fini(&submit.exec);
> fail:
> - v3d_job_cleanup((void *)bin);
> - v3d_job_cleanup((void *)render);
> - v3d_job_cleanup(clean_job);
> + for (int i = 0; i < submit.job_count; i++)
> + v3d_job_cleanup(submit.jobs[i]);
And/or v3d_submit_cleanup_jobs()?
> v3d_put_multisync_post_deps(&se);
>
> return ret;
> @@ -1070,14 +1123,17 @@ int
> v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_priv)
> {
> - struct v3d_dev *v3d = to_v3d_dev(dev);
> + struct v3d_submit submit = {
> + .v3d = to_v3d_dev(dev),
> + .file_priv = file_priv,
> + .job_count = 0
> + };
> struct drm_v3d_submit_tfu *args = data;
> struct v3d_submit_ext se = {0};
> struct v3d_tfu_job *job = NULL;
> - struct drm_exec exec;
> int ret = 0;
>
> - trace_v3d_submit_tfu_ioctl(&v3d->drm, args->iia);
> + trace_v3d_submit_tfu_ioctl(dev, args->iia);
>
> if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
> drm_dbg(dev, "invalid flags: %d\n", args->flags);
> @@ -1092,16 +1148,14 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
> }
> }
>
> - ret = v3d_job_allocate(v3d, (void *)&job, sizeof(*job));
> + ret = v3d_submit_add_job(&submit, (void **)&job, sizeof(*job),
> + v3d_job_free, V3D_TFU);
> + if (ret)
> + goto fail;
> +
> + ret = v3d_job_add_syncobjs(&job->base, file_priv, args->in_sync, &se, V3D_TFU);
> if (ret)
> - return ret;
> -
> - ret = v3d_job_init(v3d, file_priv, &job->base,
> - v3d_job_free, args->in_sync, &se, V3D_TFU);
> - if (ret) {
> - v3d_job_deallocate((void *)&job);
> goto fail;
> - }
>
> job->base.bo = kzalloc_objs(*job->base.bo, ARRAY_SIZE(args->bo_handles));
> if (!job->base.bo) {
> @@ -1130,24 +1184,25 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
> job->base.bo[job->base.bo_count] = bo;
> }
>
> - ret = v3d_lock_bo_reservations(&job->base, &exec);
> + ret = v3d_lock_bo_reservations(&job->base, &submit.exec);
You could almost wrap these in something like
v3d_submit_lock_reservations(&submit) since it is always the last job.
> if (ret)
> goto fail;
>
> - mutex_lock(&v3d->sched_lock);
> - v3d_push_job(&job->base);
> - mutex_unlock(&v3d->sched_lock);
> + ret = v3d_submit_jobs(&submit);
> + if (ret)
> + goto fail_submit;
>
> v3d_attach_fences_and_unlock_reservation(file_priv,
> - &job->base, &exec,
> - args->out_sync,
> - &se,
> + &job->base, &submit.exec,
> + args->out_sync, &se,
> job->base.done_fence);
>
> v3d_job_put(&job->base);
>
> return 0;
>
> +fail_submit:
> + drm_exec_fini(&submit.exec);
The drm_exec now sticks out since nothing else in the function
"mentions" it. v3d_submit_unlock_reservations(&submit)?
> fail:
> v3d_job_cleanup((void *)job);
> v3d_put_multisync_post_deps(&se);
> @@ -1168,21 +1223,23 @@ int
> v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_priv)
> {
> - struct v3d_dev *v3d = to_v3d_dev(dev);
> - struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> + struct v3d_submit submit = {
> + .v3d = to_v3d_dev(dev),
> + .file_priv = file_priv,
> + .job_count = 0
> + };
> struct drm_v3d_submit_csd *args = data;
> struct v3d_submit_ext se = {0};
> struct v3d_csd_job *job = NULL;
> struct v3d_job *clean_job = NULL;
> - struct drm_exec exec;
> int ret;
>
> - trace_v3d_submit_csd_ioctl(&v3d->drm, args->cfg[5], args->cfg[6]);
> + trace_v3d_submit_csd_ioctl(dev, args->cfg[5], args->cfg[6]);
>
> if (args->pad)
> return -EINVAL;
>
> - if (!v3d_has_csd(v3d)) {
> + if (!v3d_has_csd(submit.v3d)) {
> drm_warn(dev, "Attempting CSD submit on non-CSD hardware\n");
As a side note, if userspace can trigger this drm_warn it probably
should be removed.
> return -EINVAL;
> }
> @@ -1201,51 +1258,34 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
> }
>
> ret = v3d_setup_csd_jobs_and_bos(file_priv, v3d, args,
> - &job, &clean_job, &se, &exec);
> + &job, &clean_job, &se, &submit.exec);
Could you not turn v3d_setup_csd_jobs_and_bos() around to be based on
v3d_submit? Or not just yet?
> if (ret)
> goto fail;
>
> - if (args->perfmon_id) {
> - if (v3d->global_perfmon) {
> - ret = -EAGAIN;
> - goto fail_perfmon;
> - }
> + submit.jobs[submit.job_count++] = &job->base;
> + submit.jobs[submit.job_count++] = clean_job;
>
> - job->base.perfmon = v3d_perfmon_find(v3d_priv,
> - args->perfmon_id);
> - if (!job->base.perfmon) {
> - ret = -ENOENT;
> - goto fail_perfmon;
> - }
> - }
> -
> - mutex_lock(&v3d->sched_lock);
> - v3d_push_job(&job->base);
> -
> - ret = drm_sched_job_add_dependency(&clean_job->base,
> - dma_fence_get(job->base.done_fence));
> + ret = v3d_attach_perfmon_to_jobs(&submit, args->perfmon_id);
> if (ret)
> - goto fail_unreserve;
> + goto fail_perfmon;
>
> - v3d_push_job(clean_job);
> - mutex_unlock(&v3d->sched_lock);
> + ret = v3d_submit_jobs(&submit);
> + if (ret)
> + goto fail_perfmon;
>
> v3d_attach_fences_and_unlock_reservation(file_priv,
> clean_job,
> - &exec,
> - args->out_sync,
> - &se,
> + &submit.exec,
> + args->out_sync, &se,
> clean_job->done_fence);
>
> - v3d_job_put(&job->base);
> - v3d_job_put(clean_job);
> + for (int i = 0; i < submit.job_count; i++)
> + v3d_job_put(submit.jobs[i]);
>
> return 0;
>
> -fail_unreserve:
> - mutex_unlock(&v3d->sched_lock);
> fail_perfmon:
> - drm_exec_fini(&exec);
> + drm_exec_fini(&submit.exec);
> fail:
> v3d_job_cleanup((void *)job);
> v3d_job_cleanup(clean_job);
>
Small comments aside, overall this looks to me like a great and
worthwhile consolidation which makes things much easier to follow! At
least for someone like me who is not at home in v3d.
Regards,
Tvrtko
* Re: [PATCH 09/10] drm/v3d: Refactor CPU ioctl into unified submission chain
2026-04-13 15:03 ` [PATCH 09/10] drm/v3d: Refactor CPU ioctl into unified submission chain Maíra Canal
@ 2026-04-17 14:38 ` Tvrtko Ursulin
2026-05-09 13:33 ` Maíra Canal
0 siblings, 1 reply; 24+ messages in thread
From: Tvrtko Ursulin @ 2026-04-17 14:38 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
On 13/04/2026 16:03, Maíra Canal wrote:
> Restructure the CPU ioctl so that all job types, including indirect CSD,
> use a single struct v3d_submit chain and a single DRM exec context.
>
> Currently, v3d_get_cpu_indirect_csd_params(), which is the indirect CSD
> parser, not only parses the ioctl arguments, but also creates the jobs and
> locks the BOs during extension parsing. This breaks the default submission
> flow and creates a nested job submission.
>
> This refactoring turns v3d_get_cpu_indirect_csd_params() into a pure
> parser. Now, job creation and BO locking happen in the ioctl function,
> which appends the indirect CSD and CLEAN_CACHE jobs to the same struct
> v3d_submit and locks the union of all jobs' BOs under one drm_exec. This
> eliminates the second drm_exec, the nested submission, and the conditional
> two-pass fence attachment that the CPU ioctl previously required for the
> indirect CSD path.
>
> For the refactoring, change the functions v3d_lock_bo_reservations(),
> v3d_lookup_bos(), v3d_setup_csd_jobs_and_bos() and
> v3d_attach_fences_and_unlock_reservation() to take struct v3d_submit
> instead of individual arguments. The BO locking helper now iterates over
> all jobs in the submission and locks the union, instead of only handling
> the last job of the chain.
>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> drivers/gpu/drm/v3d/v3d_drv.h | 9 +-
> drivers/gpu/drm/v3d/v3d_submit.c | 341 ++++++++++++++-------------------------
> 2 files changed, 121 insertions(+), 229 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
> index fc12e5215fb0..071d919fe860 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -422,8 +422,10 @@ struct v3d_indirect_csd_info {
> /* Indirect CSD */
> struct v3d_csd_job *job;
>
> - /* Clean cache job associated to the Indirect CSD job */
> - struct v3d_job *clean_job;
> + /* Indirect CSD args, stashed by the extension parser and later used
> + * to create the CSD job from them.
> + */
> + struct drm_v3d_submit_csd args;
>
> /* Offset within the BO where the workgroup counts are stored */
> u32 offset;
> @@ -438,9 +440,6 @@ struct v3d_indirect_csd_info {
>
> /* Indirect BO */
> struct drm_gem_object *indirect;
> -
> - /* Context of the Indirect CSD job */
> - struct drm_exec exec;
> };
>
> struct v3d_timestamp_query_info {
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
> index 5b519a892985..36402fc25c10 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -20,28 +20,42 @@
> * to v3d, so we don't attach dma-buf fences to them.
> */
Is the comment above (not shown in the diff) still accurate?
> static int
> -v3d_lock_bo_reservations(struct v3d_job *job, struct drm_exec *exec)
> +v3d_lock_bo_reservations(struct v3d_submit *submit)
> {
> - int i, ret;
> + int i, j, ret;
>
> - drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT, job->bo_count);
> - drm_exec_until_all_locked(exec) {
> - ret = drm_exec_prepare_array(exec, job->bo, job->bo_count, 1);
> + drm_exec_init(&submit->exec,
> + DRM_EXEC_INTERRUPTIBLE_WAIT | DRM_EXEC_IGNORE_DUPLICATES, 0);
> + drm_exec_until_all_locked(&submit->exec) {
> + for (i = 0; i < submit->job_count; i++) {
> + struct v3d_job *job = submit->jobs[i];
As per v3d_lookup_bos(), only the last job carries an initialized
job->bo, so is the loop required? Only for reserving the fence slot?
[comes back later]
Hmmm, or is the CSD submit the special case where two jobs can point to
different BOs?
Sorry this is a monster diff. I ended up applying it and am jumping back
and forth.
> +
> + ret = drm_exec_prepare_array(&submit->exec, job->bo,
> + job->bo_count, 1);
> + if (ret)
> + break;
> + }
> + drm_exec_retry_on_contention(&submit->exec);
> if (ret)
> goto fail;
> }
>
> - for (i = 0; i < job->bo_count; i++) {
> - ret = drm_sched_job_add_implicit_dependencies(&job->base,
> - job->bo[i], true);
> - if (ret)
> - goto fail;
> + for (i = 0; i < submit->job_count; i++) {
> + struct v3d_job *job = submit->jobs[i];
> +
> + for (j = 0; j < job->bo_count; j++) {
> + ret = drm_sched_job_add_implicit_dependencies(&job->base,
> + job->bo[j],
> + true);
> + if (ret)
> + goto fail;
> + }
> }
>
> return 0;
>
> fail:
> - drm_exec_fini(exec);
> + drm_exec_fini(&submit->exec);
> return ret;
> }
>
> @@ -62,25 +76,23 @@ v3d_lock_bo_reservations(struct v3d_job *job, struct drm_exec *exec)
> * failure, because that will happen at `v3d_job_free()`.
> */
> static int
> -v3d_lookup_bos(struct drm_device *dev,
> - struct drm_file *file_priv,
> - struct v3d_job *job,
> - u64 bo_handles,
> - u32 bo_count)
> +v3d_lookup_bos(struct v3d_submit *submit, u64 bo_handles, u32 bo_count)
> {
> - job->bo_count = bo_count;
> + struct v3d_job *last_job = submit->jobs[submit->job_count - 1];
>
> - if (!job->bo_count) {
> + last_job->bo_count = bo_count;
> +
> + if (!last_job->bo_count) {
> /* See comment on bo_index for why we have to check
> * this.
> */
> - drm_warn(dev, "Rendering requires BOs\n");
> + drm_warn(&submit->v3d->drm, "Rendering requires BOs\n");
> return -EINVAL;
> }
>
> - return drm_gem_objects_lookup(file_priv,
> + return drm_gem_objects_lookup(submit->file_priv,
> (void __user *)(uintptr_t)bo_handles,
> - job->bo_count, &job->bo);
> + last_job->bo_count, &last_job->bo);
> }
>
> static void
> @@ -141,25 +153,6 @@ void v3d_job_put(struct v3d_job *job)
> kref_put(&job->refcount, job->free);
> }
>
> -static int
> -v3d_job_allocate(struct v3d_dev *v3d, void **container, size_t size)
> -{
> - *container = kcalloc(1, size, GFP_KERNEL);
> - if (!*container) {
> - drm_err(&v3d->drm, "Cannot allocate memory for V3D job.\n");
> - return -ENOMEM;
> - }
> -
> - return 0;
> -}
> -
> -static void
> -v3d_job_deallocate(void **container)
> -{
> - kfree(*container);
> - *container = NULL;
> -}
> -
> static int
> v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
> u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
> @@ -200,47 +193,6 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
> return 0;
> }
>
> -static int
> -v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
> - struct v3d_job *job, void (*free)(struct kref *ref),
> - u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
> -{
> - struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
> - int ret;
> -
> - job->v3d = v3d;
> - job->free = free;
> - job->file_priv = v3d_priv;
> -
> - ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
> - 1, v3d_priv, file_priv->client_id);
> - if (ret)
> - return ret;
> -
> - ret = v3d_job_add_syncobjs(job, file_priv, in_sync, se, queue);
> - if (ret)
> - goto fail_job_init;
> -
> - /* CPU jobs don't require hardware resources */
> - if (queue != V3D_CPU) {
> - ret = v3d_pm_runtime_get(v3d);
> - if (ret)
> - goto fail_job_init;
> - job->has_pm_ref = true;
> - }
> -
> - kref_init(&job->refcount);
> -
> - job->client_stats = v3d_stats_get(v3d_priv->stats[queue]);
> - job->global_stats = v3d_stats_get(v3d->queue[queue].stats);
> -
> - return 0;
> -
> -fail_job_init:
> - drm_sched_job_cleanup(&job->base);
> - return ret;
> -}
> -
> static int
> v3d_submit_add_job(struct v3d_submit *submit, void **container, size_t size,
> void (*free)(struct kref *ref), enum v3d_queue queue)
> @@ -354,31 +306,35 @@ v3d_submit_jobs(struct v3d_submit *submit)
> }
>
> static void
> -v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
> - struct v3d_job *job,
> - struct drm_exec *exec,
> - u32 out_sync,
> - struct v3d_submit_ext *se,
> - struct dma_fence *done_fence)
> +v3d_attach_fences_and_unlock_reservation(struct v3d_submit *submit,
> + u32 out_sync, struct v3d_submit_ext *se)
> {
> - struct drm_syncobj *sync_out;
> bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
> - int i;
> + struct v3d_job *last_job = submit->jobs[submit->job_count - 1];
> + struct drm_syncobj *sync_out;
> + int i, j;
>
> - for (i = 0; i < job->bo_count; i++) {
> - /* XXX: Use shared fences for read-only objects. */
> - dma_resv_add_fence(job->bo[i]->resv, job->done_fence,
> - DMA_RESV_USAGE_WRITE);
> + /* The submission's last fence covers the entire submission. Attach it
> + * to every BO touched by any job in the submission.
> + */
> + for (i = 0; i < submit->job_count; i++) {
> + struct v3d_job *job = submit->jobs[i];
> +
> + for (j = 0; j < job->bo_count; j++) {
> + /* XXX: Use shared fences for read-only objects. */
> + dma_resv_add_fence(job->bo[j]->resv, last_job->done_fence,
> + DMA_RESV_USAGE_WRITE);
> + }
> }
>
> - drm_exec_fini(exec);
> + drm_exec_fini(&submit->exec);
>
> /* Update the return sync object for the job */
> /* If it only supports a single signal semaphore*/
> if (!has_multisync) {
> - sync_out = drm_syncobj_find(file_priv, out_sync);
> + sync_out = drm_syncobj_find(submit->file_priv, out_sync);
> if (sync_out) {
> - drm_syncobj_replace_fence(sync_out, done_fence);
> + drm_syncobj_replace_fence(sync_out, last_job->done_fence);
> drm_syncobj_put(sync_out);
> }
> return;
> @@ -388,7 +344,7 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
> if (se->out_sync_count) {
> for (i = 0; i < se->out_sync_count; i++) {
> drm_syncobj_replace_fence(se->out_syncs[i].syncobj,
> - done_fence);
> + last_job->done_fence);
> drm_syncobj_put(se->out_syncs[i].syncobj);
> }
> kvfree(se->out_syncs);
> @@ -396,46 +352,36 @@ v3d_attach_fences_and_unlock_reservation(struct drm_file *file_priv,
> }
>
> static int
> -v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
> - struct v3d_dev *v3d,
> +v3d_setup_csd_jobs_and_bos(struct v3d_submit *submit,
> struct drm_v3d_submit_csd *args,
> - struct v3d_csd_job **job,
> - struct v3d_job **clean_job,
> - struct v3d_submit_ext *se,
> - struct drm_exec *exec)
> + struct v3d_submit_ext *se)
> {
> + struct v3d_csd_job *job = NULL;
> + struct v3d_job *clean_job = NULL;
> int ret;
>
> - ret = v3d_job_allocate(v3d, (void *)job, sizeof(**job));
> + ret = v3d_submit_add_job(submit, (void **)&job, sizeof(*job),
> + v3d_job_free, V3D_CSD);
> if (ret)
> return ret;
>
> - ret = v3d_job_init(v3d, file_priv, &(*job)->base,
> - v3d_job_free, args->in_sync, se, V3D_CSD);
> - if (ret) {
> - v3d_job_deallocate((void *)job);
> - return ret;
> - }
> -
> - ret = v3d_job_allocate(v3d, (void *)clean_job, sizeof(**clean_job));
> + ret = v3d_job_add_syncobjs(&job->base, submit->file_priv, args->in_sync,
> + se, V3D_CSD);
> if (ret)
> return ret;
>
> - ret = v3d_job_init(v3d, file_priv, *clean_job,
> - v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
> - if (ret) {
> - v3d_job_deallocate((void *)clean_job);
> - return ret;
> - }
> + job->args = *args;
>
> - (*job)->args = *args;
> -
> - ret = v3d_lookup_bos(&v3d->drm, file_priv, *clean_job,
> - args->bo_handles, args->bo_handle_count);
> + ret = v3d_submit_add_job(submit, (void **)&clean_job, sizeof(*clean_job),
> + v3d_job_free, V3D_CACHE_CLEAN);
> if (ret)
> return ret;
>
> - return v3d_lock_bo_reservations(*clean_job, exec);
> + ret = v3d_lookup_bos(submit, args->bo_handles, args->bo_handle_count);
> + if (ret)
> + return ret;
> +
> + return v3d_lock_bo_reservations(submit);
> }
>
> static void
> @@ -579,6 +525,7 @@ v3d_get_cpu_indirect_csd_params(struct drm_file *file_priv,
> }
>
> job->job_type = V3D_CPU_JOB_TYPE_INDIRECT_CSD;
> + info->args = indirect_csd.submit;
> info->offset = indirect_csd.offset;
> info->wg_size = indirect_csd.wg_size;
> memcpy(&info->wg_uniform_offsets, &indirect_csd.wg_uniform_offsets,
> @@ -586,9 +533,7 @@ v3d_get_cpu_indirect_csd_params(struct drm_file *file_priv,
>
> info->indirect = drm_gem_object_lookup(file_priv, indirect_csd.indirect);
>
> - return v3d_setup_csd_jobs_and_bos(file_priv, v3d, &indirect_csd.submit,
> - &info->job, &info->clean_job,
> - NULL, &info->exec);
> + return 0;
> }
>
> /* Get data for the query timestamp job submission. */
> @@ -1070,14 +1015,11 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> goto fail;
> }
>
> - ret = v3d_lookup_bos(dev, file_priv,
> - submit.jobs[submit.job_count - 1],
> - args->bo_handles, args->bo_handle_count);
> + ret = v3d_lookup_bos(&submit, args->bo_handles, args->bo_handle_count);
> if (ret)
> goto fail;
>
> - ret = v3d_lock_bo_reservations(submit.jobs[submit.job_count - 1],
> - &submit.exec);
> + ret = v3d_lock_bo_reservations(&submit);
> if (ret)
> goto fail;
>
> @@ -1089,11 +1031,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> if (ret)
> goto fail_perfmon;
>
> - v3d_attach_fences_and_unlock_reservation(file_priv,
> - submit.jobs[submit.job_count - 1],
> - &submit.exec,
> - args->out_sync, &se,
> - submit.jobs[submit.job_count - 1]->done_fence);
> + v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
>
> for (int i = 0; i < submit.job_count; i++)
> v3d_job_put(submit.jobs[i]);
> @@ -1184,7 +1122,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
> job->base.bo[job->base.bo_count] = bo;
> }
>
> - ret = v3d_lock_bo_reservations(&job->base, &submit.exec);
> + ret = v3d_lock_bo_reservations(&submit);
> if (ret)
> goto fail;
>
> @@ -1192,10 +1130,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
> if (ret)
> goto fail_submit;
>
> - v3d_attach_fences_and_unlock_reservation(file_priv,
> - &job->base, &submit.exec,
> - args->out_sync, &se,
> - job->base.done_fence);
> + v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
Could v3d_attach_fences_and_unlock_reservation be folded into
v3d_submit_jobs?
Regards,
Tvrtko
>
> v3d_job_put(&job->base);
>
> @@ -1230,8 +1165,6 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
> };
> struct drm_v3d_submit_csd *args = data;
> struct v3d_submit_ext se = {0};
> - struct v3d_csd_job *job = NULL;
> - struct v3d_job *clean_job = NULL;
> int ret;
>
> trace_v3d_submit_csd_ioctl(dev, args->cfg[5], args->cfg[6]);
> @@ -1257,14 +1190,10 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
> }
> }
>
> - ret = v3d_setup_csd_jobs_and_bos(file_priv, v3d, args,
> - &job, &clean_job, &se, &submit.exec);
> + ret = v3d_setup_csd_jobs_and_bos(&submit, args, &se);
> if (ret)
> goto fail;
>
> - submit.jobs[submit.job_count++] = &job->base;
> - submit.jobs[submit.job_count++] = clean_job;
> -
> ret = v3d_attach_perfmon_to_jobs(&submit, args->perfmon_id);
> if (ret)
> goto fail_perfmon;
> @@ -1273,11 +1202,7 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
> if (ret)
> goto fail_perfmon;
>
> - v3d_attach_fences_and_unlock_reservation(file_priv,
> - clean_job,
> - &submit.exec,
> - args->out_sync, &se,
> - clean_job->done_fence);
> + v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
>
> for (int i = 0; i < submit.job_count; i++)
> v3d_job_put(submit.jobs[i]);
> @@ -1287,8 +1212,8 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
> fail_perfmon:
> drm_exec_fini(&submit.exec);
> fail:
> - v3d_job_cleanup((void *)job);
> - v3d_job_cleanup(clean_job);
> + for (int i = 0; i < submit.job_count; i++)
> + v3d_job_cleanup(submit.jobs[i]);
> v3d_put_multisync_post_deps(&se);
>
> return ret;
> @@ -1316,14 +1241,14 @@ int
> v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_priv)
> {
> - struct v3d_dev *v3d = to_v3d_dev(dev);
> + struct v3d_submit submit = {
> + .v3d = to_v3d_dev(dev),
> + .file_priv = file_priv,
> + .job_count = 0
> + };
> struct drm_v3d_submit_cpu *args = data;
> struct v3d_submit_ext se = {0};
> - struct v3d_submit_ext *out_se = NULL;
> struct v3d_cpu_job *cpu_job = NULL;
> - struct v3d_csd_job *csd_job = NULL;
> - struct v3d_job *clean_job = NULL;
> - struct drm_exec exec;
> int ret;
>
> if (args->flags && !(args->flags & DRM_V3D_SUBMIT_EXTENSION)) {
> @@ -1331,7 +1256,8 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
> return -EINVAL;
> }
>
> - ret = v3d_job_allocate(v3d, (void *)&cpu_job, sizeof(*cpu_job));
> + ret = v3d_submit_add_job(&submit, (void **)&cpu_job, sizeof(*cpu_job),
> + v3d_job_free, V3D_CPU);
> if (ret)
> return ret;
>
> @@ -1356,86 +1282,53 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
> goto fail;
> }
>
> - trace_v3d_submit_cpu_ioctl(&v3d->drm, cpu_job->job_type);
> + trace_v3d_submit_cpu_ioctl(dev, cpu_job->job_type);
>
> - ret = v3d_job_init(v3d, file_priv, &cpu_job->base,
> - v3d_job_free, 0, &se, V3D_CPU);
> - if (ret) {
> - v3d_job_deallocate((void *)&cpu_job);
> + ret = v3d_job_add_syncobjs(&cpu_job->base, file_priv, 0, &se, V3D_CPU);
> + if (ret)
> goto fail;
> - }
> -
> - clean_job = cpu_job->indirect_csd.clean_job;
> - csd_job = cpu_job->indirect_csd.job;
>
> + /* Look up the CPU jobs' BOs first, so v3d_setup_csd_jobs_and_bos(), which
> + * locks all jobs' BOs at its end, picks them up too in the case of indirect
> + * CSD.
> + */
> if (args->bo_handle_count) {
> - ret = v3d_lookup_bos(dev, file_priv, &cpu_job->base,
> - args->bo_handles, args->bo_handle_count);
> - if (ret)
> - goto fail;
> -
> - ret = v3d_lock_bo_reservations(&cpu_job->base, &exec);
> + ret = v3d_lookup_bos(&submit, args->bo_handles, args->bo_handle_count);
> if (ret)
> goto fail;
> }
>
> - mutex_lock(&v3d->sched_lock);
> - v3d_push_job(&cpu_job->base);
> -
> - switch (cpu_job->job_type) {
> - case V3D_CPU_JOB_TYPE_INDIRECT_CSD:
> - ret = drm_sched_job_add_dependency(&csd_job->base.base,
> - dma_fence_get(cpu_job->base.done_fence));
> + if (cpu_job->job_type == V3D_CPU_JOB_TYPE_INDIRECT_CSD) {
> + ret = v3d_setup_csd_jobs_and_bos(&submit, &cpu_job->indirect_csd.args,
> + NULL);
> if (ret)
> - goto fail_unreserve;
> + goto fail;
>
> - v3d_push_job(&csd_job->base);
> -
> - ret = drm_sched_job_add_dependency(&clean_job->base,
> - dma_fence_get(csd_job->base.done_fence));
> + /* The CSD job was appended at jobs[1] */
> + cpu_job->indirect_csd.job = container_of(submit.jobs[1], struct v3d_csd_job,
> + base);
> + } else {
> + ret = v3d_lock_bo_reservations(&submit);
> if (ret)
> - goto fail_unreserve;
> -
> - v3d_push_job(clean_job);
> -
> - break;
> - default:
> - break;
> - }
> - mutex_unlock(&v3d->sched_lock);
> -
> - out_se = (cpu_job->job_type == V3D_CPU_JOB_TYPE_INDIRECT_CSD) ? NULL : &se;
> -
> - v3d_attach_fences_and_unlock_reservation(file_priv,
> - &cpu_job->base,
> - &exec, 0,
> - out_se, cpu_job->base.done_fence);
> -
> - switch (cpu_job->job_type) {
> - case V3D_CPU_JOB_TYPE_INDIRECT_CSD:
> - v3d_attach_fences_and_unlock_reservation(file_priv,
> - clean_job,
> - &cpu_job->indirect_csd.exec,
> - 0, &se, clean_job->done_fence);
> - break;
> - default:
> - break;
> + goto fail;
> }
>
> - v3d_job_put(&cpu_job->base);
> - v3d_job_put(&csd_job->base);
> - v3d_job_put(clean_job);
> + ret = v3d_submit_jobs(&submit);
> + if (ret)
> + goto fail_unlock;
> +
> + v3d_attach_fences_and_unlock_reservation(&submit, 0, &se);
> +
> + for (int i = 0; i < submit.job_count; i++)
> + v3d_job_put(submit.jobs[i]);
>
> return 0;
>
> -fail_unreserve:
> - mutex_unlock(&v3d->sched_lock);
> - drm_exec_fini(&exec);
> - drm_exec_fini(&cpu_job->indirect_csd.exec);
> +fail_unlock:
> + drm_exec_fini(&submit.exec);
> fail:
> - v3d_job_cleanup((void *)cpu_job);
> - v3d_job_cleanup((void *)csd_job);
> - v3d_job_cleanup(clean_job);
> + for (int i = 0; i < submit.job_count; i++)
> + v3d_job_cleanup(submit.jobs[i]);
> v3d_put_multisync_post_deps(&se);
> kvfree(cpu_job->timestamp_query.queries);
> kvfree(cpu_job->performance_query.queries);
>
* Re: [PATCH 10/10] drm/v3d: Ensure atomic submissions in v3d_submit_jobs()
2026-04-13 15:03 ` [PATCH 10/10] drm/v3d: Ensure atomic submissions in v3d_submit_jobs() Maíra Canal
@ 2026-04-17 15:02 ` Tvrtko Ursulin
0 siblings, 0 replies; 24+ messages in thread
From: Tvrtko Ursulin @ 2026-04-17 15:02 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
On 13/04/2026 16:03, Maíra Canal wrote:
> Currently, v3d_submit_jobs() arms and pushes each job one at a time,
> wiring dependencies between consecutive jobs after each push. If
> drm_sched_job_add_dependency() fails midway, the already-pushed jobs are
> scheduler-owned and will be submitted to the GPU for execution, even though
> the subsequent jobs won't be submitted.
>
> This breaks the atomicity of the submissions, as only some of the jobs
> from a submission would be submitted to the hardware, while the other part
> fails.
>
> Restructure v3d_submit_jobs() into three phases: (1) arm all jobs belonging
> to a given submission, (2) wire inter-job dependencies, and (3) push all
> jobs to the scheduler unconditionally. Phase (2) can fail; on failure,
> it marks every armed job finished fence with an error, so that run_job()
> callbacks skip hardware execution.
>
> This guarantees that every armed job is always pushed, either to run
> or to be skipped, and it also ensures the atomicity of a submission.
>
> Adapt all submit ioctls to proceed unconditionally to fence attachment,
> even if v3d_submit_jobs() fails, and return the error code to userspace.
>
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> drivers/gpu/drm/v3d/v3d_sched.c | 6 ++++
> drivers/gpu/drm/v3d/v3d_submit.c | 63 ++++++++++++++++------------------------
> 2 files changed, 31 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index f3999678e5da..cf1be78e4d27 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -670,6 +670,9 @@ v3d_cpu_job_run(struct drm_sched_job *sched_job)
> struct v3d_cpu_job *job = to_cpu_job(sched_job);
> struct v3d_dev *v3d = job->base.v3d;
>
> + if (unlikely(job->base.base.s_fence->finished.error))
> + return NULL;
> +
> if (job->job_type >= ARRAY_SIZE(cpu_job_function)) {
> drm_dbg(&v3d->drm, "Unknown CPU job: %d\n", job->job_type);
> return NULL;
> @@ -693,6 +696,9 @@ v3d_cache_clean_job_run(struct drm_sched_job *sched_job)
> struct v3d_job *job = to_v3d_job(sched_job);
> struct v3d_dev *v3d = job->v3d;
>
> + if (unlikely(job->base.s_fence->finished.error))
> + return NULL;
> +
> v3d_job_start_stats(job);
>
> v3d_clean_caches(v3d);
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
> index 36402fc25c10..46d8bedce97c 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -268,41 +268,40 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit *submit, u32 perfmon_id)
> return 0;
> }
>
> -static void
> -v3d_push_job(struct v3d_job *job)
> -{
> - drm_sched_job_arm(&job->base);
> -
> - job->done_fence = dma_fence_get(&job->base.s_fence->finished);
> -
> - /* put by scheduler job completion */
> - kref_get(&job->refcount);
> -
> - drm_sched_entity_push_job(&job->base);
> -}
> -
> static int
> v3d_submit_jobs(struct v3d_submit *submit)
> {
> struct v3d_dev *v3d = submit->v3d;
> - int ret;
> + int i, j, ret = 0;
>
> guard(mutex)(&v3d->sched_lock);
>
> - for (int i = 0; i < submit->job_count; i++) {
> + for (i = 0; i < submit->job_count; i++) {
> struct v3d_job *job = submit->jobs[i];
>
> - v3d_push_job(job);
> + drm_sched_job_arm(&job->base);
> + job->done_fence = dma_fence_get(&job->base.s_fence->finished);
>
> - if (i + 1 < submit->job_count) {
> - ret = drm_sched_job_add_dependency(&submit->jobs[i + 1]->base,
> - dma_fence_get(job->done_fence));
> - if (ret)
> - return ret;
> + /* put by scheduler job completion */
> + kref_get(&job->refcount);
> + }
> +
> + for (i = 0; i + 1 < submit->job_count; i++) {
> + ret = drm_sched_job_add_dependency(&submit->jobs[i + 1]->base,
> + dma_fence_get(submit->jobs[i]->done_fence));
> + if (ret) {
> + /* Mark every armed job as failed so run_job() skips execution */
> + for (j = 0; j < submit->job_count; j++)
> + dma_fence_set_error(&submit->jobs[j]->base.s_fence->finished,
> + ret);
> + break;
> }
> }
>
> - return 0;
> + for (i = 0; i < submit->job_count; i++)
> + drm_sched_entity_push_job(&submit->jobs[i]->base);
> +
> + return ret;
> }
>
> static void
> @@ -1028,15 +1027,13 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> goto fail_perfmon;
>
> ret = v3d_submit_jobs(&submit);
> - if (ret)
> - goto fail_perfmon;
>
> v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
Not sure if it is safe to end up calling dma_resv_add_fence() with fences
in an error state. Maybe it works, but maybe it does not, and it would be
better to skip that step?
Also, the syncobjs exported to userspace - should those be skipped too?
Regards,
Tvrtko
>
> for (int i = 0; i < submit.job_count; i++)
> v3d_job_put(submit.jobs[i]);
>
> - return 0;
> + return ret;
>
> fail_perfmon:
> drm_exec_fini(&submit.exec);
> @@ -1127,17 +1124,13 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
> goto fail;
>
> ret = v3d_submit_jobs(&submit);
> - if (ret)
> - goto fail_submit;
>
> v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
>
> v3d_job_put(&job->base);
>
> - return 0;
> + return ret;
>
> -fail_submit:
> - drm_exec_fini(&submit.exec);
> fail:
> v3d_job_cleanup((void *)job);
> v3d_put_multisync_post_deps(&se);
> @@ -1199,15 +1192,13 @@ v3d_submit_csd_ioctl(struct drm_device *dev, void *data,
> goto fail_perfmon;
>
> ret = v3d_submit_jobs(&submit);
> - if (ret)
> - goto fail_perfmon;
>
> v3d_attach_fences_and_unlock_reservation(&submit, args->out_sync, &se);
>
> for (int i = 0; i < submit.job_count; i++)
> v3d_job_put(submit.jobs[i]);
>
> - return 0;
> + return ret;
>
> fail_perfmon:
> drm_exec_fini(&submit.exec);
> @@ -1314,18 +1305,14 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
> }
>
> ret = v3d_submit_jobs(&submit);
> - if (ret)
> - goto fail_unlock;
>
> v3d_attach_fences_and_unlock_reservation(&submit, 0, &se);
>
> for (int i = 0; i < submit.job_count; i++)
> v3d_job_put(submit.jobs[i]);
>
> - return 0;
> + return ret;
>
> -fail_unlock:
> - drm_exec_fini(&submit.exec);
> fail:
> for (int i = 0; i < submit.job_count; i++)
> v3d_job_cleanup(submit.jobs[i]);
>
* Re: [PATCH 06/10] drm/v3d: Reject invalid syncobj handles in submit ioctls
2026-04-13 15:03 ` [PATCH 06/10] drm/v3d: Reject invalid syncobj handles in submit ioctls Maíra Canal
2026-04-16 11:59 ` Tvrtko Ursulin
@ 2026-04-17 15:05 ` Tvrtko Ursulin
1 sibling, 0 replies; 24+ messages in thread
From: Tvrtko Ursulin @ 2026-04-17 15:05 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
On 13/04/2026 16:03, Maíra Canal wrote:
> drm_sched_job_add_syncobj_dependency() returns -ENOENT both when the
> handle is zero and when the handle is non-zero but does not find a
> corresponding existing syncobj (userspace bug). The driver previously
> ignored -ENOENT in both cases, silently accepting broken handles.
>
> Distinguish the two: skip the call entirely when the handle is zero, as
> there is no dependency, and let -ENOENT propagate for non-zero handles
> that don't resolve, turning the error into a proper return to userspace.
I was reminded of this patch while reading the rest of the series.
Specifically, I wondered whether v3d_attach_fences_and_unlock_reservation()
has the same problem of silently ignoring invalid syncobj handles.
Regards,
Tvrtko
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> drivers/gpu/drm/v3d/v3d_submit.c | 23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
> index 054367ba533d..fe8b5757c3e8 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -171,12 +171,11 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
> int ret = 0;
>
> if (!has_multisync) {
> - ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv,
> - in_sync, 0);
> - // TODO: Investigate why this was filtered out for the IOCTL.
> - if (ret && ret != -ENOENT)
> - return ret;
> - return 0;
> + /* Ignore syncobj if its handle is NULL */
> + if (in_sync)
> + ret = drm_sched_job_add_syncobj_dependency(&job->base, file_priv,
> + in_sync, 0);
> + return ret;
> }
>
> if (se->in_sync_count && se->wait_stage == queue) {
> @@ -190,11 +189,13 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
> return -EFAULT;
> }
>
> - ret = drm_sched_job_add_syncobj_dependency(&job->base,
> - file_priv, in.handle, 0);
> - // TODO: Investigate why this was filtered out for the IOCTL.
> - if (ret && ret != -ENOENT)
> - return ret;
> + /* Ignore syncobj if its handle is NULL */
> + if (in.handle) {
> + ret = drm_sched_job_add_syncobj_dependency(&job->base,
> + file_priv, in.handle, 0);
> + if (ret)
> + return ret;
> + }
> }
> }
>
>
* Re: [PATCH 02/10] drm/v3d: Clear queue->active_job when v3d_fence_create() fails
2026-04-16 11:39 ` Tvrtko Ursulin
@ 2026-05-08 14:01 ` Maíra Canal
0 siblings, 0 replies; 24+ messages in thread
From: Maíra Canal @ 2026-05-08 14:01 UTC (permalink / raw)
To: Tvrtko Ursulin, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
Hi Tvrtko,
On 16/04/26 08:39, Tvrtko Ursulin wrote:
>
> On 13/04/2026 16:03, Maíra Canal wrote:
>> The run_job() callbacks for BIN, RENDER, TFU and CSD assign the incoming
>> job to queue->active_job before calling v3d_fence_create(). If
>> v3d_fence_create() fails, the callback returns NULL without clearing
>> active_job, leaving a dangling pointer.
>
> I don't know the driver well enough to know if this can be hit, or under
> which circumstances (spurious interrupt?), but the fix looks correct
> regardless.
>
> One small doubt below.
>
>> Create a failure path in all run_job() callbacks that clears the active
>> job before returning NULL. The BIN path takes queue->queue_lock around
>> the
>> clear as it races against v3d_overflow_mem_work(); RENDER, TFU and CSD
>> paths have no concurrent reader, so the clear is lock-free.
>>
>> Fixes: a783a09ee76d ("drm/v3d: Refactor job management.")
>> Signed-off-by: Maíra Canal <mcanal@igalia.com>
>> ---
>> drivers/gpu/drm/v3d/v3d_sched.c | 52 +++++++++++++++++++++++
>> +-----------------
>> 1 file changed, 30 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/
>> v3d_sched.c
>> index 1855ef5b3b5f..d42db75e639b 100644
>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>> @@ -206,12 +206,8 @@ static struct dma_fence *v3d_bin_job_run(struct
>> drm_sched_job *sched_job)
>> struct dma_fence *fence;
>> unsigned long irqflags;
>> - if (unlikely(job->base.base.s_fence->finished.error)) {
>> - spin_lock_irqsave(&queue->queue_lock, irqflags);
>> - queue->active_job = NULL;
>> - spin_unlock_irqrestore(&queue->queue_lock, irqflags);
>> - return NULL;
>> - }
>> + if (unlikely(job->base.base.s_fence->finished.error))
>> + goto out_clean_job;
>> /* Lock required around bin_job update vs
>> * v3d_overflow_mem_work().
>> @@ -228,7 +224,7 @@ static struct dma_fence *v3d_bin_job_run(struct
>> drm_sched_job *sched_job)
>> fence = v3d_fence_create(v3d, V3D_BIN);
>> if (IS_ERR(fence))
>> - return NULL;
>> + goto out_clean_job;
>
> I wonder whether it would be feasible to simply only assign to
> queue->active_job after the fence has been created and have a smaller
> diff for backporting?
I'm not sure it's worth backporting this patch, as this issue dates
back to Linux 5.10. Considering that the issue can only happen if
kzalloc() fails, I'd consider it a low-priority bug.
>
> Either way the patch looks correct to me:
>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Thanks for the review!
Best regards,
- Maíra
>
> Regards,
>
> Tvrtko
>
>> if (job->base.irq_fence)
>> dma_fence_put(job->base.irq_fence);
>> @@ -256,6 +252,12 @@ static struct dma_fence *v3d_bin_job_run(struct
>> drm_sched_job *sched_job)
>> V3D_CORE_WRITE(0, V3D_CLE_CT0QEA, job->end);
>> return fence;
>> +
>> +out_clean_job:
>> + spin_lock_irqsave(&queue->queue_lock, irqflags);
>> + queue->active_job = NULL;
>> + spin_unlock_irqrestore(&queue->queue_lock, irqflags);
>> + return NULL;
>> }
>> static struct dma_fence *v3d_render_job_run(struct drm_sched_job
>> *sched_job)
>> @@ -265,10 +267,8 @@ static struct dma_fence
>> *v3d_render_job_run(struct drm_sched_job *sched_job)
>> struct drm_device *dev = &v3d->drm;
>> struct dma_fence *fence;
>> - if (unlikely(job->base.base.s_fence->finished.error)) {
>> - v3d->queue[V3D_RENDER].active_job = NULL;
>> - return NULL;
>> - }
>> + if (unlikely(job->base.base.s_fence->finished.error))
>> + goto out_clean_job;
>> v3d->queue[V3D_RENDER].active_job = &job->base;
>> @@ -282,7 +282,7 @@ static struct dma_fence *v3d_render_job_run(struct
>> drm_sched_job *sched_job)
>> fence = v3d_fence_create(v3d, V3D_RENDER);
>> if (IS_ERR(fence))
>> - return NULL;
>> + goto out_clean_job;
>> if (job->base.irq_fence)
>> dma_fence_put(job->base.irq_fence);
>> @@ -303,6 +303,10 @@ static struct dma_fence
>> *v3d_render_job_run(struct drm_sched_job *sched_job)
>> V3D_CORE_WRITE(0, V3D_CLE_CT1QEA, job->end);
>> return fence;
>> +
>> +out_clean_job:
>> + v3d->queue[V3D_RENDER].active_job = NULL;
>> + return NULL;
>> }
>> static struct dma_fence *
>> @@ -313,16 +317,14 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
>> struct drm_device *dev = &v3d->drm;
>> struct dma_fence *fence;
>> - if (unlikely(job->base.base.s_fence->finished.error)) {
>> - v3d->queue[V3D_TFU].active_job = NULL;
>> - return NULL;
>> - }
>> + if (unlikely(job->base.base.s_fence->finished.error))
>> + goto out_clean_job;
>> v3d->queue[V3D_TFU].active_job = &job->base;
>> fence = v3d_fence_create(v3d, V3D_TFU);
>> if (IS_ERR(fence))
>> - return NULL;
>> + goto out_clean_job;
>> if (job->base.irq_fence)
>> dma_fence_put(job->base.irq_fence);
>> @@ -350,6 +352,10 @@ v3d_tfu_job_run(struct drm_sched_job *sched_job)
>> V3D_WRITE(V3D_TFU_ICFG(v3d->ver), job->args.icfg |
>> V3D_TFU_ICFG_IOC);
>> return fence;
>> +
>> +out_clean_job:
>> + v3d->queue[V3D_TFU].active_job = NULL;
>> + return NULL;
>> }
>> static struct dma_fence *
>> @@ -361,10 +367,8 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
>> struct dma_fence *fence;
>> int i, csd_cfg0_reg;
>> - if (unlikely(job->base.base.s_fence->finished.error)) {
>> - v3d->queue[V3D_CSD].active_job = NULL;
>> - return NULL;
>> - }
>> + if (unlikely(job->base.base.s_fence->finished.error))
>> + goto out_clean_job;
>> v3d->queue[V3D_CSD].active_job = &job->base;
>> @@ -372,7 +376,7 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
>> fence = v3d_fence_create(v3d, V3D_CSD);
>> if (IS_ERR(fence))
>> - return NULL;
>> + goto out_clean_job;
>> if (job->base.irq_fence)
>> dma_fence_put(job->base.irq_fence);
>> @@ -399,6 +403,10 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
>> V3D_CORE_WRITE(0, csd_cfg0_reg, job->args.cfg[0]);
>> return fence;
>> +
>> +out_clean_job:
>> + v3d->queue[V3D_CSD].active_job = NULL;
>> + return NULL;
>> }
>> static void
>>
>
* Re: [PATCH 06/10] drm/v3d: Reject invalid syncobj handles in submit ioctls
2026-04-16 11:59 ` Tvrtko Ursulin
@ 2026-05-08 15:28 ` Maíra Canal
0 siblings, 0 replies; 24+ messages in thread
From: Maíra Canal @ 2026-05-08 15:28 UTC (permalink / raw)
To: Tvrtko Ursulin, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
Hi Tvrtko,
On 16/04/26 08:59, Tvrtko Ursulin wrote:
>
> On 13/04/2026 16:03, Maíra Canal wrote:
>> drm_sched_job_add_syncobj_dependency() returns -ENOENT both when the
>> handle is zero and when the handle is non-zero but does not find a
>> corresponding existing syncobj (userspace bug). The driver previously
>> ignored -ENOENT in both cases, silently accepting broken handles.
>>
>> Distinguish the two: skip the call entirely when the handle is zero, as
>> there is no dependency, and let -ENOENT propagate for non-zero handles
>> that don't resolve, turning the error into a proper return to userspace.
>>
>> Signed-off-by: Maíra Canal <mcanal@igalia.com>
>> ---
>> drivers/gpu/drm/v3d/v3d_submit.c | 23 ++++++++++++-----------
>> 1 file changed, 12 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/
>> v3d_submit.c
>> index 054367ba533d..fe8b5757c3e8 100644
>> --- a/drivers/gpu/drm/v3d/v3d_submit.c
>> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
>> @@ -171,12 +171,11 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct
>> drm_file *file_priv,
>> int ret = 0;
>> if (!has_multisync) {
>> - ret = drm_sched_job_add_syncobj_dependency(&job->base,
>> file_priv,
>> - in_sync, 0);
>> - // TODO: Investigate why this was filtered out for the IOCTL.
>> - if (ret && ret != -ENOENT)
>> - return ret;
>> - return 0;
>> + /* Ignore syncobj if its handle is NULL */
>
> Nitpick s/NULL/zero/ since it is not a pointer.
>
>> + if (in_sync)
>> + ret = drm_sched_job_add_syncobj_dependency(&job->base,
>> file_priv,
>> + in_sync, 0);
>
> Ah now ret needs to be initialized to zero (comment from previous patch)!
>
>> + return ret;
>> }
>> if (se->in_sync_count && se->wait_stage == queue) {
>> @@ -190,11 +189,13 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct
>> drm_file *file_priv,
>> return -EFAULT;
>> }
>> - ret = drm_sched_job_add_syncobj_dependency(&job->base,
>> - file_priv, in.handle, 0);
>> - // TODO: Investigate why this was filtered out for the
>> IOCTL.
>> - if (ret && ret != -ENOENT)
>> - return ret;
>> + /* Ignore syncobj if its handle is NULL */
>
> As above. Or perhaps even NULL makes sense, as a null handle being an
> invalid handle reads more naturally than a zero handle?
>
>> + if (in.handle) {
>> + ret = drm_sched_job_add_syncobj_dependency(&job->base,
>> + file_priv, in.handle, 0);
>> + if (ret)
>> + return ret;
>> + }
>> }
>> }
>>
>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>
> With a slight danger there is some userspace out there which relies on
> this? Have you looked at how Mesa uses it, and the history?
I've checked userspace usage and, thankfully, we never relied on this.
The IGT tests are also passing (they were the reason I added the -ENOENT
filtering some years ago, as the tests would fail without it).
Thanks for your review!
Best regards,
- Maíra
>
> Regards,
>
> Tvrtko
>
* Re: [PATCH 09/10] drm/v3d: Refactor CPU ioctl into unified submission chain
2026-04-17 14:38 ` Tvrtko Ursulin
@ 2026-05-09 13:33 ` Maíra Canal
0 siblings, 0 replies; 24+ messages in thread
From: Maíra Canal @ 2026-05-09 13:33 UTC (permalink / raw)
To: Tvrtko Ursulin, Melissa Wen, Iago Toral, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Christian König
Cc: kernel-dev, dri-devel
Hi Tvrtko,
On 17/04/26 11:38, Tvrtko Ursulin wrote:
>
> On 13/04/2026 16:03, Maíra Canal wrote:
>> Restructure the CPU ioctl so that all job types, including indirect CSD,
>> use a single struct v3d_submit chain and a single DRM exec context.
>>
>> Currently, v3d_get_cpu_indirect_csd_params(), which is the indirect CSD
>> parser, not only parses the ioctl arguments, but also creates the jobs
>> and
>> locks the BOs during extension parsing. This breaks the default
>> submission
>> flow and creates a nested job submission.
>>
>> This refactoring turns v3d_get_cpu_indirect_csd_params() into a pure
>> parser. Now, job creation and BO locking happen in the ioctl function,
>> which appends the indirect CSD and CLEAN_CACHE jobs to the same struct
>> v3d_submit and locks the union of all jobs' BOs under one drm_exec. This
>> eliminates the second drm_exec, the nested submission, and the
>> conditional
>> two-pass fence attachment that the CPU ioctl previously required for the
>> indirect CSD path.
>>
>> For the refactoring, change the functions v3d_lock_bo_reservations(),
>> v3d_lookup_bos(), v3d_setup_csd_jobs_and_bos() and
>> v3d_attach_fences_and_unlock_reservation() to take struct v3d_submit
>> instead of individual arguments. The BO locking helper now iterates over
>> all jobs in the submission and locks the union, instead of only handling
>> the last job of the chain.
>>
>> Signed-off-by: Maíra Canal <mcanal@igalia.com>
>> ---
>> drivers/gpu/drm/v3d/v3d_drv.h | 9 +-
>> drivers/gpu/drm/v3d/v3d_submit.c | 341 +++++++++++++
>> +-------------------------
>> 2 files changed, 121 insertions(+), 229 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/
>> v3d_drv.h
>> index fc12e5215fb0..071d919fe860 100644
>> --- a/drivers/gpu/drm/v3d/v3d_drv.h
>> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
>> @@ -422,8 +422,10 @@ struct v3d_indirect_csd_info {
>> /* Indirect CSD */
>> struct v3d_csd_job *job;
>> - /* Clean cache job associated to the Indirect CSD job */
>> - struct v3d_job *clean_job;
>> + /* Indirect CSD args, stashed by the extension parser and later used
>> + * to create the CSD job from them.
>> + */
>> + struct drm_v3d_submit_csd args;
>> /* Offset within the BO where the workgroup counts are stored */
>> u32 offset;
>> @@ -438,9 +440,6 @@ struct v3d_indirect_csd_info {
>> /* Indirect BO */
>> struct drm_gem_object *indirect;
>> -
>> - /* Context of the Indirect CSD job */
>> - struct drm_exec exec;
>> };
>> struct v3d_timestamp_query_info {
>> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/
>> v3d_submit.c
>> index 5b519a892985..36402fc25c10 100644
>> --- a/drivers/gpu/drm/v3d/v3d_submit.c
>> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
>> @@ -20,28 +20,42 @@
>> * to v3d, so we don't attach dma-buf fences to them.
>> */
>
> Is the comment above (not shown in the diff) still accurate?
>
>> static int
>> -v3d_lock_bo_reservations(struct v3d_job *job, struct drm_exec *exec)
>> +v3d_lock_bo_reservations(struct v3d_submit *submit)
>> {
>> - int i, ret;
>> + int i, j, ret;
>> - drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT, job->bo_count);
>> - drm_exec_until_all_locked(exec) {
>> - ret = drm_exec_prepare_array(exec, job->bo, job->bo_count, 1);
>> + drm_exec_init(&submit->exec,
>> + DRM_EXEC_INTERRUPTIBLE_WAIT | DRM_EXEC_IGNORE_DUPLICATES, 0);
>> + drm_exec_until_all_locked(&submit->exec) {
>> + for (i = 0; i < submit->job_count; i++) {
>> + struct v3d_job *job = submit->jobs[i];
>
> As per v3d_lookup_bos only the last job carries an initialized
> job->bo, so is the loop required? Only for reserving the fence slot?
>
> [comes back later]
>
> Hmmm, or is the csd submit the special case where two jobs can point to
> different bos?
Exactly. In the indirect CSD case, the CPU job points to one BO array
and the CSD job points to another. That's why this rework was needed.
>
> Sorry this is a monster diff. I ended up applying it and am jumping back
> and forth.
I'm thinking about splitting this patch into two for the next version to
ease the reviewing process.
Thanks for the review!
Best regards,
- Maíra
Thread overview: 24+ messages
2026-04-13 15:03 [PATCH 00/10] drm/v3d: Scheduler and submission fixes and refactoring Maíra Canal
2026-04-13 15:03 ` [PATCH 01/10] drm/v3d: Drop unused drm_encoder.h include from v3d_drv.h Maíra Canal
2026-04-13 15:03 ` [PATCH 02/10] drm/v3d: Clear queue->active_job when v3d_fence_create() fails Maíra Canal
2026-04-16 11:39 ` Tvrtko Ursulin
2026-05-08 14:01 ` Maíra Canal
2026-04-13 15:03 ` [PATCH 03/10] drm/v3d: Use inline lock for dma fence initialization Maíra Canal
2026-04-16 11:44 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 04/10] drm/v3d: Replace spin_lock_irqsave() with spin_lock_irq() Maíra Canal
2026-04-16 11:46 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 05/10] drm/v3d: Extract v3d_job_add_syncobjs() helper Maíra Canal
2026-04-16 11:53 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 06/10] drm/v3d: Reject invalid syncobj handles in submit ioctls Maíra Canal
2026-04-16 11:59 ` Tvrtko Ursulin
2026-05-08 15:28 ` Maíra Canal
2026-04-17 15:05 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 07/10] drm/v3d: Migrate BO reservation locking to DRM exec Maíra Canal
2026-04-16 12:24 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 08/10] drm/v3d: Introduce struct v3d_submit and convert CL/TFU/CSD ioctls Maíra Canal
2026-04-16 14:16 ` Tvrtko Ursulin
2026-04-13 15:03 ` [PATCH 09/10] drm/v3d: Refactor CPU ioctl into unified submission chain Maíra Canal
2026-04-17 14:38 ` Tvrtko Ursulin
2026-05-09 13:33 ` Maíra Canal
2026-04-13 15:03 ` [PATCH 10/10] drm/v3d: Ensure atomic submissions in v3d_submit_jobs() Maíra Canal
2026-04-17 15:02 ` Tvrtko Ursulin