All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] drm/v3d: bound CPU-job query writes to their destination BO
@ 2026-06-17 12:58 Michael Bommarito
  2026-06-17 13:15 ` sashiko-bot
  0 siblings, 1 reply; 2+ messages in thread
From: Michael Bommarito @ 2026-06-17 12:58 UTC (permalink / raw)
  To: Melissa Wen, Maira Canal
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Iago Toral Quiroga, dri-devel, linux-kernel

The V3D_SUBMIT_CPU CPU jobs take user-supplied offsets and indices and
consume them at exec time without checking that the accesses stay inside
their BO:

  - TIMESTAMP_QUERY and RESET_TIMESTAMP_QUERY write one u64 per query
    into bo[0] at a fully user-controlled per-query offset.
  - COPY_TIMESTAMP_QUERY copies one u64 per query into bo[0] at
    offset + i * stride, and reads each result from a user-controlled
    offset in the source bo[1].
  - COPY_PERFORMANCE_QUERY writes nperfmons * DRM_V3D_MAX_PERF_COUNTERS
    counter slots plus an availability slot into bo[0] at the same
    geometry.
  - INDIRECT_CSD reads three u32 work-group counts from bo[0] at a
    user-controlled offset, then writes each count back into the
    indirect BO at a user-controlled u32 index (wg_uniform_offsets[]).
    It also dereferences the indirect BO without checking that the
    user-supplied handle lookup succeeded.

A render-node user (DRM_RENDER_ALLOW, no master, no capability) can make
the handlers read or write past a BO's vmap mapping, or dereference a
NULL indirect BO.

Validate the full access extent against the BO size once the BOs are
looked up, before the job is queued, rejecting out-of-range geometry
with -EINVAL and a missing indirect BO with -ENOENT. The copy extent
offset + (count - 1) * stride + write_size is computed in u64, mirroring
the u8 * pointer arithmetic in the executors: (count - 1) * stride is a
u32 * u32 product that is exact in u64, so one overflow check on the
total guards the bound. The performance slot count and the bare
timestamp, copy-source and indirect offsets are computed in u64 the same
way, so a user value cannot wrap the comparison.

Fixes: 18b8413b25b7 ("drm/v3d: Create a CPU job extension for a indirect CSD job")
Fixes: 9ba0ff3e083f ("drm/v3d: Create a CPU job extension for the timestamp query job")
Fixes: 34a101e64296 ("drm/v3d: Create a CPU job extension for the reset timestamp job")
Fixes: 6745f3e44a20 ("drm/v3d: Create a CPU job extension to copy timestamp query to a buffer")
Fixes: 209e8d2695ee ("drm/v3d: Create a CPU job extension for the copy performance query job")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
v3:
 - Bound the INDIRECT_CSD job as well (Maira Canal). The exec reads three
   u32 work-group counts from bo[0] at the user offset and writes each back
   into the indirect BO at a user-supplied u32 index (wg_uniform_offsets[]),
   and dereferences the indirect BO; reject an out-of-range offset or index
   and a NULL indirect BO from a failed handle lookup (-ENOENT).
 - Compute the copy extent and the performance slot count in u64 rather than
   u32 + check_*_overflow() (Maira Canal). (count - 1) * stride is a u32 * u32
   product that is exact in u64, so a single overflow check guards the total,
   matching the u8 * pointer arithmetic in v3d_copy_query_results() and
   v3d_copy_performance_query(). Drop the now-redundant check_mul_overflow()
   in the performance branch and note the (u64) cast on the per-query offset
   checks.
 - Add Fixes: 34a101e64296 for the RESET_TIMESTAMP_QUERY path: the bare
   timestamp case bounds it too (v3d_reset_timestamp_queries() writes a u64
   zero at a user offset), and that handler was introduced separately from
   the timestamp-query job. All five Fixes commits are part of the v6.8 CPU
   job extension series.
 - The INDIRECT_CSD NULL-handle reject overlaps "drm/v3d: Reject invalid
   indirect BO handle in indirect CSD setup" from your Miscellaneous fixes
   series. I kept it here so this patch stands alone on the current base; if
   that lands first this check is redundant and I will drop it, or rebase on
   top of your series -- whichever you prefer.
v2 (https://lore.kernel.org/all/20260614211644.217116-1-michael.bommarito@gmail.com/):
 - Also bound the bare TIMESTAMP_QUERY and RESET_TIMESTAMP_QUERY job types
   (Maira Canal); drop the unused span variable and the can-never-overflow
   check on the fixed-size timestamp slots; drop the KUnit reproducer, which
   was a no-go for upstreaming (Maira Canal); one Fixes tag per introducing
   CPU-job extension.
v1: https://lore.kernel.org/all/20260614131058.2525157-1-michael.bommarito@gmail.com/

 drivers/gpu/drm/v3d/v3d_submit.c | 136 +++++++++++++++++++++++++++++++
 1 file changed, 136 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index ee4512db294b3..9265f7bcd2766 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -4,6 +4,8 @@
  * Copyright (C) 2023 Raspberry Pi
  */
 
+#include <linux/overflow.h>
+
 #include <drm/drm_print.h>
 #include <drm/drm_syncobj.h>
 
@@ -1246,6 +1248,136 @@ static const unsigned int cpu_job_bo_handle_count[] = {
 	[V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY] = 1,
 };
 
+/* Reject offset + (count - 1) * stride + write_size if it leaves the BO. */
+static int
+v3d_check_copy_extent(struct drm_device *dev, size_t bo_size,
+		      u32 offset, u32 stride, u32 count, u64 write_size)
+{
+	u64 last;
+
+	if (!count)
+		return 0;
+
+	/*
+	 * The executors walk a u8 * cursor, so the furthest written byte is
+	 * offset + (count - 1) * stride + write_size, matching the pointer
+	 * arithmetic in v3d_copy_query_results()/v3d_copy_performance_query().
+	 * (count - 1) * stride is a u32 * u32 product that is exact in u64,
+	 * and offset + write_size stays far below the u64 range, so a single
+	 * overflow check guards the total.
+	 */
+	last = write_size + offset;
+	if (check_add_overflow((u64)(count - 1) * stride, last, &last) ||
+	    last > bo_size) {
+		drm_dbg(dev, "CPU job copy buffer exceeds the destination BO.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/* Reject a query CPU job whose writes would land outside their BO. */
+static int
+v3d_cpu_job_bounds_check(struct v3d_cpu_job *job)
+{
+	struct drm_device *dev = &job->base.v3d->drm;
+	struct v3d_timestamp_query_info *tquery = &job->timestamp_query;
+	struct v3d_copy_query_results_info *copy = &job->copy;
+	u32 elem = copy->do_64bit ? sizeof(u64) : sizeof(u32);
+	struct v3d_bo *dst, *src;
+	u64 slots, write_size;
+	int i;
+
+	switch (job->job_type) {
+	case V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY:
+	case V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY:
+		/*
+		 * Each query writes one u64 timestamp slot into bo[0]. The
+		 * offset is cast to u64 so a 0xffffffff query offset cannot wrap
+		 * the addition past the BO size.
+		 */
+		dst = to_v3d_bo(job->base.bo[0]);
+
+		for (i = 0; i < tquery->count; i++) {
+			if ((u64)tquery->queries[i].offset + sizeof(u64) >
+			    dst->base.base.size)
+				goto err_range;
+		}
+		return 0;
+	case V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY:
+		/* Copies one u64 per query from bo[1] into bo[0]. */
+		dst = to_v3d_bo(job->base.bo[0]);
+		src = to_v3d_bo(job->base.bo[1]);
+
+		for (i = 0; i < tquery->count; i++) {
+			if ((u64)tquery->queries[i].offset + sizeof(u64) >
+			    src->base.base.size)
+				goto err_range;
+		}
+
+		write_size = (copy->availability_bit ? 2 : 1) * elem;
+		return v3d_check_copy_extent(dev, dst->base.base.size,
+					     copy->offset, copy->stride,
+					     tquery->count, write_size);
+	case V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY:
+		/*
+		 * Each query writes nperfmons * DRM_V3D_MAX_PERF_COUNTERS
+		 * counter slots into bo[0], plus an availability slot at index
+		 * ncounters. nperfmons and ncounters are user values; the slot
+		 * count and write size are computed in u64 (u32 * small
+		 * constant) so they cannot wrap.
+		 */
+		dst = to_v3d_bo(job->base.bo[0]);
+
+		slots = (u64)job->performance_query.nperfmons *
+			DRM_V3D_MAX_PERF_COUNTERS;
+		if (copy->availability_bit)
+			slots = max(slots,
+				    (u64)job->performance_query.ncounters + 1);
+
+		write_size = slots * elem;
+		return v3d_check_copy_extent(dev, dst->base.base.size,
+					     copy->offset, copy->stride,
+					     job->performance_query.count,
+					     write_size);
+	case V3D_CPU_JOB_TYPE_INDIRECT_CSD: {
+		struct v3d_indirect_csd_info *icsd = &job->indirect_csd;
+
+		/*
+		 * The exec-time work-group rewrite reads three u32 counts from
+		 * bo[0] at the user-supplied offset, then writes each count back
+		 * into the indirect BO at a user-supplied u32 index. Reject a
+		 * missing indirect BO (drm_gem_object_lookup() returned NULL for
+		 * a bad handle, which the exec path would otherwise dereference)
+		 * and bound both accesses.
+		 */
+		if (!icsd->indirect)
+			return -ENOENT;
+
+		dst = to_v3d_bo(job->base.bo[0]);
+		if ((u64)icsd->offset + 3 * sizeof(u32) > dst->base.base.size)
+			goto err_range;
+
+		src = to_v3d_bo(icsd->indirect);
+		for (i = 0; i < 3; i++) {
+			u32 uidx = icsd->wg_uniform_offsets[i];
+
+			if (uidx != 0xffffffff &&
+			    (u64)uidx * sizeof(u32) + sizeof(u32) >
+			    src->base.base.size)
+				goto err_range;
+		}
+		return 0;
+	}
+	default:
+		return 0;
+	}
+
+err_range:
+	drm_dbg(dev, "CPU job query offset exceeds the BO.\n");
+	return -EINVAL;
+}
+
 /**
  * v3d_submit_cpu_ioctl() - Submits a CPU job to the V3D.
  * @dev: DRM device
@@ -1317,6 +1449,10 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
 		if (ret)
 			goto fail;
 
+		ret = v3d_cpu_job_bounds_check(cpu_job);
+		if (ret)
+			goto fail;
+
 		ret = v3d_lock_bo_reservations(&cpu_job->base, &acquire_ctx);
 		if (ret)
 			goto fail;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH v3] drm/v3d: bound CPU-job query writes to their destination BO
  2026-06-17 12:58 [PATCH v3] drm/v3d: bound CPU-job query writes to their destination BO Michael Bommarito
@ 2026-06-17 13:15 ` sashiko-bot
  0 siblings, 0 replies; 2+ messages in thread
From: sashiko-bot @ 2026-06-17 13:15 UTC (permalink / raw)
  To: Michael Bommarito; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:

New issues:
- [Medium] Loop variable 'i' declared as signed 'int' causes negative indexing and out-of-bounds reads when 'tquery->count' exceeds INT_MAX.

Pre-existing issues:
- [Critical] Uninitialized CPU job structure cleanup causes a NULL pointer dereference in `drm_sched_job_cleanup()`.
- [Critical] Cleaning up CPU jobs that have already been pushed to the DRM scheduler queue causes severe state corruption and Use-After-Free.
- [High] Memory and WW mutex leaks for INDIRECT_CSD clean_job BO references on intermediate error paths.
- [High] Missing check for vmap() allocation failure leads to NULL pointer dereferences in CPU job executors.
- [High] The INDIRECT_CSD CPU job accesses the indirect GEM BO without extracting its implicit dependencies, causing it to race with prior GPU jobs.
- [High] Double vmap() on the same Buffer Object permanently leaks virtual memory mappings.
- [High] User-controlled wg_counts integer overflow evaluates to 0, triggering a WARN_ON and potential Denial of Service.
--

commit 176c9c3615bb9c83354ef76d006e602c0a5318e8
Author: Michael Bommarito <michael.bommarito@gmail.com>
Subject: drm/v3d: bound CPU-job query writes to their destination BO

The commit adds bounds checking for CPU jobs to ensure that user-supplied
offsets and indices do not result in reads or writes outside their
associated Buffer Objects (BOs), preventing potential out-of-bounds memory
accesses by render-node users.

> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
> index 1db43c6a078d5..0d3a388fbba57 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
[ ... ]
> +static int
> +v3d_cpu_job_bounds_check(struct v3d_cpu_job *job)
> +{
> +	struct drm_device *dev = &job->base.v3d->drm;
> +	struct v3d_timestamp_query_info *tquery = &job->timestamp_query;
> +	struct v3d_copy_query_results_info *copy = &job->copy;
> +	u32 elem = copy->do_64bit ? sizeof(u64) : sizeof(u32);
> +	struct v3d_bo *dst, *src;
> +	u64 slots, write_size;
> +	int i;
> +
> +	switch (job->job_type) {
> +	case V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY:
> +	case V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY:
> +		/*
> +		 * Each query writes one u64 timestamp slot into bo[0]. The
> +		 * offset is cast to u64 so a 0xffffffff query offset cannot wrap
> +		 * the addition past the BO size.
> +		 */
> +		dst = to_v3d_bo(job->base.bo[0]);
> +
> +		for (i = 0; i < tquery->count; i++) {

[Severity: Medium]
Is it possible for tquery->count to exceed the maximum value of a signed int?
If so, does the counter i wrap around to a negative number, causing an
out-of-bounds read when accessing tquery->queries[i].offset?

> +			if ((u64)tquery->queries[i].offset + sizeof(u64) >
> +			    dst->base.base.size)
> +				goto err_range;
> +		}
> +		return 0;
[ ... ]
> @@ -1346,6 +1478,10 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
>  		if (ret)
>  			goto fail;
>  
> +		ret = v3d_cpu_job_bounds_check(cpu_job);
> +		if (ret)
> +			goto fail;
> +

[Severity: High]
This is a pre-existing issue, but does jumping to the fail label here bypass
the fail_unreserve cleanup path?

For INDIRECT_CSD jobs, v3d_setup_csd_jobs_and_bos() dynamically allocates jobs
and locks the reservations for clean_job->bo. Does bypassing the
fail_unreserve label permanently leak the WW mutexes and cause deadlocks for
subsequent operations on these BOs?

>  		ret = v3d_lock_bo_reservations(&cpu_job->base, &acquire_ctx);
>  		if (ret)
>  			goto fail;

While reviewing this patch, I also noticed a few pre-existing issues in the
surrounding code:

drivers/gpu/drm/v3d/v3d_submit.c:v3d_submit_cpu_ioctl() {
    ...
    if (args->bo_handle_count != cpu_job_bo_handle_count[cpu_job->job_type]) {
        drm_dbg(dev, "This CPU job was not submitted with the proper number of BOs.\n");
        ret = -EINVAL;
        goto fail;
    }
    ...
fail:
    v3d_job_cleanup((void *)cpu_job);
}

[Severity: Critical]
This is a pre-existing issue, but since the error occurs before
v3d_job_init() is called, cpu_job->base.s_fence is NULL. Does the
subsequent call to drm_sched_job_cleanup() unconditionally dereference
job->s_fence->finished.refcount, causing a kernel panic?


drivers/gpu/drm/v3d/v3d_submit.c:v3d_submit_cpu_ioctl() {
    ...
    v3d_push_job(&cpu_job->base);
    ...
    case V3D_CPU_JOB_TYPE_INDIRECT_CSD:
        ret = drm_sched_job_add_dependency(&csd_job->base.base,
                                           dma_fence_get(cpu_job->base.done_fence));
        if (ret)
            goto fail_unreserve;
    ...
fail_unreserve:
    ...
fail:
    v3d_job_cleanup((void *)cpu_job);
}

[Severity: Critical]
This is a pre-existing issue, but when goto fail_unreserve falls through
to the fail label and calls v3d_job_cleanup(cpu_job), does this corrupt the
state of jobs that have already been armed and pushed to the active DRM
scheduler queue?


drivers/gpu/drm/v3d/v3d_bo.c:v3d_get_bo_vaddr() {
    ...
    bo->vaddr = vmap(obj->pages, obj->base.size >> PAGE_SHIFT, VM_MAP, pgprot_writecombine(PAGE_KERNEL));
    ...
}

[Severity: High]
This is a pre-existing issue, but the result of vmap() is assigned directly
to bo->vaddr. If vmap() returns NULL due to memory fragmentation, do CPU
job executors like v3d_rewrite_csd_job_wg_counts_from_indirect() blindly
dereference this NULL pointer?


drivers/gpu/drm/v3d/v3d_submit.c:v3d_get_cpu_indirect_csd_params() {
    ...
    info->indirect = drm_gem_object_lookup(file_priv, indirect_csd.indirect);
    ...
}

[Severity: High]
This is a pre-existing issue, but since the indirect BO is not added to
cpu_job->base.bo, it appears to bypass v3d_lock_bo_reservations(). Does this
allow v3d_rewrite_csd_job_wg_counts_from_indirect() to read and modify the BO
while a prior GPU dispatch might still be writing to it?


drivers/gpu/drm/v3d/v3d_sched.c:v3d_rewrite_csd_job_wg_counts_from_indirect() {
    ...
    v3d_get_bo_vaddr(bo);
    v3d_get_bo_vaddr(indirect);
    ...
}

[Severity: High]
This is a pre-existing issue, but if a user provides the exact same GEM
object handle for both the primary bo and the indirect BO, does the second
vmap() silently overwrite the vaddr pointer from the first, permanently
leaking the prior virtual memory mapping?


drivers/gpu/drm/v3d/v3d_sched.c:v3d_rewrite_csd_job_wg_counts_from_indirect() {
    ...
    num_batches = DIV_ROUND_UP(indirect_csd->wg_size, 16) * (wg_counts[0] * wg_counts[1] * wg_counts[2]);
    ...
    args->cfg[4] = num_batches - 1;
    ...
    WARN_ON(args->cfg[4] == ~0);
}

[Severity: High]
This is a pre-existing issue, but since wg_counts are user-supplied 32-bit
unsigned integers, can specific inputs cleanly overflow the product to 0?
Does this underflow args->cfg[4] to 0xFFFFFFFF and trigger the WARN_ON,
potentially crashing systems with panic_on_warn enabled?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617125811.3692951-1-michael.bommarito@gmail.com?part=1

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-06-17 13:15 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-17 12:58 [PATCH v3] drm/v3d: bound CPU-job query writes to their destination BO Michael Bommarito
2026-06-17 13:15 ` sashiko-bot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.