[PATCH 0/6] Miscellaneous fixes for the Host1x driver

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/6] Miscellaneous fixes for the Host1x driver
@ 2026-06-09  8:09 Mikko Perttunen
  2026-06-09  8:09 ` [PATCH 1/6] gpu: host1x: Wait for timeout worker completion on channel free Mikko Perttunen
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

This series has a number of small miscellaneous fixes to the host1x 
driver.

Patches 1 to 4 fix various logic issues that are unlikely to happen
but technically possible.

Patch 5 fixes a return type from unsigned int to int -- no functional
difference.

Patch 6 adds makes syncpoint value arithmetic explicitly wrapping, 
mostly to help static/dynamic analysis tools.

---
Mikko Perttunen (6):
      gpu: host1x: Wait for timeout worker completion on channel free
      gpu: host1x: Avoid double device_add when clients already present
      gpu: host1x: Fix offset calculation in trace_write_gather
      gpu: host1x: Avoid stack over-read in debug output helpers
      gpu: host1x: Change pin_job() return type to int
      gpu: host1x: Annotate intentional syncpoint wrap-around

 drivers/gpu/host1x/bus.c           |  2 +-
 drivers/gpu/host1x/cdma.c          |  3 ++-
 drivers/gpu/host1x/debug.c         |  4 ++--
 drivers/gpu/host1x/hw/cdma_hw.c    |  2 +-
 drivers/gpu/host1x/hw/channel_hw.c | 15 +++++++++------
 drivers/gpu/host1x/intr.c          |  5 +++--
 drivers/gpu/host1x/job.c           |  2 +-
 drivers/gpu/host1x/syncpt.c        |  7 ++++++-
 drivers/gpu/host1x/syncpt.h        |  3 ++-
 9 files changed, 27 insertions(+), 16 deletions(-)
---
base-commit: 4549871118cf616eecdd2d939f78e3b9e1dddc48
change-id: 20260608-b4-host1x-small-fixes-a-081cfea2c073


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/6] gpu: host1x: Wait for timeout worker completion on channel free
  2026-06-09  8:09 [PATCH 0/6] Miscellaneous fixes for the Host1x driver Mikko Perttunen
@ 2026-06-09  8:09 ` Mikko Perttunen
  2026-06-09  8:24   ` sashiko-bot
  2026-06-09  8:09 ` [PATCH 2/6] gpu: host1x: Avoid double device_add when clients already present Mikko Perttunen
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

cdma_timeout_destroy() used cancel_delayed_work() to cancel pending
timeout work when destroying the CDMA. Usually this is fine, but
there is a narrow race condition where the timeout handler has started
execution but has not taken cdma->lock; the channel is freed causing
cdma_stop to take cdma->lock and flush the channel; host1x_cdma_deinit
then proceeds with deinitializing cdma while the handler is waiting to
take cdma->lock.

Therefore change cdma_timeout_destroy to use cancel_delayed_work_sync
instead to ensure any pending timeout work completes before proceeding.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/hw/cdma_hw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 3f3f0018eee0..ab714d221120 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -355,7 +355,7 @@ static int cdma_timeout_init(struct host1x_cdma *cdma)
 static void cdma_timeout_destroy(struct host1x_cdma *cdma)
 {
 	if (cdma->timeout.initialized)
-		cancel_delayed_work(&cdma->timeout.wq);
+		cancel_delayed_work_sync(&cdma->timeout.wq);
 
 	cdma->timeout.initialized = false;
 }

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/6] gpu: host1x: Wait for timeout worker completion on channel free
  2026-06-09  8:09 ` [PATCH 1/6] gpu: host1x: Wait for timeout worker completion on channel free Mikko Perttunen
@ 2026-06-09  8:24   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09  8:24 UTC (permalink / raw)
  To: Mikko Perttunen; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] Missing teardown of `cdma->update_work` during channel destruction causes workqueue corruption and use-after-free races.
- [High] Spurious timeout handler execution aborts newly-started jobs due to unsafe non-synchronous timer cancellation in `update_cdma_locked()`.
--

commit 716b1c0819a18e8bf5a813661ef56163eb1114aa
Author: Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Wait for timeout worker completion on channel free

This commit addresses a race condition during CDMA teardown. By replacing
cancel_delayed_work() with cancel_delayed_work_sync(), it ensures pending
timeout work completes before the CDMA channel is freed. This prevents the
timeout handler from waiting on a destroyed lock.

> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
> index 3f3f0018eee0d..ab714d221120c 100644
> --- a/drivers/gpu/host1x/hw/cdma_hw.c
> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
> @@ -355,7 +355,7 @@ static int cdma_timeout_init(struct host1x_cdma *cdma)
>  static void cdma_timeout_destroy(struct host1x_cdma *cdma)
>  {
>  	if (cdma->timeout.initialized)
> -		cancel_delayed_work(&cdma->timeout.wq);
> +		cancel_delayed_work_sync(&cdma->timeout.wq);

[Severity: High]
This isn't a bug introduced by this patch, but does the missing teardown of
cdma->update_work during channel destruction cause workqueue corruption and
use-after-free races?

When release_channel() calls host1x_cdma_deinit(), it destroys the push
buffer but fails to call cancel_work_sync(&cdma->update_work):

drivers/gpu/host1x/cdma.c:host1x_cdma_deinit() {
    ...
	host1x_pushbuffer_destroy(pb);
	host1x_hw_cdma_timeout_destroy(host1x, cdma);

	return 0;
}

The cdma->update_work can be scheduled asynchronously by a hardware syncpoint
interrupt. If the channel is immediately reallocated to a new client,
host1x_cdma_init() unconditionally executes INIT_WORK(&cdma->update_work, ...).

Calling INIT_WORK() on an actively queued or running work struct can corrupt
the workqueue linked lists. Additionally, if the stray cdma_update_work()
executes after the push buffer is destroyed, can it dereference a NULL
cdma->push_buffer.mapped?

[Severity: High]
This is a pre-existing issue, but does spurious timeout handler execution
abort newly-started jobs due to unsafe non-synchronous timer cancellation in
update_cdma_locked()?

When a job finishes, stop_cdma_timer_locked() uses non-sync
cancel_delayed_work() under a lock, leaving the executing handler to run:

drivers/gpu/host1x/cdma.c:stop_cdma_timer_locked() {
	cancel_delayed_work(&cdma->timeout.wq);
	cdma->timeout.client = NULL;
}

If the timeout handler cdma_timeout_handler() has already started on another
CPU, it blocks waiting for cdma->lock. The current thread then immediately
starts the next job, updating cdma->timeout.client and syncpt_val, and
releases the lock.

The stale timeout handler wakes up, sees cdma->timeout.client is not NULL,
and checks if the NEW job has completed. Since it hasn't, the handler
spuriously assumes a timeout occurred, freezes the CDMA channel, and aborts
the new job.

Is a generation sequence number or explicit time check against start_ktime in
the handler required to fix this race safely, since using
cancel_delayed_work_sync() here would cause a deadlock?

>  
>  	cdma->timeout.initialized = false;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com?part=1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/6] gpu: host1x: Avoid double device_add when clients already present
  2026-06-09  8:09 [PATCH 0/6] Miscellaneous fixes for the Host1x driver Mikko Perttunen
  2026-06-09  8:09 ` [PATCH 1/6] gpu: host1x: Wait for timeout worker completion on channel free Mikko Perttunen
@ 2026-06-09  8:09 ` Mikko Perttunen
  2026-06-09  8:20   ` sashiko-bot
  2026-06-09  8:09 ` [PATCH 3/6] gpu: host1x: Fix offset calculation in trace_write_gather Mikko Perttunen
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

host1x_device_add looks through the idle clients list to populate
subdevs, and any matches entries are moved from the subdevs list
to the active list. If all subdevs are populated, device_add will
be called on the device. The secondary "subdevs list empty" check
will then incorrectly again call device_add.

However, this would require a convoluted scenario since clients don't
typically end up on the idle clients list.

Fix by checking whether the device was already added before adding
again.

Fixes: fab823d82ee5 ("gpu: host1x: Allow loading tegra-drm without enabled engines")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
index f814eb4941c0..fe4af98e8fc6 100644
--- a/drivers/gpu/host1x/bus.c
+++ b/drivers/gpu/host1x/bus.c
@@ -508,7 +508,7 @@ static int host1x_device_add(struct host1x *host1x,
 	 * Add device even if there are no subdevs to ensure syncpoint functionality
 	 * is available regardless of whether any engine subdevices are present
 	 */
-	if (list_empty(&device->subdevs)) {
+	if (list_empty(&device->subdevs) && !device->registered) {
 		err = device_add(&device->dev);
 		if (err < 0)
 			dev_err(&device->dev, "failed to add device: %d\n", err);

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/6] gpu: host1x: Avoid double device_add when clients already present
  2026-06-09  8:09 ` [PATCH 2/6] gpu: host1x: Avoid double device_add when clients already present Mikko Perttunen
@ 2026-06-09  8:20   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09  8:20 UTC (permalink / raw)
  To: Mikko Perttunen; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] The patch prevents a double `device_add()` on the success path, but causes a forbidden second `device_add()` call if the first attempt fails.
--

commit ece8de02ea1a088261ec6c5d40e683d0804f1957
Author: Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Avoid double device_add when clients already present

This commit fixes an issue in host1x_device_add where device_add is called
twice if all subdevs are matched and moved to the active list. It prevents
the second call by checking a registered flag before invoking device_add at
the end of the function.

> diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
> index f814eb4941c02..fe4af98e8fc68 100644
> --- a/drivers/gpu/host1x/bus.c
> +++ b/drivers/gpu/host1x/bus.c

[ ... ]

> @@ -508,7 +508,7 @@ static int host1x_device_add(struct host1x *host1x,
>  	 * Add device even if there are no subdevs to ensure syncpoint functionality
>  	 * is available regardless of whether any engine subdevices are present
>  	 */
> -	if (list_empty(&device->subdevs)) {
> +	if (list_empty(&device->subdevs) && !device->registered) {

[Severity: High]
Does this cause a forbidden second device_add() call if the first attempt
fails during the subdevice registration flow?

If device_add() is called earlier during host1x_subdev_register() and fails,
device->registered remains false. The execution then continues to this final
block in host1x_device_add(). 

Since list_empty(&device->subdevs) could evaluate to true and 
!device->registered is also true, won't this trigger an immediate second
device_add() on the identically failed structure?

>  		err = device_add(&device->dev);
>  		if (err < 0)
>  			dev_err(&device->dev, "failed to add device: %d\n", err);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com?part=2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 3/6] gpu: host1x: Fix offset calculation in trace_write_gather
  2026-06-09  8:09 [PATCH 0/6] Miscellaneous fixes for the Host1x driver Mikko Perttunen
  2026-06-09  8:09 ` [PATCH 1/6] gpu: host1x: Wait for timeout worker completion on channel free Mikko Perttunen
  2026-06-09  8:09 ` [PATCH 2/6] gpu: host1x: Avoid double device_add when clients already present Mikko Perttunen
@ 2026-06-09  8:09 ` Mikko Perttunen
  2026-06-09  8:26   ` sashiko-bot
  2026-06-09  8:09 ` [PATCH 4/6] gpu: host1x: Avoid stack over-read in debug output helpers Mikko Perttunen
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

When a gather longer than 2*TRACE_MAX_LENGTH (256) words is traced
through host1x_cdma_push_gather, the reported BO offset drifts from
the third iteration onward.

Fix the calculation by properly calculating the value on each loop
rather than accumulating.

In reality, gathers tend to be pretty short so this is unlikely to
ever have been observed.

Fixes: b40d02bf96e0 ("gpu: host1x: Use struct host1x_bo pointers in traces")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/hw/channel_hw.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 2df6a16d484e..9dda73199889 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -36,10 +36,9 @@ static void trace_write_gather(struct host1x_cdma *cdma, struct host1x_bo *bo,
 		for (i = 0; i < words; i += TRACE_MAX_LENGTH) {
 			u32 num_words = min(words - i, TRACE_MAX_LENGTH);
 
-			offset += i * sizeof(u32);
-
 			trace_host1x_cdma_push_gather(dev_name(dev), bo,
-						      num_words, offset,
+						      num_words,
+						      offset + i * sizeof(u32),
 						      mem);
 		}
 

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/6] gpu: host1x: Fix offset calculation in trace_write_gather
  2026-06-09  8:09 ` [PATCH 3/6] gpu: host1x: Fix offset calculation in trace_write_gather Mikko Perttunen
@ 2026-06-09  8:26   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09  8:26 UTC (permalink / raw)
  To: Mikko Perttunen; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Integer overflow in `copy_gathers` leads to a massive heap buffer overflow
- [High] Mismatched buffer object and offset in `trace_write_gather` causes out-of-bounds read and information leak
- [High] Trace event uses fixed-size scalar field for dynamic string, causing use-after-free
- [High] Missing error pointer check for `host1x_bo_mmap()` leads to kernel oops on failure
--

commit 32809268254f182a4349aa6566cab30f99b736b0
Author: Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Fix offset calculation in trace_write_gather

When a gather longer than 2*TRACE_MAX_LENGTH (256) words is traced
through host1x_cdma_push_gather, the reported BO offset drifts from
the third iteration onward.

[Severity: Critical]
This is a pre-existing issue, but does copy_gathers() in
drivers/gpu/host1x/job.c have an integer overflow when accumulating sizes?

drivers/gpu/host1x/job.c:copy_gathers() {
    ...
    size += g->words * sizeof(u32);
    ...
}

If userspace submits a job with enough commands, could size wrap around on
32-bit systems? 

This wrapped size would then be used by dma_alloc_wc() to allocate a small
buffer, followed by a loop that copies a large amount of data:

drivers/gpu/host1x/job.c:copy_gathers() {
    ...
    memcpy(job->gather_copy_mapped + offset, gather + g->offset,
           g->words * sizeof(u32));
    ...
}

Could this lead to a large heap buffer overflow?

[Severity: High]
This isn't a bug introduced by this patch, but looking at the trace event
definition for host1x_cdma_push_gather:

include/trace/events/host1x.h:
    __field(const char *, name)

According to trace subsystem rules, dynamic-length strings should be stored
using __string() and __assign_str() to capture the string contents. By using
a pointer field, only the address is captured.

If the trace buffer is read after the device has been unregistered and its
name freed, could this pointer dangle and cause a use-after-free when
formatting the trace?

[Severity: High]
This is a pre-existing issue, but I noticed host1x_bo_mmap() can return an
error pointer (e.g. ERR_PTR(-ENOMEM) from tegra_bo_mmap()).

In trace_write_gather() earlier in this file:

    if (host1x_debug_trace_cmdbuf)
        mem = host1x_bo_mmap(bo);

    if (mem) {

Is it possible for mem to be an error pointer here? If so, checking if (mem)
will treat the error pointer as a valid address.

There appear to be similar checks elsewhere that might not catch error
pointers:

drivers/gpu/host1x/job.c:copy_gathers() {
    ...
    gather = host1x_bo_mmap(g->bo);
    memcpy(job->gather_copy_mapped + offset, gather + g->offset,
           g->words * sizeof(u32));
    ...
}

drivers/gpu/host1x/job.c:do_relocs() {
    ...
    cmdbuf_addr = host1x_bo_mmap(cmdbuf);

    if (unlikely(!cmdbuf_addr)) {
    ...
}

Could these lead to a kernel oops if host1x_bo_mmap() fails and returns an
error pointer instead of NULL?

> diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
> index 2df6a16d484e0..9dda73199889c 100644
> --- a/drivers/gpu/host1x/hw/channel_hw.c
> +++ b/drivers/gpu/host1x/hw/channel_hw.c

[ ... ]

> @@ -36,10 +36,9 @@ static void trace_write_gather(struct host1x_cdma *cdma, struct host1x_bo *bo,

[Severity: High]
This is a pre-existing issue, but is there a mismatched buffer object and
offset passed to trace_write_gather()?

When the firewall is enabled, copy_gathers() combines all gathers into a
single DMA buffer and overwrites g->offset with the cumulative byte offset,
but leaves g->bo pointing to the original buffer:

drivers/gpu/host1x/job.c:copy_gathers() {
    ...
    g->base = job->gather_copy;
    g->offset = offset;
    ...
}

Then submit_gathers() passes these to trace_write_gather():

drivers/gpu/host1x/hw/channel_hw.c:submit_gathers() {
    ...
    trace_write_gather(cdma, g->bo, g->offset, g->words);
    ...
}

Since trace_write_gather() maps the original BO but uses the new combined
offset:

>  		for (i = 0; i < words; i += TRACE_MAX_LENGTH) {
>  			u32 num_words = min(words - i, TRACE_MAX_LENGTH);
>  
> -			offset += i * sizeof(u32);
> -
>  			trace_host1x_cdma_push_gather(dev_name(dev), bo,
> -						      num_words, offset,
> +						      num_words,
> +						      offset + i * sizeof(u32),
>  						      mem);
>  		}
>  

Will this cause trace_host1x_cdma_push_gather() to read out of bounds from
the mapped original BO using the large combined offset?

include/trace/events/host1x.h:host1x_cdma_push_gather() {
    ...
    memcpy(__get_dynamic_array(cmdbuf), cmdbuf+offset,
           words * sizeof(u32));
    ...
}

Could this leak kernel data to the trace buffer?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com?part=3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 4/6] gpu: host1x: Avoid stack over-read in debug output helpers
  2026-06-09  8:09 [PATCH 0/6] Miscellaneous fixes for the Host1x driver Mikko Perttunen
                   ` (2 preceding siblings ...)
  2026-06-09  8:09 ` [PATCH 3/6] gpu: host1x: Fix offset calculation in trace_write_gather Mikko Perttunen
@ 2026-06-09  8:09 ` Mikko Perttunen
  2026-06-09  8:09 ` [PATCH 5/6] gpu: host1x: Change pin_job() return type to int Mikko Perttunen
  2026-06-09  8:09 ` [PATCH 6/6] gpu: host1x: Annotate intentional syncpoint wrap-around Mikko Perttunen
  5 siblings, 0 replies; 12+ messages in thread
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

host1x_debug_output() and host1x_debug_cont() used vsnprintf(), which
returns the length the formatted string would have reached with an
unbounded buffer. That return value was passed straight to o->fn as
the number of bytes to emit.

This could cause a read past end of the output buffer if a call to
host1x_debug_* produced a string longer than 256 bytes. This only
affected the debugfs files as the printk debug sink ignores the
number of bytes. In practice, this is very unlikely to occur.

Fix by switching to vscnprintf(), which returns the number of bytes
actually written.

Fixes: 6236451d83a7 ("gpu: host1x: Add debug support")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/debug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
index 6433c00d5d7e..b828f773fc06 100644
--- a/drivers/gpu/host1x/debug.c
+++ b/drivers/gpu/host1x/debug.c
@@ -31,7 +31,7 @@ void host1x_debug_output(struct output *o, const char *fmt, ...)
 	int len;
 
 	va_start(args, fmt);
-	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
+	len = vscnprintf(o->buf, sizeof(o->buf), fmt, args);
 	va_end(args);
 
 	o->fn(o->ctx, o->buf, len, false);
@@ -43,7 +43,7 @@ void host1x_debug_cont(struct output *o, const char *fmt, ...)
 	int len;
 
 	va_start(args, fmt);
-	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
+	len = vscnprintf(o->buf, sizeof(o->buf), fmt, args);
 	va_end(args);
 
 	o->fn(o->ctx, o->buf, len, true);

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/6] gpu: host1x: Change pin_job() return type to int
  2026-06-09  8:09 [PATCH 0/6] Miscellaneous fixes for the Host1x driver Mikko Perttunen
                   ` (3 preceding siblings ...)
  2026-06-09  8:09 ` [PATCH 4/6] gpu: host1x: Avoid stack over-read in debug output helpers Mikko Perttunen
@ 2026-06-09  8:09 ` Mikko Perttunen
  2026-06-09  8:29   ` sashiko-bot
  2026-06-09  8:09 ` [PATCH 6/6] gpu: host1x: Annotate intentional syncpoint wrap-around Mikko Perttunen
  5 siblings, 1 reply; 12+ messages in thread
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

pin_job() returns negative errno values on error paths (-EINVAL,
-ENOMEM, PTR_ERR() of mapping) but was declared as unsigned int.
The caller would immediately cast back to int, so there was no
functional issue, but it still warrants fixing.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/job.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index 3ed49e1fd933..165979319599 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -138,7 +138,7 @@ void host1x_job_add_wait(struct host1x_job *job, u32 id, u32 thresh,
 }
 EXPORT_SYMBOL(host1x_job_add_wait);
 
-static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
+static int pin_job(struct host1x *host, struct host1x_job *job)
 {
 	unsigned long mask = HOST1X_RELOC_READ | HOST1X_RELOC_WRITE;
 	struct host1x_client *client = job->client;

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/6] gpu: host1x: Change pin_job() return type to int
  2026-06-09  8:09 ` [PATCH 5/6] gpu: host1x: Change pin_job() return type to int Mikko Perttunen
@ 2026-06-09  8:29   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09  8:29 UTC (permalink / raw)
  To: Mikko Perttunen; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

Pre-existing issues:
- [Critical] An integer overflow in copy_gathers() leads to a massive heap buffer overflow during memcpy.
- [High] Silent failure when iommu_map_sgtable() returns a negative error code.
- [High] Multiple error paths in pin_job() leak host1x_bo references and pinned host1x_bo_mapping objects.
- [High] Missing or incorrect validation of host1x_bo_mmap() return value causes kernel panic.
--

commit 49509dd1b9a9c195fff52b9936852ca75bde12b1
Author: Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Change pin_job() return type to int

pin_job() returns negative errno values on error paths (-EINVAL,
-ENOMEM, PTR_ERR() of mapping) but was declared as unsigned int.
The caller would immediately cast back to int, so there was no
functional issue, but it still warrants fixing.

> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index 3ed49e1fd9332..1659793195991 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
[ ... ]
> @@ -138,7 +138,7 @@ void host1x_job_add_wait(struct host1x_job *job, u32 id, u32 thresh,
>  }
>  EXPORT_SYMBOL(host1x_job_add_wait);
>  
> -static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
> +static int pin_job(struct host1x *host, struct host1x_job *job)
>  {

[Severity: High]
This isn't a bug introduced by this patch, but while reviewing pin_job(),
does this code silently ignore mapping failures?

drivers/gpu/host1x/job.c:pin_job() {
    ...
			err = iommu_map_sgtable(host->domain,
						iova_dma_addr(&host->iova, alloc),
						map->sgt, IOMMU_READ);
			if (err == 0) {
				__free_iova(&host->iova, alloc);
				err = -EINVAL;
				goto put;
			}
    ...
}

iommu_map_sgtable() returns a positive mapped size on success and a negative
error code (like -EINVAL or -ENOMEM) on failure.

Since the check is exactly if (err == 0), a negative error code evaluates
to false, skipping the error path (goto put;).

This allows the code to proceed with an unmapped IOVA, which would cause the
IOMMU to block hardware memory access and trigger faults during execution.

Is this check intended to be if (err <= 0)?

[Severity: High]
This is a pre-existing issue, but do these error paths leak host1x_bo
references and mappings?

drivers/gpu/host1x/job.c:pin_job() {
    ...
		map = host1x_bo_pin(dev, bo, direction, NULL);
		if (IS_ERR(map)) {
			err = PTR_ERR(map);
			goto unpin;
		}

		if (map->chunks > 1) {
			err = -EINVAL;
			goto unpin;
		}

		job->addr_phys[job->num_unpins] = map->phys;
		job->unpins[job->num_unpins].map = map;
		job->num_unpins++;
    ...
}

In the loop, job->num_unpins is only incremented at the end of each iteration.

If an error occurs midway (e.g., map->chunks > 1), the loop jumps to unpin.

The cleanup routine host1x_job_unpin() iterates strictly up to job->num_unpins,
failing to release the BO reference and unpin the memory allocated during the
currently failing iteration.

Could a malicious user repeatedly trigger this error path to exhaust kernel
memory?

[Severity: High]
This is a pre-existing issue, but does this code dereference an error
pointer if host1x_bo_mmap() fails?

drivers/gpu/host1x/job.c:do_relocs() {
    ...
		if (!cmdbuf_addr) {
			cmdbuf_addr = host1x_bo_mmap(cmdbuf);

			if (unlikely(!cmdbuf_addr)) {
    ...
}

And similarly in copy_gathers():

drivers/gpu/host1x/job.c:copy_gathers() {
    ...
		gather = host1x_bo_mmap(g->bo);
		memcpy(job->gather_copy_mapped + offset, gather + g->offset,
		       g->words * sizeof(u32));
    ...
}

host1x_bo_mmap() returns an ERR_PTR() on failure (e.g., -ENOMEM), not NULL.

In do_relocs(), the check !cmdbuf_addr fails to detect an error pointer.
In copy_gathers(), the result is passed directly to memcpy() without any
checks.

Could this lead to a page fault and kernel panic if userspace triggers a
vmap failure?

[Severity: Critical]
This isn't a bug introduced by this patch, but can this size calculation
overflow on 32-bit platforms?

drivers/gpu/host1x/job.c:copy_gathers() {
    ...
	for (i = 0; i < job->num_cmds; i++) {
		struct host1x_job_gather *g;

		if (job->cmds[i].is_wait)
			continue;

		g = &job->cmds[i].gather;

		size += g->words * sizeof(u32);
	}
    ...
}

On 32-bit platforms, size_t size is 32 bits.

If an ioctl is submitted with a large number of command buffers
(e.g., num_cmdbufs = 65541 and words = 16383), the size calculation results
in 4,295,027,340, wrapping around to 60,044.

dma_alloc_wc() then allocates this undersized buffer, and the subsequent
memcpy() loop would copy out of bounds.

Could this lead to a massive heap buffer overflow?

>  	unsigned long mask = HOST1X_RELOC_READ | HOST1X_RELOC_WRITE;
>  	struct host1x_client *client = job->client;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com?part=5

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 6/6] gpu: host1x: Annotate intentional syncpoint wrap-around
  2026-06-09  8:09 [PATCH 0/6] Miscellaneous fixes for the Host1x driver Mikko Perttunen
                   ` (4 preceding siblings ...)
  2026-06-09  8:09 ` [PATCH 5/6] gpu: host1x: Change pin_job() return type to int Mikko Perttunen
@ 2026-06-09  8:09 ` Mikko Perttunen
  2026-06-09  8:29   ` sashiko-bot
  5 siblings, 1 reply; 12+ messages in thread
From: Mikko Perttunen @ 2026-06-09  8:09 UTC (permalink / raw)
  To: Thierry Reding, David Airlie, Simona Vetter
  Cc: dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

Host1x syncpoints are 32-bit counters that roll over by design.
To make that explicit in the code, use wrapping_* functions whenever
arithmetic is done on syncpoint values.

Atomic operations cannot be updated but a comment is added.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/cdma.c          |  3 ++-
 drivers/gpu/host1x/hw/channel_hw.c | 10 +++++++---
 drivers/gpu/host1x/intr.c          |  5 +++--
 drivers/gpu/host1x/syncpt.c        |  7 ++++++-
 drivers/gpu/host1x/syncpt.h        |  3 ++-
 5 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index ba2e572567c0..f6d3db2c8c39 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -13,6 +13,7 @@
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
 #include <linux/kfifo.h>
+#include <linux/overflow.h>
 #include <linux/slab.h>
 #include <trace/events/host1x.h>
 
@@ -419,7 +420,7 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
 		/* won't need a timeout when replayed */
 		job->timeout = 0;
 
-		syncpt_incrs = job->syncpt_end - syncpt_val;
+		syncpt_incrs = wrapping_sub(u32, job->syncpt_end, syncpt_val);
 		dev_dbg(dev, "%s: CPU incr (%d)\n", __func__, syncpt_incrs);
 
 		host1x_job_dump(dev, job);
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 9dda73199889..a8251ec0810c 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -7,6 +7,7 @@
 
 #include <linux/host1x.h>
 #include <linux/iommu.h>
+#include <linux/overflow.h>
 #include <linux/slab.h>
 
 #include <trace/events/host1x.h>
@@ -120,7 +121,8 @@ static void submit_gathers(struct host1x_job *job, struct host1x_job_cmd *cmds,
 
 		if (cmd->is_wait) {
 			if (cmd->wait.relative)
-				threshold = job_syncpt_base + cmd->wait.threshold;
+				threshold = wrapping_add(u32, job_syncpt_base,
+							 cmd->wait.threshold);
 			else
 				threshold = cmd->wait.threshold;
 
@@ -259,7 +261,8 @@ static void channel_program_cdma(struct host1x_job *job)
 
 	/* Submit work. */
 	job->syncpt_end = host1x_syncpt_incr_max(sp, job->syncpt_incrs);
-	submit_gathers(job, job->cmds + i, job->num_cmds - i, job->syncpt_end - job->syncpt_incrs);
+	submit_gathers(job, job->cmds + i, job->num_cmds - i,
+		       wrapping_sub(u32, job->syncpt_end, job->syncpt_incrs));
 
 	/* Before releasing MLOCK, ensure engine is idle again. */
 	fence = host1x_syncpt_incr_max(sp, 1);
@@ -297,7 +300,8 @@ static void channel_program_cdma(struct host1x_job *job)
 
 	job->syncpt_end = host1x_syncpt_incr_max(sp, job->syncpt_incrs);
 
-	submit_gathers(job, job->cmds, job->num_cmds, job->syncpt_end - job->syncpt_incrs);
+	submit_gathers(job, job->cmds, job->num_cmds,
+		       wrapping_sub(u32, job->syncpt_end, job->syncpt_incrs));
 #endif
 }
 
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index f77a678949e9..f9fd8a471e60 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -7,6 +7,7 @@
 
 #include <linux/clk.h>
 #include <linux/interrupt.h>
+#include <linux/overflow.h>
 #include "dev.h"
 #include "fence.h"
 #include "intr.h"
@@ -17,7 +18,7 @@ static void host1x_intr_add_fence_to_list(struct host1x_fence_list *list,
 	struct host1x_syncpt_fence *fence_in_list;
 
 	list_for_each_entry_reverse(fence_in_list, &list->list, list) {
-		if ((s32)(fence_in_list->threshold - fence->threshold) <= 0) {
+		if ((s32)wrapping_sub(u32, fence_in_list->threshold, fence->threshold) <= 0) {
 			/* Fence in list is before us, we can insert here */
 			list_add(&fence->list, &fence_in_list->list);
 			return;
@@ -83,7 +84,7 @@ void host1x_intr_handle_interrupt(struct host1x *host, unsigned int id)
 	spin_lock(&sp->fences.lock);
 
 	list_for_each_entry_safe(fence, tmp, &sp->fences.list, list) {
-		if (((value - fence->threshold) & 0x80000000U) != 0U) {
+		if ((wrapping_sub(u32, value, fence->threshold) & 0x80000000U) != 0U) {
 			/* Fence is not yet expired, we are done */
 			break;
 		}
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index acc7d82e0585..9ac4f0c80728 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -8,6 +8,7 @@
 #include <linux/module.h>
 #include <linux/device.h>
 #include <linux/dma-fence.h>
+#include <linux/overflow.h>
 #include <linux/slab.h>
 
 #include <trace/events/host1x.h>
@@ -126,6 +127,10 @@ EXPORT_SYMBOL(host1x_syncpt_id);
  */
 u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs)
 {
+	/*
+	 * Syncpoint values are intended to be modulo 2^32, so overflow
+	 * here is intended.
+	 */
 	return (u32)atomic_add_return(incrs, &sp->max_val);
 }
 EXPORT_SYMBOL(host1x_syncpt_incr_max);
@@ -274,7 +279,7 @@ bool host1x_syncpt_is_expired(struct host1x_syncpt *sp, u32 thresh)
 
 	current_val = (u32)atomic_read(&sp->min_val);
 
-	return ((current_val - thresh) & 0x80000000U) == 0U;
+	return (wrapping_sub(u32, current_val, thresh) & 0x80000000U) == 0U;
 }
 
 int host1x_syncpt_init(struct host1x *host)
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 4c3f3b2f0e9c..9eff42efc445 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -12,6 +12,7 @@
 #include <linux/host1x.h>
 #include <linux/kernel.h>
 #include <linux/kref.h>
+#include <linux/overflow.h>
 #include <linux/sched.h>
 
 #include "fence.h"
@@ -77,7 +78,7 @@ static inline bool host1x_syncpt_check_max(struct host1x_syncpt *sp, u32 real)
 	if (sp->client_managed)
 		return true;
 	max = host1x_syncpt_read_max(sp);
-	return (s32)(max - real) >= 0;
+	return (s32)wrapping_sub(u32, max, real) >= 0;
 }
 
 /* Return true if sync point is client managed. */

-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6] gpu: host1x: Annotate intentional syncpoint wrap-around
  2026-06-09  8:09 ` [PATCH 6/6] gpu: host1x: Annotate intentional syncpoint wrap-around Mikko Perttunen
@ 2026-06-09  8:29   ` sashiko-bot
  0 siblings, 0 replies; 12+ messages in thread
From: sashiko-bot @ 2026-06-09  8:29 UTC (permalink / raw)
  To: Mikko Perttunen; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Direct `<` comparison of wrap-around syncpoint values leads to incorrect evaluation of job completion status during timeout recovery.
--

commit 00ebce8e032b66ba3bfbcd7a17200301254bdb05
Author: Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Annotate intentional syncpoint wrap-around

Host1x syncpoints are 32-bit counters that roll over by design. This commit
uses wrapping_* functions whenever arithmetic is done on syncpoint values to
make this explicit in the code.

> diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
> index ba2e572567c03..f6d3db2c8c395 100644
> --- a/drivers/gpu/host1x/cdma.c
> +++ b/drivers/gpu/host1x/cdma.c

[ ... ]

> @@ -419,7 +420,7 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
>  		/* won't need a timeout when replayed */
>  		job->timeout = 0;
>  
> -		syncpt_incrs = job->syncpt_end - syncpt_val;
> +		syncpt_incrs = wrapping_sub(u32, job->syncpt_end, syncpt_val);

[Severity: High]
This is a pre-existing issue, but I noticed a potential problem earlier in
this same function regarding syncpt_val and job->syncpt_end wrap-around.

In host1x_cdma_update_sync_queue(), the loop over the sync_queue uses a
direct comparison to check if a job is uncompleted:

	list_for_each_entry(job, &cdma->sync_queue, list) {
		if (syncpt_val < job->syncpt_end) {

			if (!list_is_last(&job->list, &cdma->sync_queue))
				next_job = list_next_entry(job, list);

			goto syncpt_incr;
		}

If syncpt_val is near the maximum value and job->syncpt_end has wrapped
around to a small value, the condition evaluates to false. This makes the
recovery logic incorrectly assume the job has completed, skipping the CPU
increments and resuming the hardware channel without bypassing the failed
job.

Should that comparison also be updated to handle wrap-around conditions
safely?

>  		dev_dbg(dev, "%s: CPU incr (%d)\n", __func__, syncpt_incrs);
>  
>  		host1x_job_dump(dev, job);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com?part=6

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-06-09  8:29 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-09  8:09 [PATCH 0/6] Miscellaneous fixes for the Host1x driver Mikko Perttunen
2026-06-09  8:09 ` [PATCH 1/6] gpu: host1x: Wait for timeout worker completion on channel free Mikko Perttunen
2026-06-09  8:24   ` sashiko-bot
2026-06-09  8:09 ` [PATCH 2/6] gpu: host1x: Avoid double device_add when clients already present Mikko Perttunen
2026-06-09  8:20   ` sashiko-bot
2026-06-09  8:09 ` [PATCH 3/6] gpu: host1x: Fix offset calculation in trace_write_gather Mikko Perttunen
2026-06-09  8:26   ` sashiko-bot
2026-06-09  8:09 ` [PATCH 4/6] gpu: host1x: Avoid stack over-read in debug output helpers Mikko Perttunen
2026-06-09  8:09 ` [PATCH 5/6] gpu: host1x: Change pin_job() return type to int Mikko Perttunen
2026-06-09  8:29   ` sashiko-bot
2026-06-09  8:09 ` [PATCH 6/6] gpu: host1x: Annotate intentional syncpoint wrap-around Mikko Perttunen
2026-06-09  8:29   ` sashiko-bot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.