From: sashiko-bot@kernel.org
To: "Mikko Perttunen" <mperttunen@nvidia.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 1/6] gpu: host1x: Wait for timeout worker completion on channel free
Date: Tue, 09 Jun 2026 08:24:17 +0000 [thread overview]
Message-ID: <20260609082417.C84081F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260609-b4-host1x-small-fixes-a-v1-1-7c1131c0b3ad@nvidia.com>
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [High] Missing teardown of `cdma->update_work` during channel destruction causes workqueue corruption and use-after-free races.
- [High] Spurious timeout handler execution aborts newly-started jobs due to unsafe non-synchronous timer cancellation in `update_cdma_locked()`.
--
commit 716b1c0819a18e8bf5a813661ef56163eb1114aa
Author: Mikko Perttunen <mperttunen@nvidia.com>
gpu: host1x: Wait for timeout worker completion on channel free
This commit addresses a race condition during CDMA teardown. By replacing
cancel_delayed_work() with cancel_delayed_work_sync(), it ensures pending
timeout work completes before the CDMA channel is freed. This prevents the
timeout handler from waiting on a destroyed lock.
> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
> index 3f3f0018eee0d..ab714d221120c 100644
> --- a/drivers/gpu/host1x/hw/cdma_hw.c
> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
> @@ -355,7 +355,7 @@ static int cdma_timeout_init(struct host1x_cdma *cdma)
> static void cdma_timeout_destroy(struct host1x_cdma *cdma)
> {
> if (cdma->timeout.initialized)
> - cancel_delayed_work(&cdma->timeout.wq);
> + cancel_delayed_work_sync(&cdma->timeout.wq);
[Severity: High]
This isn't a bug introduced by this patch, but does the missing teardown of
cdma->update_work during channel destruction cause workqueue corruption and
use-after-free races?
When release_channel() calls host1x_cdma_deinit(), it destroys the push
buffer but fails to call cancel_work_sync(&cdma->update_work):
drivers/gpu/host1x/cdma.c:host1x_cdma_deinit() {
...
host1x_pushbuffer_destroy(pb);
host1x_hw_cdma_timeout_destroy(host1x, cdma);
return 0;
}
The cdma->update_work can be scheduled asynchronously by a hardware syncpoint
interrupt. If the channel is immediately reallocated to a new client,
host1x_cdma_init() unconditionally executes INIT_WORK(&cdma->update_work, ...).
Calling INIT_WORK() on an actively queued or running work struct can corrupt
the workqueue linked lists. Additionally, if the stray cdma_update_work()
executes after the push buffer is destroyed, can it dereference a NULL
cdma->push_buffer.mapped?
[Severity: High]
This is a pre-existing issue, but does spurious timeout handler execution
abort newly-started jobs due to unsafe non-synchronous timer cancellation in
update_cdma_locked()?
When a job finishes, stop_cdma_timer_locked() uses non-sync
cancel_delayed_work() under a lock, leaving the executing handler to run:
drivers/gpu/host1x/cdma.c:stop_cdma_timer_locked() {
cancel_delayed_work(&cdma->timeout.wq);
cdma->timeout.client = NULL;
}
If the timeout handler cdma_timeout_handler() has already started on another
CPU, it blocks waiting for cdma->lock. The current thread then immediately
starts the next job, updating cdma->timeout.client and syncpt_val, and
releases the lock.
The stale timeout handler wakes up, sees cdma->timeout.client is not NULL,
and checks if the NEW job has completed. Since it hasn't, the handler
spuriously assumes a timeout occurred, freezes the CDMA channel, and aborts
the new job.
Is a generation sequence number or explicit time check against start_ktime in
the handler required to fix this race safely, since using
cancel_delayed_work_sync() here would cause a deadlock?
>
> cdma->timeout.initialized = false;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260609-b4-host1x-small-fixes-a-v1-0-7c1131c0b3ad@nvidia.com?part=1
next prev parent reply other threads:[~2026-06-09 8:24 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-09 8:09 [PATCH 0/6] Miscellaneous fixes for the Host1x driver Mikko Perttunen
2026-06-09 8:09 ` [PATCH 1/6] gpu: host1x: Wait for timeout worker completion on channel free Mikko Perttunen
2026-06-09 8:24 ` sashiko-bot [this message]
2026-06-09 8:09 ` [PATCH 2/6] gpu: host1x: Avoid double device_add when clients already present Mikko Perttunen
2026-06-09 8:20 ` sashiko-bot
2026-06-09 8:09 ` [PATCH 3/6] gpu: host1x: Fix offset calculation in trace_write_gather Mikko Perttunen
2026-06-09 8:26 ` sashiko-bot
2026-06-09 8:09 ` [PATCH 4/6] gpu: host1x: Avoid stack over-read in debug output helpers Mikko Perttunen
2026-06-09 8:09 ` [PATCH 5/6] gpu: host1x: Change pin_job() return type to int Mikko Perttunen
2026-06-09 8:29 ` sashiko-bot
2026-06-09 8:09 ` [PATCH 6/6] gpu: host1x: Annotate intentional syncpoint wrap-around Mikko Perttunen
2026-06-09 8:29 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260609082417.C84081F00893@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=mperttunen@nvidia.com \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.