All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Boris Brezillon" <boris.brezillon@collabora.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 02/11] drm/panthor: Keep the reset work disabled until everything is initialized
Date: Thu, 25 Jun 2026 12:59:49 +0000	[thread overview]
Message-ID: <20260625125950.B03D91F00A3E@smtp.kernel.org> (raw)
In-Reply-To: <20260625-panthor-misc-fixes-v1-2-b67ed973fea6@collabora.com>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [High] Disabling the reset work during initialization causes asynchronous reset requests to be silently dropped, permanently wedging the GPU by leaving the pending flag stuck at 1.

Pre-existing issues:
- [High] The reset work remains enabled during the error teardown path if `drm_dev_register` fails, allowing concurrent reset work to access sub-components as they are destroyed.
--

commit d88d89caab294e4f6ec4a893f2e535ca95fb0890
Author: Boris Brezillon <boris.brezillon@collabora.com>

drm/panthor: Keep the reset work disabled until everything is initialized

The reset work will sub-component reset helpers, which might not be
ready if the reset happens during initialization, leading to NULL
pointer dereferences or worse. Avoid that by keeping the reset work
disabled while we're initializing those sub-components.

Closes: https://sashiko.dev/#/patchset/20260625-panthor-signal-from-irq-v5-0-8836a74e0ef9@collabora.com?part=4

> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -207,6 +207,7 @@ int panthor_device_init(struct panthor_device *ptdev)
>  	*dummy_page_virt = 1;
>  
>  	INIT_WORK(&ptdev->reset.work, panthor_device_reset_work);
> +	disable_work(&ptdev->reset.work);
>  	ptdev->reset.wq = alloc_ordered_workqueue("panthor-reset-wq", 0);
>  	if (!ptdev->reset.wq)
>  		return -ENOMEM;

[ ... ]

> @@ -285,6 +286,9 @@ int panthor_device_init(struct panthor_device *ptdev)
>  
>  	panthor_gem_init(ptdev);
>  
> +	/* Now that everything is initialized, we can enable the reset work. */
> +	enable_work(&ptdev->reset.work);
> +
>  	/* ~3 frames */
>  	pm_runtime_set_autosuspend_delay(ptdev->base.dev, 50);
>  	pm_runtime_use_autosuspend(ptdev->base.dev);

[Severity: High]
If a reset is requested while the work is disabled (e.g., due to a
firmware timeout or early MMU fault during driver probe), does the device
become permanently wedged?

Looking at panthor_device_schedule_reset(), it relies on queue_work()
succeeding after atomic_cmpxchg() sets the pending flag:

panthor_device_schedule_reset() {
    if (!atomic_cmpxchg(&ptdev->reset.pending, 0, 1) &&
        atomic_read(&ptdev->pm.state) == PANTHOR_DEVICE_PM_STATE_ACTIVE) {
        queue_work(ptdev->reset.wq, &ptdev->reset.work);
    }
}

Because the work is disabled via disable_work() earlier, queue_work()
fails and the work is never queued.

When enable_work() is called here, it decrements the disable count but
does not requeue the dropped work or clear the pending flag. Because
ptdev->reset.pending remains 1, will all future reset requests fail the
atomic_cmpxchg() check, leaving the GPU unable to recover?

[Severity: High]
This isn't a bug introduced by this patch, but if drm_dev_register()
fails shortly after this, does the reset work remain enabled during the
teardown sequence?

If drm_dev_register() fails, the driver jumps to err_disable_autosuspend
to destroy sub-components sequentially. Because drm_dev_unplug() is not
called in this explicit error path, dev->unplugged remains false.

If a concurrent FW timeout or fault triggers during this teardown sequence,
any concurrently scheduled reset work will pass the drm_dev_enter() check
in panthor_device_reset_work():

panthor_device_reset_work() {
    if (!drm_dev_enter(&ptdev->base, &cookie))
        return;

    panthor_sched_pre_reset(ptdev);
    panthor_fw_pre_reset(ptdev, true);
    ...
}

Could this cause the reset work to interact with sub-components that are
actively being destroyed, leading to use-after-free or NULL pointer
dereferences?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260625-panthor-misc-fixes-v1-0-b67ed973fea6@collabora.com?part=2

  reply	other threads:[~2026-06-25 12:59 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-25 12:40 [PATCH 00/11] drm/panthor: Misc fixes for bugs found by shashiko Boris Brezillon
2026-06-25 12:40 ` [PATCH 01/11] drm/panthor: Always use the IRQ-safe variant when acquiring the fence lock Boris Brezillon
2026-06-25 12:51   ` sashiko-bot
2026-06-26  9:11   ` Liviu Dudau
2026-06-25 12:40 ` [PATCH 02/11] drm/panthor: Keep the reset work disabled until everything is initialized Boris Brezillon
2026-06-25 12:59   ` sashiko-bot [this message]
2026-06-26  9:13   ` Liviu Dudau
2026-06-25 12:40 ` [PATCH 03/11] drm/panthor: Fix UAF on works queued to panthor_cleanup_wq Boris Brezillon
2026-06-25 12:56   ` sashiko-bot
2026-06-25 14:20   ` Boris Brezillon
2026-06-25 12:40 ` [PATCH 04/11] drm/panthor: Fix potential invalid pointer deref in group_process_tiler_oom() Boris Brezillon
2026-06-25 12:54   ` sashiko-bot
2026-06-26  9:14   ` Liviu Dudau
2026-06-25 12:40 ` [PATCH 05/11] drm/panthor: Fix theoretical IOMEM access in suspended state Boris Brezillon
2026-06-26  9:29   ` Liviu Dudau
2026-06-26 11:40     ` Boris Brezillon
2026-06-26 13:13       ` Liviu Dudau
2026-06-25 12:40 ` [PATCH 06/11] drm/panthor: Don't overrule pending immediate ticks in sched_resume_tick() Boris Brezillon
2026-06-26 12:45   ` Liviu Dudau
2026-06-26 13:19     ` Boris Brezillon
2026-06-25 12:40 ` [PATCH 07/11] drm/panthor: Fix panthor_pwr_unplug() Boris Brezillon
2026-06-26 12:42   ` Liviu Dudau
2026-06-25 12:40 ` [PATCH 08/11] drm/panthor: Drop a needless check in panthor_fw_unplug() Boris Brezillon
2026-06-25 12:53   ` sashiko-bot
2026-06-26 13:11   ` Liviu Dudau
2026-06-25 12:40 ` [PATCH 09/11] drm/panthor: Fix a leak when a group is evicted before the tiler OOM is serviced Boris Brezillon
2026-06-26 13:12   ` Liviu Dudau
2026-06-25 12:40 ` [PATCH 10/11] drm/panthor: Interrupt group start/resumption if group_bind_locked() fails Boris Brezillon
2026-06-26 13:14   ` Liviu Dudau
2026-06-25 12:40 ` [PATCH 11/11] drm/panthor: Keep interrupts masked until they are needed Boris Brezillon
2026-06-26 13:18   ` Liviu Dudau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260625125950.B03D91F00A3E@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=boris.brezillon@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.