* [PATCH] drm/nouveau: Document weird looking bugfix
@ 2026-06-10 8:26 Philipp Stanner
2026-06-10 9:12 ` Christian König
2026-06-10 12:12 ` Tvrtko Ursulin
0 siblings, 2 replies; 3+ messages in thread
From: Philipp Stanner @ 2026-06-10 8:26 UTC (permalink / raw)
To: Lyude Paul, Danilo Krummrich, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König
Cc: dri-devel, nouveau, linux-kernel, linux-media, Philipp Stanner
commit c8a5d5ea3ba6 ("nouveau: fix client work fence deletion race")
fixed a race. To do so, it replaced the automatically locking
dma_fence_is_signaled() with manual locks plus
dma_fence_is_signaled_locked().
For someone browsing through the code, this reads very much like a
cleanup or rework leftover. Future contributors and / or new maintainers
not familiar with the history might be tempted to remove that bugfix.
Document the bugfix.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
(I did not test this)
---
drivers/gpu/drm/nouveau/nouveau_drm.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 42a81166f3a9..519a0c164a72 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -159,6 +159,13 @@ nouveau_cli_work_ready(struct dma_fence *fence)
 	unsigned long flags;
 	bool ret = true;
 
+	/*
+	 * This is not a cleanup / rework leftover, but a bugfix to prevent a
+	 * race with someone signalling the fence. The locked
+	 * dma_fence_is_signaled() cannot be used. The dma_fence implementation
+	 * is not fully synchronized with locks, but also uses atomic bits,
+	 * which can cause the dma_fence_put() below to be executed too soon.
+	 */
 	dma_fence_lock_irqsave(fence, flags);
 	if (!dma_fence_is_signaled_locked(fence))
 		ret = false;
--
2.54.0
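
Editorial sketch: the hazard named in the new comment can be pictured with a small userspace model. It rests on an assumption the patch does not spell out, namely that the dangerous window lies between the signaller setting the signaled flag and the signaller finishing its work under the fence lock. All names below (model_fence, model_signal_begin, model_put_too_soon, ...) are hypothetical and are not kernel APIs. The lock is modeled as a plain flag, and a locked check that would have to wait is modeled as "not ready yet".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a fence; this is NOT struct dma_fence. */
struct model_fence {
	bool lock_held;      /* models the fence lock being held by the signaller */
	bool signaled_bit;   /* models the atomic "signaled" flag */
	bool callbacks_done; /* true once the signaller has finished its callbacks */
};

/* Signaller, step 1 (assumed ordering): take the lock and set the flag. */
static void model_signal_begin(struct model_fence *f)
{
	f->lock_held = true;
	f->signaled_bit = true;
}

/* Signaller, step 2: finish the callbacks, then drop the lock. */
static void model_signal_end(struct model_fence *f)
{
	f->callbacks_done = true;
	f->lock_held = false;
}

/* Lockless check: looks only at the flag and ignores the lock. */
static bool model_is_signaled_unlocked(const struct model_fence *f)
{
	return f->signaled_bit;
}

/*
 * Locked check: a real lock would make the caller wait for the signaller.
 * The model reports "not ready" instead, so the caller would retry later.
 */
static bool model_is_signaled_locked(const struct model_fence *f)
{
	if (f->lock_held)
		return false;
	return f->signaled_bit;
}

/*
 * Runs the check while the signaller is half-way through and returns true
 * if the caller would drop its reference before the callbacks have run.
 */
bool model_put_too_soon(bool use_locked_check)
{
	struct model_fence f = { false, false, false };
	bool ready;
	bool too_soon;

	model_signal_begin(&f);
	ready = use_locked_check ? model_is_signaled_locked(&f)
				 : model_is_signaled_unlocked(&f);
	too_soon = ready && !f.callbacks_done;
	model_signal_end(&f);

	return too_soon;
}
```

In this model, model_put_too_soon(false) reports the premature put and model_put_too_soon(true) does not, which is the behaviour the comment asks future readers to preserve. It says nothing about whether the real window in nouveau looks exactly like this.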
* Re: [PATCH] drm/nouveau: Document weird looking bugfix
From: Christian König @ 2026-06-10 9:12 UTC (permalink / raw)
To: Philipp Stanner, Lyude Paul, Danilo Krummrich, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Sumit Semwal
Cc: dri-devel, nouveau, linux-kernel, linux-media
On 6/10/26 10:26, Philipp Stanner wrote:
> commit c8a5d5ea3ba6 ("nouveau: fix client work fence deletion race")
> fixed a race. To do so, it replaced the automatically locking
> dma_fence_is_signaled() with manual locks plus
> dma_fence_is_signaled_locked().
>
> For someone browsing through the code, this reads very much like a
> cleanup or rework leftover. Future contributors and / or new maintainers
> not familiar with the history might be tempted to remove that bugfix.
>
> Document the bugfix.
>
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> (I did not test this)
> ---
> drivers/gpu/drm/nouveau/nouveau_drm.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 42a81166f3a9..519a0c164a72 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -159,6 +159,13 @@ nouveau_cli_work_ready(struct dma_fence *fence)
>  	unsigned long flags;
>  	bool ret = true;
>
> +	/*
> +	 * This is not a cleanup / rework leftover, but a bugfix to prevent a
> +	 * race with someone signalling the fence. The locked
> +	 * dma_fence_is_signaled() cannot be used. The dma_fence implementation
> +	 * is not fully synchronized with locks, but also uses atomic bits,
> +	 * which can cause the dma_fence_put() below to be executed too soon.
> +	 */
>  	dma_fence_lock_irqsave(fence, flags);
>  	if (!dma_fence_is_signaled_locked(fence))
>  		ret = false;
* Re: [PATCH] drm/nouveau: Document weird looking bugfix
From: Tvrtko Ursulin @ 2026-06-10 12:12 UTC (permalink / raw)
To: Philipp Stanner, Lyude Paul, Danilo Krummrich, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Sumit Semwal, Christian König
Cc: dri-devel, nouveau, linux-kernel, linux-media
On 10/06/2026 09:26, Philipp Stanner wrote:
> commit c8a5d5ea3ba6 ("nouveau: fix client work fence deletion race")
> fixed a race. To do so, it replaced the automatically locking
> dma_fence_is_signaled() with manual locks plus
> dma_fence_is_signaled_locked().
>
> For someone browsing through the code, this reads very much like a
> cleanup or rework leftover. Future contributors and / or new maintainers
> not familiar with the history might be tempted to remove that bugfix.
>
> Document the bugfix.
>
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
> ---
> (I did not test this)
> ---
> drivers/gpu/drm/nouveau/nouveau_drm.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 42a81166f3a9..519a0c164a72 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -159,6 +159,13 @@ nouveau_cli_work_ready(struct dma_fence *fence)
>  	unsigned long flags;
>  	bool ret = true;
>
> +	/*
> +	 * This is not a cleanup / rework leftover, but a bugfix to prevent a
> +	 * race with someone signalling the fence. The locked
> +	 * dma_fence_is_signaled() cannot be used. The dma_fence implementation
> +	 * is not fully synchronized with locks, but also uses atomic bits,
> +	 * which can cause the dma_fence_put() below to be executed too soon.
> +	 */
IMHO it would also be interesting to document why this happens from the
nouveau point of view.
For example, I see two references held on this fence in the call
chain, but apparently neither is enough to close the race, which
suggests a third party has a pointer to this fence without holding a
reference.
I am talking about this:
nouveau_gem_object_unmap -> nouveau_cli_work_queue
There it grabs a reference before queueing the worker. In the worker it
drops it before calling the callback that nouveau_gem_object_unmap installed:
static void
nouveau_cli_work(struct work_struct *w)
{
	struct nouveau_cli *cli = container_of(w, typeof(*cli), work);
	struct nouveau_cli_work *work, *wtmp;
	mutex_lock(&cli->lock);
	list_for_each_entry_safe(work, wtmp, &cli->worker, head) {
		if (!work->fence || nouveau_cli_work_ready(work->fence)) {
... nouveau_cli_work_ready can drop one reference
			list_del(&work->head);
			work->func(work);
... then work->func was set to nouveau_gem_object_delete_work by
nouveau_gem_object_unmap, which will end up calling:
nouveau_gem_object_delete -> nouveau_fence_unref
On possibly the same fence.
So if there is a path inside nouveau itself which signals the fence without
holding a reference, could it be that the problem is self-inflicted
and not due to a dma-fence quirk?
I am not entirely sure since it is not very clear. It needs someone with
nouveau expertise to clarify.
Regards,
Tvrtko
>  	dma_fence_lock_irqsave(fence, flags);
>  	if (!dma_fence_is_signaled_locked(fence))
>  		ret = false;
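
Editorial sketch: the reference walk in the reply above can be modeled with a plain counter. This is only an illustration under loud assumptions: one long-lived reference that the delete path later drops, one reference taken when the work is queued, and an optional extra reference held by whoever signals the fence. Whether such a signalling path exists in nouveau is exactly the open question in the reply. The names (model_obj, model_get, model_put, model_fence_freed_after_work) are hypothetical and are not kernel or nouveau APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for the fence. */
struct model_obj {
	int refcount;
	bool freed;
};

static void model_get(struct model_obj *o)
{
	assert(!o->freed);
	o->refcount++;
}

static void model_put(struct model_obj *o)
{
	assert(!o->freed);
	if (--o->refcount == 0)
		o->freed = true; /* a real implementation would release memory here */
}

/*
 * Replays the chain from the reply: a reference is taken when the work is
 * queued, the worker drops it once the fence is ready, and the callback then
 * drops the long-lived reference. Returns whether the fence is gone afterwards.
 */
bool model_fence_freed_after_work(bool signaller_holds_ref)
{
	struct model_obj fence = { 1, false }; /* assumed long-lived reference */

	if (signaller_holds_ref)
		model_get(&fence); /* assumed reference owned by the signalling path */

	model_get(&fence); /* taken before queueing the work */
	model_put(&fence); /* dropped by the worker's ready check */
	model_put(&fence); /* dropped by the callback's unref */

	return fence.freed;
}
```

With signaller_holds_ref set to false the object is already freed once the callback has run, so any signaller still holding a bare pointer would touch freed memory. With it set to true the object survives until the signaller drops its own reference.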