dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Philipp Stanner <phasta@mailbox.org>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
	simona.vetter@ffwll.ch, tvrtko.ursulin@igalia.com,
	airlied@gmail.com, dakr@kernel.org, sumit.semwal@linaro.org,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org
Subject: Re: [PATCH 2/2] dma-buf: add warning when dma_fence is signaled from IOCTL
Date: Thu, 04 Sep 2025 14:45:25 +0200	[thread overview]
Message-ID: <b6c09af972e7d4a9cb1c9d6621f77f923c3a8a56.camel@mailbox.org> (raw)
In-Reply-To: <20250812143402.8619-2-christian.koenig@amd.com>

On Tue, 2025-08-12 at 16:34 +0200, Christian König wrote:
> From: Christian König <ckoenig@able.fritz.box>
> 
> We have the re-occurring problem that people try to invent a
> DMA-fences implementation which signals fences based on an userspace
> IOCTL.
> 
> This is well known as source of hard to track down crashes and is
> documented to be an invalid approach. The problem is that it seems
> to work during initial testing and only long term tests points out
> why this can never work correctly.
> 
> So give at least a warning when people try to signal a fence from
> task context and not from interrupts or a work item. This check is
> certainly not perfect but better than nothing.
> 
> Signed-off-by: Christian König <ckoenig@able.fritz.box>
> ---
>  drivers/dma-buf/dma-fence.c | 59 +++++++++++++++++++++++++++----------
>  include/linux/dma-fence.h   |  9 ++++--
>  2 files changed, 51 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 3f78c56b58dc..2bce620eacac 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -345,33 +345,23 @@ void __dma_fence_might_wait(void)
>  }
>  #endif
>  
> -
>  /**
> - * dma_fence_signal_timestamp_locked - signal completion of a fence
> + * dma_fence_signal_internal - internal signal completion of a fence
>   * @fence: the fence to signal
>   * @timestamp: fence signal timestamp in kernel's CLOCK_MONOTONIC time domain
>   *
> - * Signal completion for software callbacks on a fence, this will unblock
> - * dma_fence_wait() calls and run all the callbacks added with
> - * dma_fence_add_callback(). Can be called multiple times, but since a fence
> - * can only go from the unsignaled to the signaled state and not back, it will
> - * only be effective the first time. Set the timestamp provided as the fence
> - * signal timestamp.
> - *
> - * Unlike dma_fence_signal_timestamp(), this function must be called with
> - * &dma_fence.lock held.
> + * Internal signal the dma_fence without error checking. Should *NEVER* be used
> + * by drivers or external code directly.
>   *
>   * Returns 0 on success and a negative error value when @fence has been
>   * signalled already.
>   */
> -int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> -				      ktime_t timestamp)
> +int dma_fence_signal_internal(struct dma_fence *fence, ktime_t timestamp)
>  {
>  	struct dma_fence_cb *cur, *tmp;
>  	struct list_head cb_list;
>  
>  	lockdep_assert_held(fence->lock);
> -
>  	if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>  				      &fence->flags)))
>  		return -EINVAL;
> @@ -390,7 +380,46 @@ int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  
>  	return 0;
>  }
> -EXPORT_SYMBOL(dma_fence_signal_timestamp_locked);
> +EXPORT_SYMBOL(dma_fence_signal_internal);
> +
> +/**
> + * dma_fence_signal_timestamp_locked - signal completion of a fence
> + * @fence: the fence to signal
> + * @timestamp: fence signal timestamp in kernel's CLOCK_MONOTONIC time domain
> + *
> + * Signal completion for software callbacks on a fence, this will unblock
> + * dma_fence_wait() calls and run all the callbacks added with
> + * dma_fence_add_callback(). Can be called multiple times, but since a fence
> + * can only go from the unsignaled to the signaled state and not back, it will
> + * only be effective the first time. Set the timestamp provided as the fence
> + * signal timestamp.
> + *
> + * Unlike dma_fence_signal_timestamp(), this function must be called with
> + * &dma_fence.lock held.
> + *
> + * Returns 0 on success and a negative error value when @fence has been
> + * signalled already.
> + */
> +int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
> +				      ktime_t timestamp)
> +{
> +	/*
> +	 * We have the re-occurring problem that people try to invent a
> +	 * DMA-fences implementation which signals fences based on an userspace
> +	 * IOCTL.
> +	 *
> +	 * This is well known as source of hard to track down crashes and is
> +	 * documented to be an invalid approach. The problem is that it seems
> +	 * to work during initial testing and only long term tests points out
> +	 * why this can never work correctly.
> +	 *
> +	 * So give at least a warning when people try to signal a fence from
> +	 * task context and not from interrupts or a work item. This check is
> +	 * certainly not perfect but better than nothing.
> +	 */
> +	WARN_ON_ONCE(!in_interrupt() && !current_work());

I only just realized that this could cause false-positive warnings
should a driver tear down drm/sched with drm_sched_fini() plus
drm_sched_backend_ops.cancel_job(). This signals left over dma_fences
with an error.

And drm_sched_fini() doesn't usually run in work items, does it. It can
be invoked through ioctls that destroy a channel and the associated
scheduler, with rmmod etc.

cancel_job() is (for now) only used in the unit tests since we recently
had to revert the usage in Nouveau (because of a different problem). I
intend to add it back there, though; and we established cancel_job() as
the way to deal with drm/sched's memory leaks in lengthy discussions.


I think that warning should not be added. False positive warnings in
dmesg would confuse users, cause unnecessary support tickets etc.


P.


> +	return dma_fence_signal_internal(fence, timestamp);
> +}
>  
>  /**
>   * dma_fence_signal_timestamp - signal completion of a fence
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 64639e104110..8dbcd66989b8 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -369,6 +369,7 @@ int dma_fence_signal_locked(struct dma_fence *fence);
>  int dma_fence_signal_timestamp(struct dma_fence *fence, ktime_t timestamp);
>  int dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  				      ktime_t timestamp);
> +int dma_fence_signal_internal(struct dma_fence *fence, ktime_t timestamp);
>  signed long dma_fence_default_wait(struct dma_fence *fence,
>  				   bool intr, signed long timeout);
>  int dma_fence_add_callback(struct dma_fence *fence,
> @@ -422,7 +423,7 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>  		return true;
>  
>  	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> -		dma_fence_signal_locked(fence);
> +		dma_fence_signal_internal(fence, ktime_get());
>  		return true;
>  	}
>  
> @@ -448,11 +449,15 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>  static inline bool
>  dma_fence_is_signaled(struct dma_fence *fence)
>  {
> +	unsigned long flags;
> +
>  	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>  		return true;
>  
>  	if (fence->ops->signaled && fence->ops->signaled(fence)) {
> -		dma_fence_signal(fence);
> +		spin_lock_irqsave(fence->lock, flags);
> +		dma_fence_signal_internal(fence, ktime_get());
> +		spin_unlock_irqrestore(fence->lock, flags);
>  		return true;
>  	}
>  


      parent reply	other threads:[~2025-09-04 12:45 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-12 14:34 [PATCH 1/2] dma-buf/sw_sync: put fence signaling into work item Christian König
2025-08-12 14:34 ` [PATCH 2/2] dma-buf: add warning when dma_fence is signaled from IOCTL Christian König
2025-08-13  8:16   ` Philipp Stanner
2025-08-13  9:18     ` Christian König
2025-08-13  8:20   ` Tvrtko Ursulin
2025-08-13 11:10     ` Christian König
2025-08-13 11:59       ` Tvrtko Ursulin
2025-08-21 12:46         ` Christian König
2025-08-22  8:08           ` Tvrtko Ursulin
2025-08-20  7:27   ` kernel test robot
2025-08-21 13:00     ` Christian König
2025-09-01 11:14       ` Tvrtko Ursulin
2025-09-04 12:45   ` Philipp Stanner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b6c09af972e7d4a9cb1c9d6621f77f923c3a8a56.camel@mailbox.org \
    --to=phasta@mailbox.org \
    --cc=airlied@gmail.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-media@vger.kernel.org \
    --cc=phasta@kernel.org \
    --cc=simona.vetter@ffwll.ch \
    --cc=sumit.semwal@linaro.org \
    --cc=tvrtko.ursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).