public inbox for linux-media@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] dma-fence: Move signalling tracepoint to before ops detach
@ 2026-03-30 13:36 Tvrtko Ursulin
  2026-03-30 14:24 ` Philipp Stanner
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Tvrtko Ursulin @ 2026-03-30 13:36 UTC (permalink / raw)
  To: dri-devel
  Cc: kernel-dev, Tvrtko Ursulin, Christian König, Philipp Stanner,
	Boris Brezillon, linux-media, linaro-mm-sig

Move the signalling tracepoint to before fence->ops are reset otherwise
tracepoint will dereference a null pointer.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
Cc: Christian König <christian.koenig@amd.com>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
 drivers/dma-buf/dma-fence.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 1826ba73094c..1c1eaecaf1b0 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
 				      &fence->flags)))
 		return;
 
+	trace_dma_fence_signaled(fence);
+
 	/*
 	 * When neither a release nor a wait operation is specified set the ops
 	 * pointer to NULL to allow the fence structure to become independent
@@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
 
 	fence->timestamp = timestamp;
 	set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
-	trace_dma_fence_signaled(fence);
 
 	list_for_each_entry_safe(cur, tmp, &cb_list, node) {
 		INIT_LIST_HEAD(&cur->node);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] dma-fence: Move signalling tracepoint to before ops detach
  2026-03-30 13:36 [PATCH] dma-fence: Move signalling tracepoint to before ops detach Tvrtko Ursulin
@ 2026-03-30 14:24 ` Philipp Stanner
  2026-03-31  7:49 ` Boris Brezillon
  2026-03-31  9:19 ` Christian König
  2 siblings, 0 replies; 9+ messages in thread
From: Philipp Stanner @ 2026-03-30 14:24 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: kernel-dev, Christian König, Philipp Stanner,
	Boris Brezillon, linux-media, linaro-mm-sig

IMO the title should state that this fixes a NULL ptr deref, since that's very significant.

On Mon, 2026-03-30 at 14:36 +0100, Tvrtko Ursulin wrote:
> Move the signalling tracepoint to before fence->ops are reset otherwise
> tracepoint will dereference a null pointer.

Can't fully follow; you're talking about the fence ops detachment for
signaled fences?

> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> Cc: linux-media@vger.kernel.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
>  drivers/dma-buf/dma-fence.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 1826ba73094c..1c1eaecaf1b0 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  				      &fence->flags)))
>  		return;
>  
> +	trace_dma_fence_signaled(fence);
> +

If it's about touching the ops, the decisive action would be the
IS_SIGNALED flag, wouldn't it? So trace_dma_fence_signaleld() should be
above the flag test?


P.

>  	/*
>  	 * When neither a release nor a wait operation is specified set the ops
>  	 * pointer to NULL to allow the fence structure to become independent
> @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  
>  	fence->timestamp = timestamp;
>  	set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> -	trace_dma_fence_signaled(fence);
>  
>  	list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>  		INIT_LIST_HEAD(&cur->node);


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dma-fence: Move signalling tracepoint to before ops detach
  2026-03-30 13:36 [PATCH] dma-fence: Move signalling tracepoint to before ops detach Tvrtko Ursulin
  2026-03-30 14:24 ` Philipp Stanner
@ 2026-03-31  7:49 ` Boris Brezillon
  2026-04-09 13:58   ` Tvrtko Ursulin
  2026-03-31  9:19 ` Christian König
  2 siblings, 1 reply; 9+ messages in thread
From: Boris Brezillon @ 2026-03-31  7:49 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: dri-devel, kernel-dev, Christian König, Philipp Stanner,
	linux-media, linaro-mm-sig

On Mon, 30 Mar 2026 14:36:23 +0100
Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:

> Move the signalling tracepoint to before fence->ops are reset otherwise
> tracepoint will dereference a null pointer.

I suspect other trace points are impacted too
(trace_dma_fence_destroy() is, at the very least).

> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> Cc: linux-media@vger.kernel.org
> Cc: linaro-mm-sig@lists.linaro.org
> ---
>  drivers/dma-buf/dma-fence.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 1826ba73094c..1c1eaecaf1b0 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  				      &fence->flags)))
>  		return;
>  
> +	trace_dma_fence_signaled(fence);
> +
>  	/*
>  	 * When neither a release nor a wait operation is specified set the ops
>  	 * pointer to NULL to allow the fence structure to become independent
> @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  
>  	fence->timestamp = timestamp;
>  	set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> -	trace_dma_fence_signaled(fence);
>  
>  	list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>  		INIT_LIST_HEAD(&cur->node);


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dma-fence: Move signalling tracepoint to before ops detach
  2026-03-30 13:36 [PATCH] dma-fence: Move signalling tracepoint to before ops detach Tvrtko Ursulin
  2026-03-30 14:24 ` Philipp Stanner
  2026-03-31  7:49 ` Boris Brezillon
@ 2026-03-31  9:19 ` Christian König
  2 siblings, 0 replies; 9+ messages in thread
From: Christian König @ 2026-03-31  9:19 UTC (permalink / raw)
  To: Tvrtko Ursulin, dri-devel
  Cc: kernel-dev, Philipp Stanner, Boris Brezillon, linux-media,
	linaro-mm-sig

On 3/30/26 15:36, Tvrtko Ursulin wrote:
> Move the signalling tracepoint to before fence->ops are reset otherwise
> tracepoint will dereference a null pointer.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
> Cc: Christian König <christian.koenig@amd.com>
> Cc: Philipp Stanner <phasta@kernel.org>
> Cc: Boris Brezillon <boris.brezillon@collabora.com>
> Cc: linux-media@vger.kernel.org
> Cc: linaro-mm-sig@lists.linaro.org

Good catch. I just silently assumed that the fence is not signaled when we traced signaling.

Going to take another look since there might be more problems, but for now:

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/dma-buf/dma-fence.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 1826ba73094c..1c1eaecaf1b0 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  				      &fence->flags)))
>  		return;
>  
> +	trace_dma_fence_signaled(fence);
> +
>  	/*
>  	 * When neither a release nor a wait operation is specified set the ops
>  	 * pointer to NULL to allow the fence structure to become independent
> @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>  
>  	fence->timestamp = timestamp;
>  	set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> -	trace_dma_fence_signaled(fence);
>  
>  	list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>  		INIT_LIST_HEAD(&cur->node);


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dma-fence: Move signalling tracepoint to before ops detach
  2026-03-31  7:49 ` Boris Brezillon
@ 2026-04-09 13:58   ` Tvrtko Ursulin
  2026-04-10  8:58     ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Tvrtko Ursulin @ 2026-04-09 13:58 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: dri-devel, kernel-dev, Christian König, Philipp Stanner,
	linux-media, linaro-mm-sig


On 31/03/2026 08:49, Boris Brezillon wrote:
> On Mon, 30 Mar 2026 14:36:23 +0100
> Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:
> 
>> Move the signalling tracepoint to before fence->ops are reset otherwise
>> tracepoint will dereference a null pointer.
> 
> I suspect other trace points are impacted too
> (trace_dma_fence_destroy() is, at the very least).

Indeed. I wonder why that did not trigger for me, while the one I fix 
here was an insta-crash...

To fix trace_dma_fence_destroy I think we need a new tracepoint 
definition ie. move it away from the existing event class - make it just 
log the context and seqno.

Anyone has a better idea?

Regards,

Tvrtko

>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Philipp Stanner <phasta@kernel.org>
>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>> Cc: linux-media@vger.kernel.org
>> Cc: linaro-mm-sig@lists.linaro.org
>> ---
>>   drivers/dma-buf/dma-fence.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 1826ba73094c..1c1eaecaf1b0 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>   				      &fence->flags)))
>>   		return;
>>   
>> +	trace_dma_fence_signaled(fence);
>> +
>>   	/*
>>   	 * When neither a release nor a wait operation is specified set the ops
>>   	 * pointer to NULL to allow the fence structure to become independent
>> @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>   
>>   	fence->timestamp = timestamp;
>>   	set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
>> -	trace_dma_fence_signaled(fence);
>>   
>>   	list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>>   		INIT_LIST_HEAD(&cur->node);
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dma-fence: Move signalling tracepoint to before ops detach
  2026-04-09 13:58   ` Tvrtko Ursulin
@ 2026-04-10  8:58     ` Christian König
  2026-04-10 15:37       ` Tvrtko Ursulin
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2026-04-10  8:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, Boris Brezillon
  Cc: dri-devel, kernel-dev, Philipp Stanner, linux-media,
	linaro-mm-sig

On 4/9/26 15:58, Tvrtko Ursulin wrote:
> 
> On 31/03/2026 08:49, Boris Brezillon wrote:
>> On Mon, 30 Mar 2026 14:36:23 +0100
>> Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:
>>
>>> Move the signalling tracepoint to before fence->ops are reset otherwise
>>> tracepoint will dereference a null pointer.
>>
>> I suspect other trace points are impacted too
>> (trace_dma_fence_destroy() is, at the very least).
> 
> Indeed. I wonder why that did not trigger for me, while the one I fix here was an insta-crash...

You need to actually enable the trace points and at least for the destroy one nobody is usually interested in that.

> 
> To fix trace_dma_fence_destroy I think we need a new tracepoint definition ie. move it away from the existing event class - make it just log the context and seqno.
> 
> Anyone has a better idea?

The idea of tracing without accessing fence->ops sounds valid to me.

Alternatively we could call dma_fence_timeline_name() and dma_fence_driver_name() from the tracepoint as well, but that means the tracepoints now require a RCU read side lock.

Regards,
Christian.

> 
> Regards,
> 
> Tvrtko
> 
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Cc: Philipp Stanner <phasta@kernel.org>
>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>> Cc: linux-media@vger.kernel.org
>>> Cc: linaro-mm-sig@lists.linaro.org
>>> ---
>>>   drivers/dma-buf/dma-fence.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>> index 1826ba73094c..1c1eaecaf1b0 100644
>>> --- a/drivers/dma-buf/dma-fence.c
>>> +++ b/drivers/dma-buf/dma-fence.c
>>> @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>                         &fence->flags)))
>>>           return;
>>>   +    trace_dma_fence_signaled(fence);
>>> +
>>>       /*
>>>        * When neither a release nor a wait operation is specified set the ops
>>>        * pointer to NULL to allow the fence structure to become independent
>>> @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>         fence->timestamp = timestamp;
>>>       set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
>>> -    trace_dma_fence_signaled(fence);
>>>         list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>>>           INIT_LIST_HEAD(&cur->node);
>>
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dma-fence: Move signalling tracepoint to before ops detach
  2026-04-10  8:58     ` Christian König
@ 2026-04-10 15:37       ` Tvrtko Ursulin
  2026-04-13  6:52         ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: Tvrtko Ursulin @ 2026-04-10 15:37 UTC (permalink / raw)
  To: Christian König, Boris Brezillon
  Cc: dri-devel, kernel-dev, Philipp Stanner, linux-media,
	linaro-mm-sig


On 10/04/2026 09:58, Christian König wrote:
> On 4/9/26 15:58, Tvrtko Ursulin wrote:
>>
>> On 31/03/2026 08:49, Boris Brezillon wrote:
>>> On Mon, 30 Mar 2026 14:36:23 +0100
>>> Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:
>>>
>>>> Move the signalling tracepoint to before fence->ops are reset otherwise
>>>> tracepoint will dereference a null pointer.
>>>
>>> I suspect other trace points are impacted too
>>> (trace_dma_fence_destroy() is, at the very least).
>>
>> Indeed. I wonder why that did not trigger for me, while the one I fix here was an insta-crash...
> 
> You need to actually enable the trace points and at least for the destroy one nobody is usually interested in that.

Right, but I was pretty sure I was enabling perf record -e 'dma_fence:*' 
when I hit this. Anyway, it doesn't matter, I could be misremembering.

>>
>> To fix trace_dma_fence_destroy I think we need a new tracepoint definition ie. move it away from the existing event class - make it just log the context and seqno.
>>
>> Anyone has a better idea?
> 
> The idea of tracing without accessing fence->ops sounds valid to me.
> 
> Alternatively we could call dma_fence_timeline_name() and dma_fence_driver_name() from the tracepoint as well, but that means the tracepoints now require a RCU read side lock.

We could possibly use the helpers. I am not sure if RCU annotation would 
have to be casted away to keep sparse happy, but more importantly, I 
think it would not be safe.

   thread A					thread B

   dma_fence_signal_timestamp_locked		dma_fence_timeline_name
     ..						ops = rcu_dereference(fence->ops);
						if (!dma_fence_test_signaled_flag(fence))
     test_and_set_bit
     ..
     RCU_INIT_POINTER(fence->ops, NULL);
     						return (const char __rcu *)ops->get_driver_name(fence); // OOPS!

Apologies for long line length, it did not fit otherwise.

Looks like we missed this. Or it is me who is missing something?
Regards,

Tvrtko

>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Philipp Stanner <phasta@kernel.org>
>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>>> Cc: linux-media@vger.kernel.org
>>>> Cc: linaro-mm-sig@lists.linaro.org
>>>> ---
>>>>    drivers/dma-buf/dma-fence.c | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>>> index 1826ba73094c..1c1eaecaf1b0 100644
>>>> --- a/drivers/dma-buf/dma-fence.c
>>>> +++ b/drivers/dma-buf/dma-fence.c
>>>> @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>>                          &fence->flags)))
>>>>            return;
>>>>    +    trace_dma_fence_signaled(fence);
>>>> +
>>>>        /*
>>>>         * When neither a release nor a wait operation is specified set the ops
>>>>         * pointer to NULL to allow the fence structure to become independent
>>>> @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>>          fence->timestamp = timestamp;
>>>>        set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
>>>> -    trace_dma_fence_signaled(fence);
>>>>          list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>>>>            INIT_LIST_HEAD(&cur->node);
>>>
>>
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dma-fence: Move signalling tracepoint to before ops detach
  2026-04-10 15:37       ` Tvrtko Ursulin
@ 2026-04-13  6:52         ` Christian König
  2026-04-13  8:06           ` Tvrtko Ursulin
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2026-04-13  6:52 UTC (permalink / raw)
  To: Tvrtko Ursulin, Boris Brezillon
  Cc: dri-devel, kernel-dev, Philipp Stanner, linux-media,
	linaro-mm-sig

On 4/10/26 17:37, Tvrtko Ursulin wrote:
> 
> On 10/04/2026 09:58, Christian König wrote:
>> On 4/9/26 15:58, Tvrtko Ursulin wrote:
>>>
>>> On 31/03/2026 08:49, Boris Brezillon wrote:
>>>> On Mon, 30 Mar 2026 14:36:23 +0100
>>>> Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:
>>>>
>>>>> Move the signalling tracepoint to before fence->ops are reset otherwise
>>>>> tracepoint will dereference a null pointer.
>>>>
>>>> I suspect other trace points are impacted too
>>>> (trace_dma_fence_destroy() is, at the very least).
>>>
>>> Indeed. I wonder why that did not trigger for me, while the one I fix here was an insta-crash...
>>
>> You need to actually enable the trace points and at least for the destroy one nobody is usually interested in that.
> 
> Right, but I was pretty sure I was enabling perf record -e 'dma_fence:*' when I hit this. Anyway, it doesn't matter, I could be misremembering.
> 
>>>
>>> To fix trace_dma_fence_destroy I think we need a new tracepoint definition ie. move it away from the existing event class - make it just log the context and seqno.
>>>
>>> Anyone has a better idea?
>>
>> The idea of tracing without accessing fence->ops sounds valid to me.
>>
>> Alternatively we could call dma_fence_timeline_name() and dma_fence_driver_name() from the tracepoint as well, but that means the tracepoints now require a RCU read side lock.
> 
> We could possibly use the helpers. I am not sure if RCU annotation would have to be casted away to keep sparse happy, but more importantly, I think it would not be safe.
> 
>   thread A                    thread B
> 
>   dma_fence_signal_timestamp_locked        dma_fence_timeline_name
>     ..                        ops = rcu_dereference(fence->ops);
>                         if (!dma_fence_test_signaled_flag(fence))
>     test_and_set_bit
>     ..
>     RCU_INIT_POINTER(fence->ops, NULL);
>                             return (const char __rcu *)ops->get_driver_name(fence); // OOPS!
> 
> Apologies for long line length, it did not fit otherwise.
> 
> Looks like we missed this. Or it is me who is missing something?

See function dma_fence_driver_name() and dma_fence_timeline_name():

ops = rcu_dereference(fence->ops);
if (!dma_fence_test_signaled_flag(fence))
	return (const char __rcu *)ops->get_driver_name(fence);

We first grab the ops pointer and then check if the fence is signaled or not. Since we first set the signaled flag and then NULL the ops pointer in the other thread we should be save here.

Could only be that test_bit() is not a memory barrier, but set_bit() is so that would be a bit surprising.

Alternatively I would be fine to switching testing ops for NULL instead of calling dma_fence_test_signaled_flag().

Regards,
Christian.

> Regards,
> 
> Tvrtko
> 
>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>>> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>> Cc: Philipp Stanner <phasta@kernel.org>
>>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>>>> Cc: linux-media@vger.kernel.org
>>>>> Cc: linaro-mm-sig@lists.linaro.org
>>>>> ---
>>>>>    drivers/dma-buf/dma-fence.c | 3 ++-
>>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>>>> index 1826ba73094c..1c1eaecaf1b0 100644
>>>>> --- a/drivers/dma-buf/dma-fence.c
>>>>> +++ b/drivers/dma-buf/dma-fence.c
>>>>> @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>>>                          &fence->flags)))
>>>>>            return;
>>>>>    +    trace_dma_fence_signaled(fence);
>>>>> +
>>>>>        /*
>>>>>         * When neither a release nor a wait operation is specified set the ops
>>>>>         * pointer to NULL to allow the fence structure to become independent
>>>>> @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>>>          fence->timestamp = timestamp;
>>>>>        set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
>>>>> -    trace_dma_fence_signaled(fence);
>>>>>          list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>>>>>            INIT_LIST_HEAD(&cur->node);
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] dma-fence: Move signalling tracepoint to before ops detach
  2026-04-13  6:52         ` Christian König
@ 2026-04-13  8:06           ` Tvrtko Ursulin
  0 siblings, 0 replies; 9+ messages in thread
From: Tvrtko Ursulin @ 2026-04-13  8:06 UTC (permalink / raw)
  To: Christian König, Boris Brezillon
  Cc: dri-devel, kernel-dev, Philipp Stanner, linux-media,
	linaro-mm-sig


On 13/04/2026 07:52, Christian König wrote:
> On 4/10/26 17:37, Tvrtko Ursulin wrote:
>>
>> On 10/04/2026 09:58, Christian König wrote:
>>> On 4/9/26 15:58, Tvrtko Ursulin wrote:
>>>>
>>>> On 31/03/2026 08:49, Boris Brezillon wrote:
>>>>> On Mon, 30 Mar 2026 14:36:23 +0100
>>>>> Tvrtko Ursulin <tvrtko.ursulin@igalia.com> wrote:
>>>>>
>>>>>> Move the signalling tracepoint to before fence->ops are reset otherwise
>>>>>> tracepoint will dereference a null pointer.
>>>>>
>>>>> I suspect other trace points are impacted too
>>>>> (trace_dma_fence_destroy() is, at the very least).
>>>>
>>>> Indeed. I wonder why that did not trigger for me, while the one I fix here was an insta-crash...
>>>
>>> You need to actually enable the trace points and at least for the destroy one nobody is usually interested in that.
>>
>> Right, but I was pretty sure I was enabling perf record -e 'dma_fence:*' when I hit this. Anyway, it doesn't matter, I could be misremembering.
>>
>>>>
>>>> To fix trace_dma_fence_destroy I think we need a new tracepoint definition ie. move it away from the existing event class - make it just log the context and seqno.
>>>>
>>>> Anyone has a better idea?
>>>
>>> The idea of tracing without accessing fence->ops sounds valid to me.
>>>
>>> Alternatively we could call dma_fence_timeline_name() and dma_fence_driver_name() from the tracepoint as well, but that means the tracepoints now require a RCU read side lock.
>>
>> We could possibly use the helpers. I am not sure if RCU annotation would have to be casted away to keep sparse happy, but more importantly, I think it would not be safe.
>>
>>    thread A                    thread B
>>
>>    dma_fence_signal_timestamp_locked        dma_fence_timeline_name
>>      ..                        ops = rcu_dereference(fence->ops);
>>                          if (!dma_fence_test_signaled_flag(fence))
>>      test_and_set_bit
>>      ..
>>      RCU_INIT_POINTER(fence->ops, NULL);
>>                              return (const char __rcu *)ops->get_driver_name(fence); // OOPS!
>>
>> Apologies for long line length, it did not fit otherwise.
>>
>> Looks like we missed this. Or it is me who is missing something?
> 
> See function dma_fence_driver_name() and dma_fence_timeline_name():
> 
> ops = rcu_dereference(fence->ops);
> if (!dma_fence_test_signaled_flag(fence))
> 	return (const char __rcu *)ops->get_driver_name(fence);
> 
> We first grab the ops pointer and then check if the fence is signaled or not. Since we first set the signaled flag and then NULL the ops pointer in the other thread we should be save here.
> 
> Could only be that test_bit() is not a memory barrier, but set_bit() is so that would be a bit surprising.

You are right, I got confused jumping back and forth between patches 
last week.

So changing to helpers on top of the change from this patch should be 
enough.

I only need to check if the __rcu needs to be casted away in the 
tracepoints assignments in case it can upset sparse. I will send a v2 
when I do that.

Regards,

Tvrtko
> 
> Alternatively I would be fine to switching testing ops for NULL instead of calling dma_fence_test_signaled_flag().
> 
> Regards,
> Christian.
> 
>> Regards,
>>
>> Tvrtko
>>
>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>>>> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
>>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>>> Cc: Philipp Stanner <phasta@kernel.org>
>>>>>> Cc: Boris Brezillon <boris.brezillon@collabora.com>
>>>>>> Cc: linux-media@vger.kernel.org
>>>>>> Cc: linaro-mm-sig@lists.linaro.org
>>>>>> ---
>>>>>>     drivers/dma-buf/dma-fence.c | 3 ++-
>>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>>>>> index 1826ba73094c..1c1eaecaf1b0 100644
>>>>>> --- a/drivers/dma-buf/dma-fence.c
>>>>>> +++ b/drivers/dma-buf/dma-fence.c
>>>>>> @@ -363,6 +363,8 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>>>>                           &fence->flags)))
>>>>>>             return;
>>>>>>     +    trace_dma_fence_signaled(fence);
>>>>>> +
>>>>>>         /*
>>>>>>          * When neither a release nor a wait operation is specified set the ops
>>>>>>          * pointer to NULL to allow the fence structure to become independent
>>>>>> @@ -377,7 +379,6 @@ void dma_fence_signal_timestamp_locked(struct dma_fence *fence,
>>>>>>           fence->timestamp = timestamp;
>>>>>>         set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
>>>>>> -    trace_dma_fence_signaled(fence);
>>>>>>           list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>>>>>>             INIT_LIST_HEAD(&cur->node);
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-04-13  8:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-30 13:36 [PATCH] dma-fence: Move signalling tracepoint to before ops detach Tvrtko Ursulin
2026-03-30 14:24 ` Philipp Stanner
2026-03-31  7:49 ` Boris Brezillon
2026-04-09 13:58   ` Tvrtko Ursulin
2026-04-10  8:58     ` Christian König
2026-04-10 15:37       ` Tvrtko Ursulin
2026-04-13  6:52         ` Christian König
2026-04-13  8:06           ` Tvrtko Ursulin
2026-03-31  9:19 ` Christian König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox