From: Boris Brezillon <boris.brezillon@collabora.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: phasta@kernel.org, dakr@kernel.org,
	Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
	dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: dma_fence: force users to take the lock manually
Date: Fri, 6 Mar 2026 10:46:46 +0100
Message-ID: <20260306104646.36319162@fedora>
In-Reply-To: <e8b47e9f-f8cd-4be4-953a-931816e5f429@amd.com>

On Fri, 6 Mar 2026 09:10:52 +0100
Christian König <christian.koenig@amd.com> wrote:

> On 3/5/26 16:12, Boris Brezillon wrote:
> > Hi,
> > 
> > On Thu, 5 Mar 2026 14:59:02 +0100
> > Christian König <christian.koenig@amd.com> wrote:
> >   
> >> On 3/5/26 14:54, Philipp Stanner wrote:  
> >>> Yo Christian,
> >>>
> >>> a while ago we were discussing this problem
> >>>
> >>> dma_fence_set_error(f, -ECANCELED);  
> > 
> > If you really have two concurrent threads setting the error, this part
> > is racy, though I can't think of any situation where concurrent
> > signaling of a set of fences wouldn't be protected by another external
> > lock.  
> 
> This is actually massively problematic and the reason why we have the WARN_ON in dma_fence_set_error().
> 
> What drivers usually do is disable the normal signaling path, e.g. by turning off interrupts, and then set an error and signal the fence manually.
> 
> The problem is that this has a *huge* potential for being racy. For example, when you tell the HW not to give you an interrupt any more, it can always be that interrupt processing has already started but wasn't yet able to grab a lock or similar.
> 
> I think we should start enforcing correct handling and have a lockdep check in dma_fence_set_error() that the dma_fence lock is held while calling it.

Sure, I don't mind you dropping the non-locked variants and forcing
users to lock around set_error() + signal().
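
To make the "lock around set_error() + signal()" pattern concrete, here is a
minimal userspace sketch in Rust. It is only a model: ModelFence, FenceState
and their methods are hypothetical names invented for illustration, a
std::sync::Mutex stands in for the dma_fence lock, and nothing here is the
kernel API. The point it shows is that once the error and the signaled flag
are written in a single critical section, a late completion path (e.g. an
interrupt that had already started) cannot interleave with the cancel path.

```rust
use std::sync::Mutex;

/// Linux errno value for ECANCELED.
const ECANCELED: i32 = 125;

/// State that the (modelled) fence lock protects.
#[derive(Default)]
struct FenceState {
    error: Option<i32>,
    signaled: bool,
}

/// Hypothetical userspace model of a fence, NOT the kernel's struct dma_fence.
#[derive(Default)]
struct ModelFence {
    state: Mutex<FenceState>,
}

impl ModelFence {
    /// Error path: set the error and signal inside one critical section.
    /// Returns false (and changes nothing) if the fence was already signaled.
    fn set_error_and_signal(&self, error: i32) -> bool {
        let mut st = self.state.lock().unwrap();
        if st.signaled {
            return false;
        }
        st.error = Some(error);
        st.signaled = true;
        true
    }

    /// Normal completion path, e.g. what an interrupt handler would call.
    /// Returns false if some other path signaled the fence first.
    fn complete(&self) -> bool {
        let mut st = self.state.lock().unwrap();
        if st.signaled {
            return false;
        }
        st.signaled = true;
        true
    }

    /// Snapshot of (signaled, error), taken under the lock.
    fn status(&self) -> (bool, Option<i32>) {
        let st = self.state.lock().unwrap();
        (st.signaled, st.error)
    }
}

fn main() {
    let f = ModelFence::default();
    // The cancel path gets the lock first; the late completion is rejected
    // and observers never see a signaled fence without its error.
    assert!(f.set_error_and_signal(-ECANCELED));
    assert!(!f.complete());
    assert_eq!(f.status(), (true, Some(-ECANCELED)));
}
```

Whichever path takes the lock first decides the outcome, and the loser learns
that from the return value instead of overwriting state.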

> 
> >>> dma_fence_signal(f); // racy!  
> > 
> > This is not racy because dma_fence_signal() takes/releases the
> > lock internally. Besides, calling dma_fence_signal() on an already
> > signaled fence is considered an invalid pattern if I trust the -EINVAL
> > returned here[1].  
> 
> No, that is also something we want to remove. IIRC Philipp already proposed some patches to clean that up.

What do you mean? Do you want dma_fence_signal_locked() (or its
variants) to not return an error when the fence is already signaled, or
do you want to prevent this double-signal from happening? The plan for
the Rust abstraction is to do the latter.

> 
> >>>
> >>>
> >>> I think you mentioned that you are considering redesigning the
> >>> dma_fence API so that users have to take the lock themselves to touch
> >>> the fence:
> >>>
> >>> dma_fence_lock(f);
> >>> dma_fence_set_error(f, -ECANCELED);
> >>> dma_fence_signal(f);  
> > 
> > I guess you mean dma_fence_signal_locked().
> >   
> >>> dma_fence_unlock(f);
> >>>
> >>>
> >>> Is that still up to date? Is there work in progress about that?    
> >>
> >> It's on my "maybe if I ever have time for that" list, but yeah I think it would be really nice to have and a great cleanup.
> >>
> >> We have a bunch of different functions which provide both a _locked() and _unlocked() variant just because callers were too lazy to lock the fence.
> >>
> >> Especially the dma_fence_signal function is overloaded 4 (!) times with locked/unlocked and with and without timestamp functions.
> >>  
> >>> I discovered that I might need / want that for the Rust abstractions.    
> >>
> >> Well my educated guess is for Rust you only want the locked function and never allow callers to be lazy.  
> > 
> > I don't think we have an immediate need for manual locking in Rust
> > drivers (no signaling done under an already dma_fence-locked section
> > that I can think of), especially after the inline_lock you've
> > introduced. Now, I don't think it matters if only the _locked() variant
> > is exposed and the Rust code is expected to acquire/release the lock
> > manually, all I'm saying is that we probably don't need that in drivers
> > (might be different if we start implementing fence containers like
> > arrays and chains in Rust, but I don't think we have an immediate need
> > for that).  
> 
> Well as I wrote above you either have super reliable locking in your signaling path or you will need that for error handling.

Not really. With Rust's ownership model, you can make it so only one
thread gets to own the DriverFence (the signal-able fence object), and
the DriverFence::signal() method consumes this object. This implies
that only one path gets to signal the DriverFence, and after that it
vanishes, so no one else can signal it anymore. Just to clarify, by
vanishes, I mean that the signal-able view disappears, but the
observable object (Fence) can stay around, so it can be monitored (and
only monitored) by others. With this model, it doesn't matter whether
_set_error() is called under a dma_fence locked section or not, because
the concurrency is addressed at a higher level.
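
As a rough illustration of that ownership argument, here is a self-contained
userspace sketch. The names DriverFence and Fence are taken from the
description above, but everything else (the Inner struct, the atomics, the
constructor, signal_with_error()) is an assumption made up for this example
and is not the actual kernel Rust abstraction. It shows a signal-able handle
that is not Clone and whose signaling methods take `self` by value, next to a
cloneable, read-only observer.

```rust
use std::sync::atomic::{AtomicBool, AtomicI32, Ordering};
use std::sync::Arc;

/// Shared state behind both views (hypothetical layout).
struct Inner {
    signaled: AtomicBool,
    error: AtomicI32, // 0 means "no error"
}

/// Observable view: can be cloned and polled, but has no way to signal.
#[derive(Clone)]
struct Fence(Arc<Inner>);

impl Fence {
    fn is_signaled(&self) -> bool {
        self.0.signaled.load(Ordering::Acquire)
    }

    /// Error attached to the fence, visible only once it is signaled.
    fn error(&self) -> Option<i32> {
        if !self.is_signaled() {
            return None;
        }
        match self.0.error.load(Ordering::Relaxed) {
            0 => None,
            e => Some(e),
        }
    }
}

/// Signal-able view: deliberately NOT Clone, so it has exactly one owner.
struct DriverFence(Arc<Inner>);

impl DriverFence {
    /// Create the single signal-able handle plus a first observer.
    fn new() -> (DriverFence, Fence) {
        let inner = Arc::new(Inner {
            signaled: AtomicBool::new(false),
            error: AtomicI32::new(0),
        });
        (DriverFence(Arc::clone(&inner)), Fence(inner))
    }

    /// Takes `self` by value: the handle is gone after the call.
    fn signal(self) {
        self.0.signaled.store(true, Ordering::Release);
    }

    /// Same, but records an error first. No lock is needed here because
    /// no other DriverFence for this fence can exist.
    fn signal_with_error(self, error: i32) {
        self.0.error.store(error, Ordering::Relaxed);
        self.0.signaled.store(true, Ordering::Release);
    }
}

fn main() {
    let (ok_fence, ok_view) = DriverFence::new();
    ok_fence.signal();
    assert!(ok_view.is_signaled());
    assert_eq!(ok_view.error(), None);

    let (bad_fence, bad_view) = DriverFence::new();
    let other_observer = bad_view.clone();
    bad_fence.signal_with_error(-125); // -ECANCELED on Linux
    // bad_fence.signal(); // rejected by the compiler: use of moved value
    assert_eq!(other_observer.error(), Some(-125));
}
```

The double-signal case is rejected at compile time rather than reported as a
runtime error, which is why the error store does not need to sit in a locked
section in this model.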

Again, I'm not saying the changes Christian and you have been
discussing are pointless (they might help the C implementations get
things right), I'm just saying it's not strictly needed for the Rust
abstraction, that's all.

