rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Philipp Stanner <phasta@mailbox.org>
To: Daniel Almeida <daniel.almeida@collabora.com>,
	Philipp Stanner <phasta@kernel.org>
Cc: "Alice Ryhl" <aliceryhl@google.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Tvrtko Ursulin" <tursulin@ursulin.net>,
	"Alexandre Courbot" <acourbot@nvidia.com>,
	"Boris Brezillon" <boris.brezillon@collabora.com>,
	"Dave Airlie" <airlied@redhat.com>,
	"Lyude Paul" <lyude@redhat.com>,
	"Peter Colberg" <pcolberg@redhat.com>,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
	rust-for-linux@vger.kernel.org
Subject: Re: [RFC WIP 2/3] rust: sync: Add dma_fence abstractions
Date: Tue, 25 Nov 2025 10:48:12 +0100	[thread overview]
Message-ID: <bc4f01ec5172d29abd64429e3017cc53c0522e01.camel@mailbox.org> (raw)
In-Reply-To: <E55D72FC-AEF6-4D2D-973F-123306E4EB4C@collabora.com>

On Mon, 2025-11-24 at 09:49 -0300, Daniel Almeida wrote:
> Hi Phillipp,
> 
> > 

[…]

> > +use kernel::sync::{Arc, ArcBorrow};
> > +use kernel::c_str;
> > +
> > +/// Defines the callback function the dma-fence backend will call once the fence gets signalled.
> > +pub trait DmaFenceCbFunc {
> > +    /// The callback function. `cb` is a container of the data which the driver passed in
> > +    /// [`DmaFence::register_callback`].
> > +    fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>)
> > +    where
> > +        Self: Sized;
> > +}
> > +
> > +/// Container for driver data which the driver gets back in its callback once the fence gets
> > +/// signalled.
> > +#[pin_data]
> > +pub struct DmaFenceCb<T: DmaFenceCbFunc> {
> > +    /// C struct needed for the backend.
> > +    #[pin]
> > +    inner: Opaque<bindings::dma_fence_cb>,
> > +    /// Driver data.
> > +    #[pin]
> > +    pub data: T,
> 
> We should probably deref to this type.

As a transparent container you mean. I can put it on the todo-list.

> 
> > +}
> > +
> > +impl<T: DmaFenceCbFunc + 'static> DmaFenceCb<T> {
> > 

[…]

> > +impl ArcBorrow<'_, DmaFenceCtx> {
> > +    /// Create a new fence, consuming `data`.
> > +    ///
> > +    /// The fence will increment the refcount of the fence context associated with this
> > +    /// [`DmaFenceCtx`].
> > +    pub fn new_fence<T>(
> > +        &mut self,
> > +        data: impl PinInit<T>,
> > +    ) -> Result<ARef<DmaFence<T>>> {
> > +        let fctx: Arc<DmaFenceCtx> = (*self).into();
> > +        let seqno: u64 = fctx.get_new_fence_seqno();
> > +
> > +        // TODO: Should we reset seqno in case of failure?
> 
> I think we should go back to the old value, yeah.

It would be trivial to implement that (just atomic.decrement()).

The thing why the TODO even exists is that I'm a bit unsure about
races. It seems we have to choose between either a gap in the seqnos or
the possiblity of seqnos being out of order.

If the user / driver creates fences with >1 thread on a fence context,
I mean.

We're pretty free in our choices, however. The shared fence-fctx
spinlock will be removed anyways, so one could later easily replace the
fctx atomic with a lock if that's desirable.

I can implement a seqno-decrement for now.

> 
> > +        // Pass `fctx` by value so that the fence will hold a reference to the DmaFenceCtx as long
> > +        // as it lives.
> > +        DmaFence::new(fctx, data, &self.lock, self.nr, seqno)
> > +    }
> > +}
> > +
> > 

[…]

> > +
> > +impl<T> DmaFence<T> {
> > +    // TODO: There could be a subtle potential problem here? The LLVM compiler backend can create
> > +    // several versions of this constant. Their content would be identical, but their addresses
> > +    // different.
> > +    const OPS: bindings::dma_fence_ops = Self::ops_create();
> > +
> > +    /// Create an initializer for a new [`DmaFence`].
> > +    fn new(
> > +        fctx: Arc<DmaFenceCtx>,
> > +        data: impl PinInit<T>, // TODO: The driver data should implement PinInit<T, Error>
> > +        lock: &Opaque<bindings::spinlock_t>,
> 
> I wonder if we should take a Rust lock directly?

Yes, good idea; Boqun has suggested that in the first dma_fence RFC,
too. It will be as simple as SpinLock<()>.

Will do in dma_fence RFC v2 and rebase the Jobqueue on it.

> 
> > +        context: u64,
> > +        seqno: u64,
> > +    ) -> Result<ARef<Self>> {
> > +        let fence = pin_init!(Self {
> > +            inner <- Opaque::ffi_init(|slot: *mut bindings::dma_fence| {
> > +                let lock_ptr = &raw const (*lock);
> > +                // SAFETY: `slot` is a valid pointer to an uninitialized `struct dma_fence`.
> > +                // `lock_ptr` is a pointer to the spinlock of the fence context, which is shared
> 
> Then we should perhaps extract that lock from the fence context itself, instead
> of passing it as an argument? This relationship is not enforced in the current
> code.

See the series linked in the cover letter. Soon fences will all have
their own lock, and the fctx will either just be the atomic seqno or
have a separate lock.

> 
> > +                // among all the fences. This can't become a UAF because each fence takes a
> > +                // reference of the fence context.
> > +                unsafe { bindings::dma_fence_init(slot, &Self::OPS, Opaque::cast_into(lock_ptr), context, seqno) };
> > +            }),
> > +            data <- data,
> > +            signalling: false,
> > +            signalling_cookie: false,
> > +            fctx: fctx,
> > +        });
> > +
> > +        let b = KBox::pin_init(fence, GFP_KERNEL)?;
> > +
> > +        // SAFETY: We don't move the contents of `b` anywhere here. After unwrapping it, ARef will
> > +        // take care of preventing memory moves.
> > +        let rawptr = KBox::into_raw(unsafe { Pin::into_inner_unchecked(b) });
> > +
> > +        // SAFETY: `rawptr` was created validly above.
> > +        let aref = unsafe { ARef::from_raw(NonNull::new_unchecked(rawptr)) };
> > +
> > +        Ok(aref)
> > +    }
> > +
> > +    /// Mark the beginning of a DmaFence signalling critical section. Should be called once a fence
> > +    /// gets published.
> > +    ///
> > +    /// The signalling critical section is marked as finished automatically once the fence signals.
> > +    pub fn begin_signalling(&mut self) {
> > +        // FIXME: this needs to be mutable, obviously, but we can't borrow mutably. *sigh*
> 
> Is AtomicBool going away? Otherwise can you expand?

The AtomicBool is just used in the example demo code.

The issue here is that begin_signalling() should set a "this fence is
currently in the signalling section"-flag. So the fence needs to be
mutable. Then, however, Rust complains because self.signalling is not
protected by any lock.

So one needs some sort of synchronization. Stuffing a DmaFence into a
SpinLock would be overkill, however, considering that the C code
already takes care of properly taking all locks.

I've asked about that problem on Zulip once:
https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/.E2.9C.94.20ARef.20without.20locking/near/539747635

Haven't looked deeper into solving it since, because those lockdep
guards are kind of nice-to-have at the moment.

I think the solution will be to make self.signalling an AtomicBool (I
think you meant that above?)

> 
> > +        self.signalling = true;
> > +        // TODO: Should we warn if a user tries to do this several times for the same
> > +        // fence? And should we ignore the request if the fence is already signalled?
> > +
> > +        // SAFETY: `dma_fence_begin_signalling()` works on global lockdep data and calling it is
> > +        // always safe.
> > +        self.signalling_cookie = unsafe { bindings::dma_fence_begin_signalling() };
> > +    }
> > +
> > 
> > +
> > +    /// Signal the fence. This will invoke all registered callbacks.
> > +    pub fn signal(&self) -> Result {
> > +        // SAFETY: `self` is refcounted.
> > +        let ret = unsafe { bindings::dma_fence_signal(self.as_raw()) };
> > +        if ret != 0 {
> > +            return Err(Error::from_errno(ret));
> > +        }
> 
> Do you think it’s worth it to have a separate error type for when fences
> are already signaled? I am asking, but I am not convinced either, FYI.

My tendency so far was to mostly ignore it. All users in C don't care
whether the fence already was signaled. dma_fence just returns -EINVAL
in that case (not a good choice anyways IMO).

AFAIR I was close to not having Rust's signal() return an error at all,
because it's kind of useless?

But, correct me if I'm wrong, I think the policy with Rust abstractions
is to map the C API as closely as possible?

Anyways, I'd expect Rust drivers to ignore that error, too.

> 
> > +
> > +        if self.signalling {
> > +            // SAFETY: `dma_fence_end_signalling()` works on global lockdep data. The only
> > +            // parameter is a boolean passed by value.
> > +            unsafe { bindings::dma_fence_end_signalling(self.signalling_cookie) };
> > +        }
> > +
> > +        Ok(())
> > +    }
> > +
> > +    /// Register a callback on a [`DmaFence`]. The callback will be invoked in the fence's
> > +    /// signalling path, i.e., critical section.
> > +    ///
> > +    /// Consumes `data`. `data` is passed back to the implemented callback function when the fence
> > +    /// gets signalled.
> > +    pub fn register_callback<U: DmaFenceCbFunc + 'static>(&self, data: impl PinInit<U>) -> Result {
> > +        let cb = DmaFenceCb::new(data)?;
> > +        let ptr = cb.into_foreign() as *mut DmaFenceCb<U>;
> > +        // SAFETY: `ptr` was created validly directly above.
> > +        let inner_cb = unsafe { (*ptr).inner.get() };
> > +
> > +        // SAFETY: `self.as_raw()` is valid because `self` is refcounted, `inner_cb` was created
> > +        // validly above and was turned into a ForeignOwnable, so it won't be dropped. `callback`
> > +        // has static life time.
> > +        let ret = unsafe {
> > +            bindings::dma_fence_add_callback(
> > +                self.as_raw(),
> > +                inner_cb,
> > +                Some(DmaFenceCb::<U>::callback),
> > +            )
> > +        };
> > +        if ret != 0 {
> > +            return Err(Error::from_errno(ret));
> 
> Related to the question above: not being able to place a callback seems to be
> common in the DRM scheduler (i.e.: the fence has signaled already). Hence my
> question about a separate error type, as we would have to single out
> Err(EINVAL) often otherwise.

Interesting, dma_fence_add_callback() actually returns a different
error code, ENOENT, if the fence was signalled already, whereas
dma_fence_signal() gives EINVAL.

That's kind of not optimal. I'll consider fixing that on the C side.

Regardless, what's relevant for us here is that not being able to
register a callback is a serious failure that needs to be checked,
whereas trying to signal an already signaled fence is valid behavior
and no big deal.

If you try to register a callback on an already signaled fence, that
effectively means that you as the registering party need to take care
of the callback's code being executed, since dma_fence won't do that
anymore. The jobqueue design is catching that problem through the API
design; you were asking about the cb-registering API in the other mail,
I'll answer there.



Thx for the review!

P.

  reply	other threads:[~2025-11-25  9:48 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-18 13:25 [RFC WIP 0/3] DRM Jobqueue Philipp Stanner
2025-11-18 13:25 ` [RFC WIP 1/3] rust: list: Add unsafe for container_of Philipp Stanner
2025-11-18 13:25 ` [RFC WIP 2/3] rust: sync: Add dma_fence abstractions Philipp Stanner
2025-11-21 23:03   ` Lyude Paul
2025-11-24  9:31     ` Alice Ryhl
2025-11-27 13:45     ` Philipp Stanner
2025-11-24 12:49   ` Daniel Almeida
2025-11-25  9:48     ` Philipp Stanner [this message]
2025-11-25 10:58       ` Boris Brezillon
2025-11-25 12:20         ` Philipp Stanner
2025-11-25 13:50           ` Boris Brezillon
2025-11-26 13:42             ` Philipp Stanner
2025-11-26 14:55               ` Boris Brezillon
2025-11-27 13:48       ` Daniel Almeida
2025-11-28 11:08     ` Philipp Stanner
2025-11-28 12:21       ` Daniel Almeida
2025-11-18 13:25 ` [RFC WIP 3/3] rust/drm: Add initial jobqueue sceleton Philipp Stanner
2025-11-24 13:58   ` Daniel Almeida
2025-11-25 13:20     ` Philipp Stanner
2025-11-27 14:13       ` Daniel Almeida
2025-11-28 10:07         ` Philipp Stanner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bc4f01ec5172d29abd64429e3017cc53c0522e01.camel@mailbox.org \
    --to=phasta@mailbox.org \
    --cc=acourbot@nvidia.com \
    --cc=airlied@redhat.com \
    --cc=aliceryhl@google.com \
    --cc=boris.brezillon@collabora.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=dakr@kernel.org \
    --cc=daniel.almeida@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lyude@redhat.com \
    --cc=pcolberg@redhat.com \
    --cc=phasta@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=tursulin@ursulin.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).