From: Philipp Stanner <phasta@mailbox.org>
To: Daniel Almeida <daniel.almeida@collabora.com>,
Philipp Stanner <phasta@kernel.org>
Cc: "Alice Ryhl" <aliceryhl@google.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Tvrtko Ursulin" <tursulin@ursulin.net>,
"Alexandre Courbot" <acourbot@nvidia.com>,
"Boris Brezillon" <boris.brezillon@collabora.com>,
"Dave Airlie" <airlied@redhat.com>,
"Lyude Paul" <lyude@redhat.com>,
"Peter Colberg" <pcolberg@redhat.com>,
dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org,
rust-for-linux@vger.kernel.org
Subject: Re: [RFC WIP 2/3] rust: sync: Add dma_fence abstractions
Date: Tue, 25 Nov 2025 10:48:12 +0100 [thread overview]
Message-ID: <bc4f01ec5172d29abd64429e3017cc53c0522e01.camel@mailbox.org> (raw)
In-Reply-To: <E55D72FC-AEF6-4D2D-973F-123306E4EB4C@collabora.com>
On Mon, 2025-11-24 at 09:49 -0300, Daniel Almeida wrote:
> Hi Phillipp,
>
> >
[…]
> > +use kernel::sync::{Arc, ArcBorrow};
> > +use kernel::c_str;
> > +
> > +/// Defines the callback function the dma-fence backend will call once the fence gets signalled.
> > +pub trait DmaFenceCbFunc {
> > + /// The callback function. `cb` is a container of the data which the driver passed in
> > + /// [`DmaFence::register_callback`].
> > + fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>)
> > + where
> > + Self: Sized;
> > +}
> > +
> > +/// Container for driver data which the driver gets back in its callback once the fence gets
> > +/// signalled.
> > +#[pin_data]
> > +pub struct DmaFenceCb<T: DmaFenceCbFunc> {
> > + /// C struct needed for the backend.
> > + #[pin]
> > + inner: Opaque<bindings::dma_fence_cb>,
> > + /// Driver data.
> > + #[pin]
> > + pub data: T,
>
> We should probably deref to this type.
As a transparent container you mean. I can put it on the todo-list.
>
> > +}
> > +
> > +impl<T: DmaFenceCbFunc + 'static> DmaFenceCb<T> {
> >
[…]
> > +impl ArcBorrow<'_, DmaFenceCtx> {
> > + /// Create a new fence, consuming `data`.
> > + ///
> > + /// The fence will increment the refcount of the fence context associated with this
> > + /// [`DmaFenceCtx`].
> > + pub fn new_fence<T>(
> > + &mut self,
> > + data: impl PinInit<T>,
> > + ) -> Result<ARef<DmaFence<T>>> {
> > + let fctx: Arc<DmaFenceCtx> = (*self).into();
> > + let seqno: u64 = fctx.get_new_fence_seqno();
> > +
> > + // TODO: Should we reset seqno in case of failure?
>
> I think we should go back to the old value, yeah.
It would be trivial to implement that (just atomic.decrement()).
The thing why the TODO even exists is that I'm a bit unsure about
races. It seems we have to choose between either a gap in the seqnos or
the possiblity of seqnos being out of order.
If the user / driver creates fences with >1 thread on a fence context,
I mean.
We're pretty free in our choices, however. The shared fence-fctx
spinlock will be removed anyways, so one could later easily replace the
fctx atomic with a lock if that's desirable.
I can implement a seqno-decrement for now.
>
> > + // Pass `fctx` by value so that the fence will hold a reference to the DmaFenceCtx as long
> > + // as it lives.
> > + DmaFence::new(fctx, data, &self.lock, self.nr, seqno)
> > + }
> > +}
> > +
> >
[…]
> > +
> > +impl<T> DmaFence<T> {
> > + // TODO: There could be a subtle potential problem here? The LLVM compiler backend can create
> > + // several versions of this constant. Their content would be identical, but their addresses
> > + // different.
> > + const OPS: bindings::dma_fence_ops = Self::ops_create();
> > +
> > + /// Create an initializer for a new [`DmaFence`].
> > + fn new(
> > + fctx: Arc<DmaFenceCtx>,
> > + data: impl PinInit<T>, // TODO: The driver data should implement PinInit<T, Error>
> > + lock: &Opaque<bindings::spinlock_t>,
>
> I wonder if we should take a Rust lock directly?
Yes, good idea; Boqun has suggested that in the first dma_fence RFC,
too. It will be as simple as SpinLock<()>.
Will do in dma_fence RFC v2 and rebase the Jobqueue on it.
>
> > + context: u64,
> > + seqno: u64,
> > + ) -> Result<ARef<Self>> {
> > + let fence = pin_init!(Self {
> > + inner <- Opaque::ffi_init(|slot: *mut bindings::dma_fence| {
> > + let lock_ptr = &raw const (*lock);
> > + // SAFETY: `slot` is a valid pointer to an uninitialized `struct dma_fence`.
> > + // `lock_ptr` is a pointer to the spinlock of the fence context, which is shared
>
> Then we should perhaps extract that lock from the fence context itself, instead
> of passing it as an argument? This relationship is not enforced in the current
> code.
See the series linked in the cover letter. Soon fences will all have
their own lock, and the fctx will either just be the atomic seqno or
have a separate lock.
>
> > + // among all the fences. This can't become a UAF because each fence takes a
> > + // reference of the fence context.
> > + unsafe { bindings::dma_fence_init(slot, &Self::OPS, Opaque::cast_into(lock_ptr), context, seqno) };
> > + }),
> > + data <- data,
> > + signalling: false,
> > + signalling_cookie: false,
> > + fctx: fctx,
> > + });
> > +
> > + let b = KBox::pin_init(fence, GFP_KERNEL)?;
> > +
> > + // SAFETY: We don't move the contents of `b` anywhere here. After unwrapping it, ARef will
> > + // take care of preventing memory moves.
> > + let rawptr = KBox::into_raw(unsafe { Pin::into_inner_unchecked(b) });
> > +
> > + // SAFETY: `rawptr` was created validly above.
> > + let aref = unsafe { ARef::from_raw(NonNull::new_unchecked(rawptr)) };
> > +
> > + Ok(aref)
> > + }
> > +
> > + /// Mark the beginning of a DmaFence signalling critical section. Should be called once a fence
> > + /// gets published.
> > + ///
> > + /// The signalling critical section is marked as finished automatically once the fence signals.
> > + pub fn begin_signalling(&mut self) {
> > + // FIXME: this needs to be mutable, obviously, but we can't borrow mutably. *sigh*
>
> Is AtomicBool going away? Otherwise can you expand?
The AtomicBool is just used in the example demo code.
The issue here is that begin_signalling() should set a "this fence is
currently in the signalling section"-flag. So the fence needs to be
mutable. Then, however, Rust complains because self.signalling is not
protected by any lock.
So one needs some sort of synchronization. Stuffing a DmaFence into a
SpinLock would be overkill, however, considering that the C code
already takes care of properly taking all locks.
I've asked about that problem on Zulip once:
https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/.E2.9C.94.20ARef.20without.20locking/near/539747635
Haven't looked deeper into solving it since, because those lockdep
guards are kind of nice-to-have at the moment.
I think the solution will be to make self.signalling an AtomicBool (I
think you meant that above?)
>
> > + self.signalling = true;
> > + // TODO: Should we warn if a user tries to do this several times for the same
> > + // fence? And should we ignore the request if the fence is already signalled?
> > +
> > + // SAFETY: `dma_fence_begin_signalling()` works on global lockdep data and calling it is
> > + // always safe.
> > + self.signalling_cookie = unsafe { bindings::dma_fence_begin_signalling() };
> > + }
> > +
> >
> > +
> > + /// Signal the fence. This will invoke all registered callbacks.
> > + pub fn signal(&self) -> Result {
> > + // SAFETY: `self` is refcounted.
> > + let ret = unsafe { bindings::dma_fence_signal(self.as_raw()) };
> > + if ret != 0 {
> > + return Err(Error::from_errno(ret));
> > + }
>
> Do you think it’s worth it to have a separate error type for when fences
> are already signaled? I am asking, but I am not convinced either, FYI.
My tendency so far was to mostly ignore it. All users in C don't care
whether the fence already was signaled. dma_fence just returns -EINVAL
in that case (not a good choice anyways IMO).
AFAIR I was close to not having Rust's signal() return an error at all,
because it's kind of useless?
But, correct me if I'm wrong, I think the policy with Rust abstractions
is to map the C API as closely as possible?
Anyways, I'd expect Rust drivers to ignore that error, too.
>
> > +
> > + if self.signalling {
> > + // SAFETY: `dma_fence_end_signalling()` works on global lockdep data. The only
> > + // parameter is a boolean passed by value.
> > + unsafe { bindings::dma_fence_end_signalling(self.signalling_cookie) };
> > + }
> > +
> > + Ok(())
> > + }
> > +
> > + /// Register a callback on a [`DmaFence`]. The callback will be invoked in the fence's
> > + /// signalling path, i.e., critical section.
> > + ///
> > + /// Consumes `data`. `data` is passed back to the implemented callback function when the fence
> > + /// gets signalled.
> > + pub fn register_callback<U: DmaFenceCbFunc + 'static>(&self, data: impl PinInit<U>) -> Result {
> > + let cb = DmaFenceCb::new(data)?;
> > + let ptr = cb.into_foreign() as *mut DmaFenceCb<U>;
> > + // SAFETY: `ptr` was created validly directly above.
> > + let inner_cb = unsafe { (*ptr).inner.get() };
> > +
> > + // SAFETY: `self.as_raw()` is valid because `self` is refcounted, `inner_cb` was created
> > + // validly above and was turned into a ForeignOwnable, so it won't be dropped. `callback`
> > + // has static life time.
> > + let ret = unsafe {
> > + bindings::dma_fence_add_callback(
> > + self.as_raw(),
> > + inner_cb,
> > + Some(DmaFenceCb::<U>::callback),
> > + )
> > + };
> > + if ret != 0 {
> > + return Err(Error::from_errno(ret));
>
> Related to the question above: not being able to place a callback seems to be
> common in the DRM scheduler (i.e.: the fence has signaled already). Hence my
> question about a separate error type, as we would have to single out
> Err(EINVAL) often otherwise.
Interesting, dma_fence_add_callback() actually returns a different
error code, ENOENT, if the fence was signalled already, whereas
dma_fence_signal() gives EINVAL.
That's kind of not optimal. I'll consider fixing that on the C side.
Regardless, what's relevant for us here is that not being able to
register a callback is a serious failure that needs to be checked,
whereas trying to signal an already signaled fence is valid behavior
and no big deal.
If you try to register a callback on an already signaled fence, that
effectively means that you as the registering party need to take care
of the callback's code being executed, since dma_fence won't do that
anymore. The jobqueue design is catching that problem through the API
design; you were asking about the cb-registering API in the other mail,
I'll answer there.
Thx for the review!
P.
next prev parent reply other threads:[~2025-11-25 9:48 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-18 13:25 [RFC WIP 0/3] DRM Jobqueue Philipp Stanner
2025-11-18 13:25 ` [RFC WIP 1/3] rust: list: Add unsafe for container_of Philipp Stanner
2025-11-18 13:25 ` [RFC WIP 2/3] rust: sync: Add dma_fence abstractions Philipp Stanner
2025-11-21 23:03 ` Lyude Paul
2025-11-24 9:31 ` Alice Ryhl
2025-11-27 13:45 ` Philipp Stanner
2025-11-24 12:49 ` Daniel Almeida
2025-11-25 9:48 ` Philipp Stanner [this message]
2025-11-25 10:58 ` Boris Brezillon
2025-11-25 12:20 ` Philipp Stanner
2025-11-25 13:50 ` Boris Brezillon
2025-11-26 13:42 ` Philipp Stanner
2025-11-26 14:55 ` Boris Brezillon
2025-11-27 13:48 ` Daniel Almeida
2025-11-28 11:08 ` Philipp Stanner
2025-11-28 12:21 ` Daniel Almeida
2025-11-18 13:25 ` [RFC WIP 3/3] rust/drm: Add initial jobqueue sceleton Philipp Stanner
2025-11-24 13:58 ` Daniel Almeida
2025-11-25 13:20 ` Philipp Stanner
2025-11-27 14:13 ` Daniel Almeida
2025-11-28 10:07 ` Philipp Stanner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bc4f01ec5172d29abd64429e3017cc53c0522e01.camel@mailbox.org \
--to=phasta@mailbox.org \
--cc=acourbot@nvidia.com \
--cc=airlied@redhat.com \
--cc=aliceryhl@google.com \
--cc=boris.brezillon@collabora.com \
--cc=ckoenig.leichtzumerken@gmail.com \
--cc=dakr@kernel.org \
--cc=daniel.almeida@collabora.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lyude@redhat.com \
--cc=pcolberg@redhat.com \
--cc=phasta@kernel.org \
--cc=rust-for-linux@vger.kernel.org \
--cc=tursulin@ursulin.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).