Re: [PATCH v3 3/4] rust: Add dma_fence abstractions

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Philipp Stanner" <phasta@kernel.org>
Cc: ojeda@kernel.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v3 3/4] rust: Add dma_fence abstractions
Date: Tue, 23 Jun 2026 20:33:11 +0000	[thread overview]
Message-ID: <20260623203311.64BB01F00A3A@smtp.kernel.org> (raw)
In-Reply-To: <20260618132628.3811068-5-phasta@kernel.org>

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [Critical] Early publication of the fence callback exposes uninitialized memory to concurrent execution.
- [High] `Fence::is_signaled()` unconditionally acquires the fence's spinlock, causing a CPU deadlock if called from a fence callback.
- [High] Missing `dma_fence_begin_signalling()` and `dma_fence_end_signalling()` lockdep annotations breaks deadlock detection.
- [High] Aliasing violation in `FenceCbRegistration` Drop causes Undefined Behavior due to concurrent mutable references.
- [High] Missing `'static` bound on `DriverFence` generic types allows deferred destruction of short-lived references, leading to Use-After-Free.
- [Medium] Dropping `FenceCtx<T>` and user data in RCU softirq context triggers 'sleeping in atomic context' kernel BUG.
--

commit 77832e103c4c07b8e062803330d24f8483753627
Author: Philipp Stanner <phasta@kernel.org>

rust: Add dma_fence abstractions

This patch adds Rust abstractions for the dma_fence synchronization primitives.
It provides DriverFence and Fence types to decouple fence issuers and
consumers. It also implements callbacks and RCU-based destruction logic.

> diff --git a/rust/kernel/dma_buf/dma_fence.rs b/rust/kernel/dma_buf/dma_fence.rs
> new file mode 100644
> index 0000000000000..dcb41c8733aca
> --- /dev/null
> +++ b/rust/kernel/dma_buf/dma_fence.rs

[ ... ]

> +    pub fn new<'a>(fence: &'a Fence, callback: T) -> impl PinInit<Self, CallbackError<T>> + 'a
> +    where
> +        T: 'a,
> +    {
> +        let mut callback = Some(callback);
> +
> +        try_pin_init!(Self {
> +            cb <- Opaque::try_ffi_init(|ptr| {
> +                // SAFETY: `fence.inner.get()` is a valid, initialized `struct dma_fence`. `ptr`
> +                // points to the `struct dma_fence_cb` field within the pinned allocation, so it
> +                // remains valid until `dma_fence_remove_callback()` in `PinnedDrop` or until the
> +                // callback fires.
> +                let ret = unsafe {
> +                    to_result(bindings::dma_fence_add_callback(
> +                        fence.inner.get(),
> +                        ptr,
> +                        Some(Self::dma_fence_callback),
> +                    ))
> +                };
> +                match ret {
> +                    Ok(()) => Ok(()),
> +                    Err(e) => {
> +                        let cb_back = callback.take().expect("Success path did not run.");
> +                        if e == ENOENT {
> +                            Err(CallbackError::AlreadySignaled(cb_back))
> +                        } else {
> +                            Err(CallbackError::Other(e))
> +                        }
> +                    }
> +                }
> +            }),
> +            callback: callback.take().expect("Error path did not run."),

[Severity: Critical]
Does this code expose uninitialized memory to concurrent execution? By
initializing `cb` and calling `bindings::dma_fence_add_callback()` first, the
callback is published to the C backend and the spinlock is unlocked.

If a concurrent `dma_fence_signal` on another CPU triggers the callback
immediately, could `dma_fence_callback` attempt to access
`(*reg).callback.called()` before the `try_pin_init!` macro proceeds to
initialize the `callback` field?

[ ... ]

> +    unsafe extern "C" fn dma_fence_callback(
> +        _fence: *mut bindings::dma_fence,
> +        cb: *mut bindings::dma_fence_cb,
> +    ) {
> +        let ptr = Opaque::cast_from(cb).cast_mut();
> +
> +        // SAFETY: All `cb` we can receive here have been created in such a way
> +        // that they are embedded into a `FenceCbRegistration`. The backend
> +        // ensures synchronisation so whoever holds the registration object
> +        // cannot drop it while this code is running. See `FenceCbRegistration::drop`.
> +        unsafe {
> +            let reg: *mut Self = container_of!(ptr, Self, cb);
> +
> +            (*reg).callback.called();
> +        }
> +    }

[ ... ]

> +#[pinned_drop]
> +impl<T: FenceCb> PinnedDrop for FenceCbRegistration<T> {
> +    fn drop(self: Pin<&mut Self>) {

[Severity: High]
Could this result in an aliasing violation and Undefined Behavior?

When `FenceCbRegistration` drops, `PinnedDrop::drop` takes an exclusive
`Pin<&mut Self>`. If the fence signals concurrently, another CPU could execute
`dma_fence_callback` and invoke `(*reg).callback.called(&mut self)` using a
raw pointer.

Does taking an exclusive reference while another thread actively accesses the
data concurrently via a raw pointer violate Rust's aliasing rules?

[ ... ]

> +    pub fn is_signaled(&self) -> bool {
> +        let fence = self.as_raw();
> +        let mut fence_flags: usize = 0;
> +        let flag_ptr = &raw mut fence_flags;
> +
> +        // We shouuld not use `dma_fence_is_signaled_locked()` here, because
> +        // according to the C backend's recommendations, that function is problematic
> +        // and we should avoid calling that function with a lock held.
> +
> +        // SAFETY: `self` is valid by definition. We take the spinlock above.
> +        let ret = unsafe { bindings::dma_fence_is_signaled(fence) };
> +
> +        // To guarantee that an API caller can 100% rely on the signalling being
> +        // completed (i.e., all fence callbacks ran), we have to take the lock.
> +        //
> +        // The reason is that the C dma_fence backend currently does not carefully
> +        // synchronize the `dma_fence_is_signaled()` function with the proper
> +        // spinlock. This can lead to the function returning `true` while fence
> +        // callbacks are still being executed. This can be mitigated by guarding
> +        // the entire function with the spinlock.
> +        //
> +        // See commit c8a5d5ea3ba6a.
> +
> +        // SAFETY: `fence` is valid because `self` is valid. `flag_ptr` is
> +        // merely a pointer to an integer, which lives as long as this function.
> +        unsafe { bindings::dma_fence_lock_irqsave(fence, flag_ptr) };

[Severity: High]
Can this code cause a CPU deadlock if called from a fence callback?

The Rust `Fence::is_signaled()` wrapper unconditionally acquires the fence's
spinlock via `dma_fence_lock_irqsave()`. However, the C backend's
`dma_fence_signal_timestamp_locked` executes callbacks with the fence's
spinlock already held.

If an API user calls `is_signaled()` from within the `FenceCb::called()`
callback on the same fence, wouldn't it attempt to acquire the non-recursive
spinlock a second time?

> +        // SAFETY: `fence` is valid because `self` is valid. `flag_ptr` is
> +        // merely a pointer to an integer, which lives as long as this function.
> +        // The lock was taken above.
> +        unsafe { bindings::dma_fence_unlock_irqrestore(fence, flag_ptr) };
> +
> +        ret
> +    }

[ ... ]

> +    pub fn signal(self, res: Result) {
> +        let fence = self.as_raw();
> +        let mut fence_flags: usize = 0;
> +        let flag_ptr = &raw mut fence_flags;
> +
> +        // SAFETY: `fence` is valid because `self` is valid. `flag_ptr` is
> +        // merely a pointer to an integer, which lives as long as this function.
> +        unsafe { bindings::dma_fence_lock_irqsave(fence, flag_ptr) };
> +        // SAFETY: `fence` is valid because `self` is valid. The lock must be
> +        // held, which we acquired directly above.
> +        if !unsafe { bindings::dma_fence_test_signaled_flag(fence) } {
> +            if let Err(err) = res {
> +                // SAFETY: `fence` is valid because `self` is valid. The fence
> +                // must not have been signaled yet, which we check directly above.
> +                unsafe { bindings::dma_fence_set_error(fence, err.to_errno()) };
> +            }
> +            // SAFETY: `fence` is valid because `self` is valid. The lock must
> +            // be held, which we acquired above.
> +            unsafe { bindings::dma_fence_signal_locked(fence) };

[Severity: High]
Are we missing the mandatory lockdep annotations for signaling here?

The `dma_fence` subsystem contract mandates that any code signaling a fence
must be annotated with `dma_fence_begin_signalling()` and
`dma_fence_end_signalling()` to prime lockdep for cross-driver deadlock
detection.

By manually signaling the fence with `dma_fence_signal_locked()` without these
annotations, does this silently disable lockdep's deadlock tracking for Rust
drivers?

> +        }
> +        // SAFETY: `fence` is valid because `self` is valid. `flag_ptr` is
> +        // merely a pointer to an integer, which lives as long as this function.
> +        // The lock must be held, which we acquired above.
> +        unsafe { bindings::dma_fence_unlock_irqrestore(fence, flag_ptr) };
> +    }

[ ... ]

> +impl<T: Send + Sync + FenceCtxOps> Drop for DriverFence<T> {
> +    fn drop(&mut self) {
> +        let fence = self.as_raw();
> +        let mut fence_flags: usize = 0;
> +        let flag_ptr = &raw mut fence_flags;
> +
> +        // SAFETY: `fence` is valid until the `call_rcu()` below. `flag_ptr` is
> +        // merely a pointer to an integer, which lives as long as this function.
> +        unsafe { bindings::dma_fence_lock_irqsave(fence, flag_ptr) };
> +
> +        // Use dma_fence_test_signaled_flag() instead of dma_fence_is_signaled_locked()
> +        // because the C backend wants to get rid of the latter.
> +
> +        // SAFETY: `fence` is valid until the `call_rcu()` below.
> +        let signaled: bool = unsafe { bindings::dma_fence_test_signaled_flag(fence) };
> +        if warn_on!(!signaled) {
> +            // SAFETY: `fence` is valid until the `call_rcu()` below. The fence
> +            // must not have been signaled yet, which we check directly above.
> +            unsafe { bindings::dma_fence_set_error(fence, ECANCELED.to_errno()) };
> +            // SAFETY: `fence` is valid until the `call_rcu()` below. The lock must
> +            // be held, which we acquired above.
> +            unsafe { bindings::dma_fence_signal_locked(fence) };

[Severity: High]
Does this drop signaling path also omit the required
`dma_fence_begin_signalling()` and `dma_fence_end_signalling()` lockdep
annotations?

> +        }

[ ... ]

> +unsafe extern "C" fn drop_driver_fence_data<T: Send + Sync + FenceCtxOps>(
> +    head: *mut bindings::callback_head,
> +) {
> +    // SAFETY: Caller provides a pointer to the `rcu_head` field of a `DriverFenceData<C>`.
> +    let fence_data = unsafe { container_of!(head, DriverFenceData<T>, rcu_head) };
> +
> +    // SAFETY: `fence_data` was created validly above. All the fence's data will
> +    // only drop below, but the raw pointer to the raw C `dma_fence` remains
> +    // valid because the reference count is only decremented at the end of the
> +    // function.
> +    let fence = unsafe { (*fence_data).inner.inner.get() };
> +
> +    // SAFETY: `fence_data` was created validly above. A grace period has passed.
> +    // All callbacks which might have had access to the `fctx` are gone now.
> +    unsafe { drop_in_place(&raw mut (*fence_data).fctx) };

[Severity: Medium]
Since `FenceCtx<T>` and the generic context data `T` are dropped here in
softirq (atomic) context via `call_rcu`, could this trigger a "sleeping in
atomic context" kernel BUG if `T` performs sleepable operations in its `Drop`
implementation?

Would it be helpful to add documentation or trait bounds warning users that
these types must be drop-safe in atomic context?

> +
> +    // SAFETY: `fence_data` was created validly above. The user has already
> +    // dropped the only conventional accessor to the user data, the `DriverFence`,
> +    // one grace period ago. All accessors are gone now.
> +    unsafe { drop_in_place(&raw mut (*fence_data).data) };

[Severity: High]
Does this missing `'static` bound on `DriverFence` generic types allow the
deferred destruction of short-lived references?

Since `DriverFence<T>` does not require `T` to be bounded by `'static'`, a user
could instantiate it with a type holding a short-lived reference (e.g., to a
local stack variable).

When `DriverFence` drops, `call_rcu` defers the actual destruction to
`drop_driver_fence_data::<T>`, which escapes the borrow checker. If the stack
variable goes out of scope before the RCU callback runs, will `drop_in_place`
access a dangling pointer and cause a Use-After-Free?

> +
> +    // The inner `Fence` explicitly does not get dropped because there may be
> +    // many more users / consumers, each holding their own reference.
> +
> +    // SAFETY: Once a `DriverFence` is initialized, the inner `fence` is
> +    // valid and initialized. It is valid until the refcount drops
> +    // to 0, which can earliest happen once we drop the `DriverFence`'s reference
> +    // here.
> +    unsafe { bindings::dma_fence_put(fence) };
> +
> +    // The actual memory the data associated with a `DriverFence` lives in
> +    // gets freed by the C dma_fence backend once the fence's refcount reaches 0.
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260618132628.3811068-2-phasta@kernel.org?part=3

next prev parent reply	other threads:[~2026-06-23 20:33 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-18 13:26 [PATCH v3 0/4] rust / dma_buf: Add abstractions for dma_fence Philipp Stanner
2026-06-18 13:26 ` [PATCH v3 1/4] rust: types: implement ForeignOwnable for ARef<T> Philipp Stanner
2026-06-23 20:18   ` sashiko-bot
2026-06-18 13:26 ` [PATCH v3 2/4] rust: error: Add ECANCELED error code Philipp Stanner
2026-06-18 19:14   ` Miguel Ojeda
2026-06-18 13:26 ` [PATCH v3 3/4] rust: Add dma_fence abstractions Philipp Stanner
2026-06-23 20:33   ` sashiko-bot [this message]
2026-06-18 13:26 ` [PATCH v3 4/4] MAINTAINERS: Add entry for Rust dma-buf Philipp Stanner
2026-06-18 13:30   ` Christian König
2026-06-18 15:07     ` Sumit Semwal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260623203311.64BB01F00A3A@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ojeda@kernel.org \
    --cc=phasta@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.