* [RFC PATCH 1/4] rust: list: Add unsafe for container_of
2026-02-03 8:13 [RFC PATCH 0/4] Add dma_fence abstractions and DRM Jobqueue Philipp Stanner
@ 2026-02-03 8:14 ` Philipp Stanner
2026-02-03 15:25 ` Gary Guo
2026-02-04 10:30 ` Alice Ryhl
2026-02-03 8:14 ` [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions Philipp Stanner
` (3 subsequent siblings)
4 siblings, 2 replies; 103+ messages in thread
From: Philipp Stanner @ 2026-02-03 8:14 UTC (permalink / raw)
To: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Boris Brezillon,
Daniel Almeida, Joel Fernandes
Cc: linux-kernel, dri-devel, rust-for-linux, Philipp Stanner, stable
impl_list_item_mod.rs calls container_of() without unsafe blocks at a
couple of places. Since container_of() is an unsafe macro / function,
the blocks are strictly necessary.
For unknown reasons, the problem was so far not visible; it only becomes
visible once one utilizes the list implementation from within the kernel
crate itself:
error[E0133]: call to unsafe function `core::ptr::mut_ptr::<impl *mut T>::byte_sub`
is unsafe and requires unsafe block
--> rust/kernel/lib.rs:252:29
|
252 | let container_ptr = field_ptr.byte_sub(offset).cast::<$Container>();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ call to unsafe function
|
::: rust/kernel/drm/jq.rs:98:1
|
98 | / impl_list_item! {
99 | | impl ListItem<0> for BasicItem { using ListLinks { self.links }; }
100 | | }
| |_- in this macro invocation
|
note: an unsafe function restricts its caller, but its body is safe by default
--> rust/kernel/list/impl_list_item_mod.rs:216:13
|
216 | unsafe fn view_value(me: *mut $crate::list::ListLinks<$num>) -> *const Self {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
::: rust/kernel/drm/jq.rs:98:1
|
98 | / impl_list_item! {
99 | | impl ListItem<0> for BasicItem { using ListLinks { self.links }; }
100 | | }
| |_- in this macro invocation
= note: requested on the command line with `-D unsafe-op-in-unsafe-fn`
= note: this error originates in the macro `$crate::container_of` which comes
from the expansion of the macro `impl_list_item`
Add unsafe blocks to container_of to fix the issue.
Cc: stable@vger.kernel.org # v6.17+
Fixes: c77f85b347dd ("rust: list: remove OFFSET constants")
Suggested-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
rust/kernel/list/impl_list_item_mod.rs | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/rust/kernel/list/impl_list_item_mod.rs b/rust/kernel/list/impl_list_item_mod.rs
index 202bc6f97c13..7052095efde5 100644
--- a/rust/kernel/list/impl_list_item_mod.rs
+++ b/rust/kernel/list/impl_list_item_mod.rs
@@ -217,7 +217,7 @@ unsafe fn view_value(me: *mut $crate::list::ListLinks<$num>) -> *const Self {
// SAFETY: `me` originates from the most recent call to `prepare_to_insert`, so it
// points at the field `$field` in a value of type `Self`. Thus, reversing that
// operation is still in-bounds of the allocation.
- $crate::container_of!(me, Self, $($field).*)
+ unsafe { $crate::container_of!(me, Self, $($field).*) }
}
// GUARANTEES:
@@ -242,7 +242,7 @@ unsafe fn post_remove(me: *mut $crate::list::ListLinks<$num>) -> *const Self {
// SAFETY: `me` originates from the most recent call to `prepare_to_insert`, so it
// points at the field `$field` in a value of type `Self`. Thus, reversing that
// operation is still in-bounds of the allocation.
- $crate::container_of!(me, Self, $($field).*)
+ unsafe { $crate::container_of!(me, Self, $($field).*) }
}
}
)*};
@@ -270,9 +270,9 @@ unsafe fn prepare_to_insert(me: *const Self) -> *mut $crate::list::ListLinks<$nu
// SAFETY: The caller promises that `me` points at a valid value of type `Self`.
let links_field = unsafe { <Self as $crate::list::ListItem<$num>>::view_links(me) };
- let container = $crate::container_of!(
+ let container = unsafe { $crate::container_of!(
links_field, $crate::list::ListLinksSelfPtr<Self, $num>, inner
- );
+ ) };
// SAFETY: By the same reasoning above, `links_field` is a valid pointer.
let self_ptr = unsafe {
@@ -319,9 +319,9 @@ unsafe fn view_links(me: *const Self) -> *mut $crate::list::ListLinks<$num> {
// `ListArc` containing `Self` until the next call to `post_remove`. The value cannot
// be destroyed while a `ListArc` reference exists.
unsafe fn view_value(links_field: *mut $crate::list::ListLinks<$num>) -> *const Self {
- let container = $crate::container_of!(
+ let container = unsafe { $crate::container_of!(
links_field, $crate::list::ListLinksSelfPtr<Self, $num>, inner
- );
+ ) };
// SAFETY: By the same reasoning above, `links_field` is a valid pointer.
let self_ptr = unsafe {
--
2.49.0
^ permalink raw reply related [flat|nested] 103+ messages in thread

* Re: [RFC PATCH 1/4] rust: list: Add unsafe for container_of
2026-02-03 8:14 ` [RFC PATCH 1/4] rust: list: Add unsafe for container_of Philipp Stanner
@ 2026-02-03 15:25 ` Gary Guo
2026-02-04 10:30 ` Alice Ryhl
1 sibling, 0 replies; 103+ messages in thread
From: Gary Guo @ 2026-02-03 15:25 UTC (permalink / raw)
To: Philipp Stanner, David Airlie, Simona Vetter, Danilo Krummrich,
Alice Ryhl, Gary Guo, Benno Lossin, Christian König,
Boris Brezillon, Daniel Almeida, Joel Fernandes
Cc: linux-kernel, dri-devel, rust-for-linux, stable
On Tue Feb 3, 2026 at 8:14 AM GMT, Philipp Stanner wrote:
> impl_list_item_mod.rs calls container_of() without unsafe blocks at a
> couple of places. Since container_of() is an unsafe macro / function,
> the blocks are strictly necessary.
>
> For unknown reasons, that problem was so far not visible and only gets
> visible once one utilizes the list implementation from within the core
> crate:
The reason is that the error enabled via "unsafe-op-in-unsafe-fn" is a lint
rather than a hard compiler error, and Rust suppresses lints triggered inside a
macro from another crate.
When the macro is used in the kernel crate itself, it's no longer suppressed.
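For illustration, a minimal standalone sketch (hypothetical function name, plain userspace Rust, not from the patch) of what the lint demands:

```rust
#![deny(unsafe_op_in_unsafe_fn)]

// Under this lint, the body of an `unsafe fn` is no longer an implicit
// unsafe block, so each unsafe operation needs its own `unsafe { .. }`
// wrapper -- just like the `container_of!` calls in the patch.
unsafe fn step_back(field_ptr: *mut u8, offset: usize) -> *mut u8 {
    // Removing this `unsafe { .. }` block reproduces error[E0133].
    unsafe { field_ptr.byte_sub(offset) }
}

fn main() {
    let mut buf = [0u8; 16];
    let base = buf.as_mut_ptr();
    // SAFETY: `base.add(8)` and stepping back 8 bytes stay inside `buf`.
    let p = unsafe { step_back(base.add(8), 8) };
    assert_eq!(p, base);
    println!("ok");
}
```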
>
> error[E0133]: call to unsafe function `core::ptr::mut_ptr::<impl *mut T>::byte_sub`
> is unsafe and requires unsafe block
> --> rust/kernel/lib.rs:252:29
> |
> 252 | let container_ptr = field_ptr.byte_sub(offset).cast::<$Container>();
> | ^^^^^^^^^^^^^^^^^^^^^^^^^^ call to unsafe function
> |
> ::: rust/kernel/drm/jq.rs:98:1
> |
> 98 | / impl_list_item! {
> 99 | | impl ListItem<0> for BasicItem { using ListLinks { self.links }; }
> 100 | | }
> | |_- in this macro invocation
> |
> note: an unsafe function restricts its caller, but its body is safe by default
> --> rust/kernel/list/impl_list_item_mod.rs:216:13
> |
> 216 | unsafe fn view_value(me: *mut $crate::list::ListLinks<$num>) -> *const Self {
> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> |
> ::: rust/kernel/drm/jq.rs:98:1
> |
> 98 | / impl_list_item! {
> 99 | | impl ListItem<0> for BasicItem { using ListLinks { self.links }; }
> 100 | | }
> | |_- in this macro invocation
> = note: requested on the command line with `-D unsafe-op-in-unsafe-fn`
> = note: this error originates in the macro `$crate::container_of` which comes
> from the expansion of the macro `impl_list_item`
>
> Add unsafe blocks to container_of to fix the issue.
>
> Cc: stable@vger.kernel.org # v6.17+
> Fixes: c77f85b347dd ("rust: list: remove OFFSET constants")
> Suggested-by: Alice Ryhl <aliceryhl@google.com>
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Gary Guo <gary@garyguo.net>
Can you send it as a standalone patch so it's clear that this is intended to be
picked rather than part of the RFC series?
Best,
Gary
> ---
> rust/kernel/list/impl_list_item_mod.rs | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
* Re: [RFC PATCH 1/4] rust: list: Add unsafe for container_of
2026-02-03 8:14 ` [RFC PATCH 1/4] rust: list: Add unsafe for container_of Philipp Stanner
2026-02-03 15:25 ` Gary Guo
@ 2026-02-04 10:30 ` Alice Ryhl
1 sibling, 0 replies; 103+ messages in thread
From: Alice Ryhl @ 2026-02-04 10:30 UTC (permalink / raw)
To: Philipp Stanner
Cc: David Airlie, Simona Vetter, Danilo Krummrich, Gary Guo,
Benno Lossin, Christian König, Boris Brezillon,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux, stable
On Tue, Feb 03, 2026 at 09:14:00AM +0100, Philipp Stanner wrote:
> impl_list_item_mod.rs calls container_of() without unsafe blocks at a
> couple of places. Since container_of() is an unsafe macro / function,
> the blocks are strictly necessary.
>
> For unknown reasons, that problem was so far not visible and only gets
> visible once one utilizes the list implementation from within the core
> crate:
>
> error[E0133]: call to unsafe function `core::ptr::mut_ptr::<impl *mut T>::byte_sub`
> is unsafe and requires unsafe block
> --> rust/kernel/lib.rs:252:29
> |
> 252 | let container_ptr = field_ptr.byte_sub(offset).cast::<$Container>();
> | ^^^^^^^^^^^^^^^^^^^^^^^^^^ call to unsafe function
> |
> ::: rust/kernel/drm/jq.rs:98:1
> |
> 98 | / impl_list_item! {
> 99 | | impl ListItem<0> for BasicItem { using ListLinks { self.links }; }
> 100 | | }
> | |_- in this macro invocation
> |
> note: an unsafe function restricts its caller, but its body is safe by default
> --> rust/kernel/list/impl_list_item_mod.rs:216:13
> |
> 216 | unsafe fn view_value(me: *mut $crate::list::ListLinks<$num>) -> *const Self {
> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> |
> ::: rust/kernel/drm/jq.rs:98:1
> |
> 98 | / impl_list_item! {
> 99 | | impl ListItem<0> for BasicItem { using ListLinks { self.links }; }
> 100 | | }
> | |_- in this macro invocation
> = note: requested on the command line with `-D unsafe-op-in-unsafe-fn`
> = note: this error originates in the macro `$crate::container_of` which comes
> from the expansion of the macro `impl_list_item`
>
> Add unsafe blocks to container_of to fix the issue.
>
> Cc: stable@vger.kernel.org # v6.17+
> Fixes: c77f85b347dd ("rust: list: remove OFFSET constants")
> Suggested-by: Alice Ryhl <aliceryhl@google.com>
> Signed-off-by: Philipp Stanner <phasta@kernel.org>
With the reason that Gary shared added to the commit message:
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/kernel/list/impl_list_item_mod.rs | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/rust/kernel/list/impl_list_item_mod.rs b/rust/kernel/list/impl_list_item_mod.rs
> index 202bc6f97c13..7052095efde5 100644
> --- a/rust/kernel/list/impl_list_item_mod.rs
> +++ b/rust/kernel/list/impl_list_item_mod.rs
> @@ -217,7 +217,7 @@ unsafe fn view_value(me: *mut $crate::list::ListLinks<$num>) -> *const Self {
> // SAFETY: `me` originates from the most recent call to `prepare_to_insert`, so it
> // points at the field `$field` in a value of type `Self`. Thus, reversing that
> // operation is still in-bounds of the allocation.
> - $crate::container_of!(me, Self, $($field).*)
> + unsafe { $crate::container_of!(me, Self, $($field).*) }
> }
>
> // GUARANTEES:
> @@ -242,7 +242,7 @@ unsafe fn post_remove(me: *mut $crate::list::ListLinks<$num>) -> *const Self {
> // SAFETY: `me` originates from the most recent call to `prepare_to_insert`, so it
> // points at the field `$field` in a value of type `Self`. Thus, reversing that
> // operation is still in-bounds of the allocation.
> - $crate::container_of!(me, Self, $($field).*)
> + unsafe { $crate::container_of!(me, Self, $($field).*) }
> }
> }
> )*};
> @@ -270,9 +270,9 @@ unsafe fn prepare_to_insert(me: *const Self) -> *mut $crate::list::ListLinks<$nu
> // SAFETY: The caller promises that `me` points at a valid value of type `Self`.
> let links_field = unsafe { <Self as $crate::list::ListItem<$num>>::view_links(me) };
>
> - let container = $crate::container_of!(
> + let container = unsafe { $crate::container_of!(
> links_field, $crate::list::ListLinksSelfPtr<Self, $num>, inner
> - );
> + ) };
It may be cleaner to write this as:
let container = unsafe {
$crate::container_of!(
links_field, $crate::list::ListLinksSelfPtr<Self, $num>, inner
)
};
Rustfmt has no effect on macro definitions, but if this were not a macro,
then I believe that rustfmt would format it like the above.
>
> // SAFETY: By the same reasoning above, `links_field` is a valid pointer.
> let self_ptr = unsafe {
> @@ -319,9 +319,9 @@ unsafe fn view_links(me: *const Self) -> *mut $crate::list::ListLinks<$num> {
> // `ListArc` containing `Self` until the next call to `post_remove`. The value cannot
> // be destroyed while a `ListArc` reference exists.
> unsafe fn view_value(links_field: *mut $crate::list::ListLinks<$num>) -> *const Self {
> - let container = $crate::container_of!(
> + let container = unsafe { $crate::container_of!(
> links_field, $crate::list::ListLinksSelfPtr<Self, $num>, inner
> - );
> + ) };
Ditto here.
Alice
* [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-03 8:13 [RFC PATCH 0/4] Add dma_fence abstractions and DRM Jobqueue Philipp Stanner
2026-02-03 8:14 ` [RFC PATCH 1/4] rust: list: Add unsafe for container_of Philipp Stanner
@ 2026-02-03 8:14 ` Philipp Stanner
2026-02-05 8:57 ` Boris Brezillon
` (2 more replies)
2026-02-03 8:14 ` [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue Philipp Stanner
` (2 subsequent siblings)
4 siblings, 3 replies; 103+ messages in thread
From: Philipp Stanner @ 2026-02-03 8:14 UTC (permalink / raw)
To: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Boris Brezillon,
Daniel Almeida, Joel Fernandes
Cc: linux-kernel, dri-devel, rust-for-linux, Philipp Stanner
dma_fence is a synchronization mechanism which is needed by virtually
all GPU drivers.
A dma_fence offers many features, the most important of which is
registering callbacks (for example, to kick off a work item) which get
executed once a fence gets signalled.
dma_fence has a number of callbacks. Only the two most basic ones
(get_driver_name(), get_timeline_name()) are abstracted since they are
enough to enable the fundamental functionality.
Callbacks in Rust are registered by passing driver data which implements
the Rust callback trait, whose function will be called by the C backend.
dma_fences are always refcounted, so implement AlwaysRefCounted for
them. Once the refcount drops to zero, the C backend calls a release
function, in which we call drop_in_place() to conveniently marry that
C cleanup mechanism with Rust's ownership concepts.
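The ownership round-trip behind callback registration can be sketched in plain userspace Rust (hypothetical names; `Box` stands in for the patch's `KBox`/`ForeignOwnable` machinery):

```rust
// Driver data implements a callback trait; ownership is handed to the
// (simulated) C side as a raw pointer and given back to the trait
// method once the fence signals.

trait FenceCallback {
    fn callback(data: Box<Self>)
    where
        Self: Sized;
}

struct MyData {
    hits: u32,
}

impl FenceCallback for MyData {
    fn callback(data: Box<Self>) {
        // The driver gets its data back by value, so normal Rust
        // ownership rules apply again from here on.
        assert_eq!(data.hits, 1);
    }
}

/// Stand-in for the `unsafe extern "C"` trampoline in the patch.
///
/// # Safety
/// `ptr` must have been created by `Box::into_raw` on a `Box<T>`.
unsafe fn trampoline<T: FenceCallback>(ptr: *mut T) {
    // SAFETY: per the function contract, `ptr` came from `Box::into_raw`.
    let data = unsafe { Box::from_raw(ptr) };
    T::callback(data);
}

fn main() {
    // "register_callback": give up ownership to the (simulated) C side.
    let ptr = Box::into_raw(Box::new(MyData { hits: 1 }));
    // "fence signalled": the backend invokes the trampoline exactly once.
    // SAFETY: `ptr` was created by `Box::into_raw` directly above.
    unsafe { trampoline(ptr) };
    println!("ok");
}
```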
This patch provides basic functionality, but is still missing:
- An implementation of PinInit<T, Error> for all driver data.
- A clever implementation for working dma_fence_begin_signalling()
guards. See the corresponding TODO in the code.
- Abstractions for dma_fence_remove_callback() - needed to cleanly
decouple third parties from fences for lifetime soundness.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
rust/bindings/bindings_helper.h | 1 +
rust/helpers/dma_fence.c | 28 +++
rust/helpers/helpers.c | 1 +
rust/helpers/spinlock.c | 5 +
rust/kernel/sync.rs | 2 +
rust/kernel/sync/dma_fence.rs | 396 ++++++++++++++++++++++++++++++++
6 files changed, 433 insertions(+)
create mode 100644 rust/helpers/dma_fence.c
create mode 100644 rust/kernel/sync/dma_fence.rs
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 2e43c66635a2..fc3cb5eb0be5 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -51,6 +51,7 @@
#include <linux/device/faux.h>
#include <linux/dma-direction.h>
#include <linux/dma-mapping.h>
+#include <linux/dma-fence.h>
#include <linux/errname.h>
#include <linux/ethtool.h>
#include <linux/fdtable.h>
diff --git a/rust/helpers/dma_fence.c b/rust/helpers/dma_fence.c
new file mode 100644
index 000000000000..cc93297db4b1
--- /dev/null
+++ b/rust/helpers/dma_fence.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/dma-fence.h>
+
+void rust_helper_dma_fence_get(struct dma_fence *f)
+{
+ dma_fence_get(f);
+}
+
+void rust_helper_dma_fence_put(struct dma_fence *f)
+{
+ dma_fence_put(f);
+}
+
+bool rust_helper_dma_fence_begin_signalling(void)
+{
+ return dma_fence_begin_signalling();
+}
+
+void rust_helper_dma_fence_end_signalling(bool cookie)
+{
+ dma_fence_end_signalling(cookie);
+}
+
+bool rust_helper_dma_fence_is_signaled(struct dma_fence *f)
+{
+ return dma_fence_is_signaled(f);
+}
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 551da6c9b506..70c690bdb0a5 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -25,6 +25,7 @@
#include "cred.c"
#include "device.c"
#include "dma.c"
+#include "dma_fence.c"
#include "drm.c"
#include "err.c"
#include "irq.c"
diff --git a/rust/helpers/spinlock.c b/rust/helpers/spinlock.c
index 42c4bf01a23e..017ac447ebbd 100644
--- a/rust/helpers/spinlock.c
+++ b/rust/helpers/spinlock.c
@@ -16,6 +16,11 @@ void rust_helper___spin_lock_init(spinlock_t *lock, const char *name,
#endif /* CONFIG_DEBUG_SPINLOCK */
}
+void rust_helper_spin_lock_init(spinlock_t *lock)
+{
+ spin_lock_init(lock);
+}
+
void rust_helper_spin_lock(spinlock_t *lock)
{
spin_lock(lock);
diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
index cf5b638a097d..85e524ea9118 100644
--- a/rust/kernel/sync.rs
+++ b/rust/kernel/sync.rs
@@ -14,6 +14,7 @@
pub mod atomic;
pub mod barrier;
pub mod completion;
+pub mod dma_fence;
mod condvar;
pub mod lock;
mod locked_by;
@@ -23,6 +24,7 @@
pub use arc::{Arc, ArcBorrow, UniqueArc};
pub use completion::Completion;
+pub use dma_fence::{DmaFence, DmaFenceCtx, DmaFenceCb, DmaFenceCbFunc};
pub use condvar::{new_condvar, CondVar, CondVarTimeoutResult};
pub use lock::global::{global_lock, GlobalGuard, GlobalLock, GlobalLockBackend, GlobalLockedBy};
pub use lock::mutex::{new_mutex, Mutex, MutexGuard};
diff --git a/rust/kernel/sync/dma_fence.rs b/rust/kernel/sync/dma_fence.rs
new file mode 100644
index 000000000000..9c15426f8432
--- /dev/null
+++ b/rust/kernel/sync/dma_fence.rs
@@ -0,0 +1,396 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright (C) 2025, 2026 Red Hat Inc.:
+// - Philipp Stanner <pstanner@redhat.com>
+
+//! DmaFence support.
+//!
+//! Reference: <https://docs.kernel.org/driver-api/dma-buf.html#c.dma_fence>
+//!
+//! C header: [`include/linux/dma-fence.h`](srctree/include/linux/dma-fence.h)
+
+use crate::{
+ bindings,
+ prelude::*,
+ types::ForeignOwnable,
+ types::{ARef, AlwaysRefCounted, Opaque},
+};
+
+use core::{
+ ptr::{drop_in_place, NonNull},
+ sync::atomic::{AtomicU64, Ordering},
+};
+
+use kernel::sync::{Arc, ArcBorrow};
+use kernel::c_str;
+
+/// Defines the callback function the dma-fence backend will call once the fence gets signalled.
+pub trait DmaFenceCbFunc {
+ /// The callback function. `cb` is a container of the data which the driver passed in
+ /// [`DmaFence::register_callback`].
+ fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>)
+ where
+ Self: Sized;
+}
+
+/// Container for driver data which the driver gets back in its callback once the fence gets
+/// signalled.
+#[pin_data]
+pub struct DmaFenceCb<T: DmaFenceCbFunc> {
+ /// C struct needed for the backend.
+ #[pin]
+ inner: Opaque<bindings::dma_fence_cb>,
+ /// Driver data.
+ #[pin]
+ pub data: T,
+}
+
+impl<T: DmaFenceCbFunc + 'static> DmaFenceCb<T> {
+ fn new(data: impl PinInit<T>) -> Result<Pin<KBox<Self>>> {
+ let cb = try_pin_init!(Self {
+ inner: Opaque::zeroed(), // This gets initialized by the C backend.
+ data <- data,
+ });
+
+ KBox::pin_init(cb, GFP_KERNEL)
+ }
+
+ /// Callback for the C dma_fence backend.
+ ///
+ /// # Safety
+ /// All data used and cast in this function was validly created by
+ /// [`DmaFence::register_callback`] and isn't modified by the C backend until this callback
+ /// here has run.
+ unsafe extern "C" fn callback(
+ _fence_ptr: *mut bindings::dma_fence,
+ cb_ptr: *mut bindings::dma_fence_cb,
+ ) {
+ let cb_ptr = Opaque::cast_from(cb_ptr);
+
+ // SAFETY: The constructor guarantees that `cb_ptr` is always `inner` of a DmaFenceCb.
+ let cb_ptr = unsafe { crate::container_of!(cb_ptr, Self, inner) }.cast_mut() as *mut c_void;
+ // SAFETY: `cb_ptr` is the heap memory of a `Pin<KBox<Self>>` because it was created by
+ // invoking ForeignOwnable::into_foreign() on such an instance.
+ let cb = unsafe { <Pin<KBox<Self>> as ForeignOwnable>::from_foreign(cb_ptr) };
+
+ // Pass ownership back over to the driver.
+ T::callback(cb);
+ }
+}
+
+/// A dma-fence context. A fence context takes care of associating related fences with each other,
+/// providing each with increasing sequence numbers and a common identifier.
+#[pin_data]
+pub struct DmaFenceCtx {
+ /// An opaque spinlock. Only ever passed to the C backend, never used by Rust.
+ #[pin]
+ lock: Opaque<bindings::spinlock_t>,
+ /// The fence context number.
+ nr: u64,
+ /// The sequence number for the next fence created.
+ seqno: AtomicU64,
+}
+
+impl DmaFenceCtx {
+ /// Create a new `DmaFenceCtx`.
+ pub fn new() -> Result<Arc<Self>> {
+ let ctx = pin_init!(Self {
+ // Feed in a non-Rust spinlock for now, since the Rust side never needs the lock.
+ lock <- Opaque::ffi_init(|slot: *mut bindings::spinlock| {
+ // SAFETY: `slot` is a valid pointer to an uninitialized `struct spinlock_t`.
+ unsafe { bindings::spin_lock_init(slot) };
+ }),
+ // SAFETY: `dma_fence_context_alloc()` merely works on a global atomic. Parameter `1`
+ // is the number of contexts we want to allocate.
+ nr: unsafe { bindings::dma_fence_context_alloc(1) },
+ seqno: AtomicU64::new(0),
+ });
+
+ Arc::pin_init(ctx, GFP_KERNEL)
+ }
+
+ fn get_new_fence_seqno(&self) -> u64 {
+ self.seqno.fetch_add(1, Ordering::Relaxed)
+ }
+}
+
+// SAFETY: The DmaFenceCtx is merely a wrapper around an atomic integer.
+unsafe impl Send for DmaFenceCtx {}
+// SAFETY: The DmaFenceCtx is merely a wrapper around an atomic integer.
+unsafe impl Sync for DmaFenceCtx {}
+
+impl ArcBorrow<'_, DmaFenceCtx> {
+ /// Create a new fence, consuming `data`.
+ ///
+ /// The fence will increment the refcount of the fence context associated with this
+ /// [`DmaFenceCtx`].
+ pub fn new_fence<T>(
+ &mut self,
+ data: impl PinInit<T>,
+ ) -> Result<ARef<DmaFence<T>>> {
+ let fctx: Arc<DmaFenceCtx> = (*self).into();
+ let seqno: u64 = fctx.get_new_fence_seqno();
+
+ // TODO: Should we reset seqno in case of failure?
+ // Pass `fctx` by value so that the fence will hold a reference to the DmaFenceCtx as long
+ // as it lives.
+ DmaFence::new(fctx, data, &self.lock, self.nr, seqno)
+ }
+}
+
+/// A synchronization primitive mainly for GPU drivers.
+///
+/// DmaFences are always reference counted. The typical use case is that one side registers
+/// callbacks on the fence which will perform a certain action (such as queueing work) once the
+/// other side signals the fence.
+///
+/// # Examples
+///
+/// ```
+/// use kernel::sync::{Arc, ArcBorrow, DmaFence, DmaFenceCtx, DmaFenceCb, DmaFenceCbFunc};
+/// use core::sync::atomic::{self, AtomicBool};
+///
+/// static mut CHECKER: AtomicBool = AtomicBool::new(false);
+///
+/// struct CallbackData {
+/// i: u32,
+/// }
+///
+/// impl CallbackData {
+/// fn new() -> Self {
+/// Self { i: 9 }
+/// }
+/// }
+///
+/// impl DmaFenceCbFunc for CallbackData {
+/// fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>) where Self: Sized {
+/// assert_eq!(cb.data.i, 9);
+/// // SAFETY: Just to have an easy way for testing. This cannot race with the checker
+/// // because the fence signalling callbacks are executed synchronously.
+/// unsafe { CHECKER.store(true, atomic::Ordering::Relaxed); }
+/// }
+/// }
+///
+/// struct DriverData {
+/// i: u32,
+/// }
+///
+/// impl DriverData {
+/// fn new() -> Self {
+/// Self { i: 5 }
+/// }
+/// }
+///
+/// let data = DriverData::new();
+/// let fctx = DmaFenceCtx::new()?;
+///
+/// let mut fence = fctx.as_arc_borrow().new_fence(data)?;
+///
+/// let cb_data = CallbackData::new();
+/// fence.register_callback(cb_data);
+/// // fence.begin_signalling();
+/// fence.signal()?;
+/// // Now check whether the callback was actually executed.
+/// // SAFETY: `fence.signal()` above works sequentially. We just check here whether the signalling
+/// // actually did set the boolean correctly.
+/// unsafe { assert_eq!(CHECKER.load(atomic::Ordering::Relaxed), true); }
+///
+/// Ok::<(), Error>(())
+/// ```
+#[pin_data]
+pub struct DmaFence<T> {
+ /// The actual dma_fence passed to C.
+ #[pin]
+ inner: Opaque<bindings::dma_fence>,
+ /// User data.
+ #[pin]
+ data: T,
+ /// Marks whether the fence is currently in the signalling critical section.
+ signalling: bool,
+ /// A boolean needed for the C backend's lockdep guard.
+ signalling_cookie: bool,
+ /// A reference to the associated [`DmaFenceCtx`] so that it cannot be dropped while there are
+ /// still fences around.
+ fctx: Arc<DmaFenceCtx>,
+}
+
+// SAFETY: `DmaFence` is safe to be sent to any task.
+unsafe impl<T> Send for DmaFence<T> {}
+
+// SAFETY: `DmaFence` is safe to be accessed concurrently.
+unsafe impl<T> Sync for DmaFence<T> {}
+
+// SAFETY: These implement the C backend's refcounting methods, which are proven to work correctly.
+unsafe impl<T> AlwaysRefCounted for DmaFence<T> {
+ fn inc_ref(&self) {
+ // SAFETY: `self.as_raw()` is a pointer to a valid `struct dma_fence`.
+ unsafe { bindings::dma_fence_get(self.as_raw()) }
+ }
+
+ /// # Safety
+ ///
+ /// `ptr` must be a valid pointer to a [`DmaFence`].
+ unsafe fn dec_ref(ptr: NonNull<Self>) {
+ // SAFETY: `ptr` is never a NULL pointer; and when `dec_ref()` is called
+ // the fence is by definition still valid.
+ let fence = unsafe { (*ptr.as_ptr()).inner.get() };
+
+ // SAFETY: Valid because `fence` was created validly above.
+ unsafe { bindings::dma_fence_put(fence) }
+ }
+}
+
+impl<T> DmaFence<T> {
+ // TODO: There could be a subtle potential problem here? The LLVM compiler backend can create
+ // several versions of this constant. Their content would be identical, but their addresses
+ // different.
+ const OPS: bindings::dma_fence_ops = Self::ops_create();
+
+ /// Create an initializer for a new [`DmaFence`].
+ fn new(
+ fctx: Arc<DmaFenceCtx>,
+ data: impl PinInit<T>, // TODO: The driver data should implement PinInit<T, Error>
+ lock: &Opaque<bindings::spinlock_t>,
+ context: u64,
+ seqno: u64,
+ ) -> Result<ARef<Self>> {
+ let fence = pin_init!(Self {
+ inner <- Opaque::ffi_init(|slot: *mut bindings::dma_fence| {
+ let lock_ptr = &raw const (*lock);
+ // SAFETY: `slot` is a valid pointer to an uninitialized `struct dma_fence`.
+ // `lock_ptr` is a pointer to the spinlock of the fence context, which is shared
+ // among all the fences. This can't become a UAF because each fence takes a
+ // reference of the fence context.
+ unsafe { bindings::dma_fence_init(slot, &Self::OPS, Opaque::cast_into(lock_ptr), context, seqno) };
+ }),
+ data <- data,
+ signalling: false,
+ signalling_cookie: false,
+ fctx: fctx,
+ });
+
+ let b = KBox::pin_init(fence, GFP_KERNEL)?;
+
+ // SAFETY: We don't move the contents of `b` anywhere here. After unwrapping it, ARef will
+ // take care of preventing memory moves.
+ let rawptr = KBox::into_raw(unsafe { Pin::into_inner_unchecked(b) });
+
+ // SAFETY: `rawptr` was created validly above.
+ let aref = unsafe { ARef::from_raw(NonNull::new_unchecked(rawptr)) };
+
+ Ok(aref)
+ }
+
+ /// Mark the beginning of a DmaFence signalling critical section. Should be called once a fence
+ /// gets published.
+ ///
+ /// The signalling critical section is marked as finished automatically once the fence signals.
+ pub fn begin_signalling(&mut self) {
+ // FIXME: this needs to be mutable, obviously, but we can't borrow mutably. *sigh*
+ self.signalling = true;
+ // TODO: Should we warn if a user tries to do this several times for the same
+ // fence? And should we ignore the request if the fence is already signalled?
+
+ // SAFETY: `dma_fence_begin_signalling()` works on global lockdep data and calling it is
+ // always safe.
+ self.signalling_cookie = unsafe { bindings::dma_fence_begin_signalling() };
+ }
+
+ const fn ops_create() -> bindings::dma_fence_ops {
+ // SAFETY: Zeroing out memory on the stack is always safe.
+ let mut ops: bindings::dma_fence_ops = unsafe { core::mem::zeroed() };
+
+ ops.get_driver_name = Some(Self::get_driver_name);
+ ops.get_timeline_name = Some(Self::get_timeline_name);
+ ops.release = Some(Self::release);
+
+ ops
+ }
+
+ // The C backend demands the following two callbacks. They are intended for
+ // cross-driver communication, i.e., for another driver to figure out to
+ // whom a fence belongs. As we don't support that currently in the Rust
+ // implementation, let's go for dummy data. By the way, it has already been
+ // proposed to remove those callbacks from C, since there are barely any
+ // users.
+ //
+ // And implementing them properly in Rust would require a mandatory interface
+ // and potentially open questions about UAF bugs when the module gets unloaded.
+ extern "C" fn get_driver_name(_ptr: *mut bindings::dma_fence) -> *const c_char {
+ c_str!("DRIVER_NAME_UNUSED").as_char_ptr()
+ }
+
+ extern "C" fn get_timeline_name(_ptr: *mut bindings::dma_fence) -> *const c_char {
+ c_str!("TIMELINE_NAME_UNUSED").as_char_ptr()
+ }
+
+ /// The release function called by the C backend once the refcount drops to 0. We use this to
+ /// drop the Rust dma-fence, too. Since [`DmaFence`] implements [`AlwaysRefCounted`], this is
+ /// perfectly safe and a convenient way to reconcile the two release mechanisms of C and Rust.
+ unsafe extern "C" fn release(ptr: *mut bindings::dma_fence) {
+ let ptr = Opaque::cast_from(ptr);
+
+ // SAFETY: The constructor guarantees that `ptr` is always the inner fence of a DmaFence.
+ let fence = unsafe { crate::container_of!(ptr, Self, inner) }.cast_mut();
+
+ // SAFETY: See above. Also, the release callback will only be called once, when the
+ // refcount drops to 0, and when that happens the fence is by definition still valid.
+ unsafe { drop_in_place(fence) };
+ }
+
+ /// Signal the fence. This will invoke all registered callbacks.
+ pub fn signal(&self) -> Result {
+ // SAFETY: `self` is refcounted.
+ let ret = unsafe { bindings::dma_fence_signal(self.as_raw()) };
+ if ret != 0 {
+ return Err(Error::from_errno(ret));
+ }
+
+ if self.signalling {
+ // SAFETY: `dma_fence_end_signalling()` works on global lockdep data. The only
+ // parameter is a boolean passed by value.
+ unsafe { bindings::dma_fence_end_signalling(self.signalling_cookie) };
+ }
+
+ Ok(())
+ }
+
+ /// Check whether the fence was signalled at the moment of the function call.
+ pub fn is_signaled(&self) -> bool {
+ // SAFETY: self is by definition still valid. The backend ensures proper
+ // locking. We don't implement the dma_fence fastpath backend_ops
+ // callbacks, so this merely checks a boolean and has no side effects.
+ unsafe { bindings::dma_fence_is_signaled(self.as_raw()) }
+ }
+
+ /// Register a callback on a [`DmaFence`]. The callback will be invoked in the fence's
+ /// signalling path, i.e., critical section.
+ ///
+ /// Consumes `data`. `data` is passed back to the implemented callback function when the fence
+ /// gets signalled.
+ pub fn register_callback<U: DmaFenceCbFunc + 'static>(&self, data: impl PinInit<U>) -> Result {
+ let cb = DmaFenceCb::new(data)?;
+ let ptr = cb.into_foreign() as *mut DmaFenceCb<U>;
+ // SAFETY: `ptr` was created validly directly above.
+ let inner_cb = unsafe { (*ptr).inner.get() };
+
+ // SAFETY: `self.as_raw()` is valid because `self` is refcounted, `inner_cb` was created
+ // validly above and was turned into a ForeignOwnable, so it won't be dropped. `callback`
+ // has a static lifetime.
+ let ret = unsafe {
+ bindings::dma_fence_add_callback(
+ self.as_raw(),
+ inner_cb,
+ Some(DmaFenceCb::<U>::callback),
+ )
+ };
+ if ret != 0 {
+ return Err(Error::from_errno(ret));
+ }
+ Ok(())
+ }
+
+ fn as_raw(&self) -> *mut bindings::dma_fence {
+ self.inner.get()
+ }
+}
--
2.49.0
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-03 8:14 ` [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions Philipp Stanner
@ 2026-02-05 8:57 ` Boris Brezillon
2026-02-06 10:23 ` Danilo Krummrich
2026-02-05 10:16 ` Boris Brezillon
2026-02-09 11:30 ` Alice Ryhl
2 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-05 8:57 UTC (permalink / raw)
To: Philipp Stanner
Cc: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
Hi Philipp,
On Tue, 3 Feb 2026 09:14:01 +0100
Philipp Stanner <phasta@kernel.org> wrote:
> +
> + /// Mark the beginning of a DmaFence signalling critical section. Should be called once a fence
> + /// gets published.
> + ///
> + /// The signalling critical section is marked as finished automatically once the fence signals.
> + pub fn begin_signalling(&mut self) {
> + // FIXME: this needs to be mutable, obviously, but we can't borrow mutably. *sigh*
> + self.signalling = true;
> + // TODO: Should we warn if a user tries to do this several times for the same
> + // fence? And should we ignore the request if the fence is already signalled?
> +
> + // SAFETY: `dma_fence_begin_signalling()` works on global lockdep data and calling it is
> + // always safe.
> + self.signalling_cookie = unsafe { bindings::dma_fence_begin_signalling() };
Maybe it's me misunderstanding what's happening here, but I don't think
you can tie the signalling section to the fence [published -> signalled]
timeframe. DmaFence signalling critical section is a section of code, in
a thread, that's needed for any previously published fence to be signalled,
and as such, has constraints on memory allocation types and locks it can
acquire (any lock taken while a blocking allocation happens can't be taken
in the DmaFence signalling path for instance).
But here, you're going to flag the thread doing the submission as
being in the signalling path, and this will likely be dropped in a
different thread (because the signalling will happen asynchronously,
when the job is done or cancelled). Think about this sequence for
instance:
    thread A                         thread B (workqueue thread)

    ioctl(SUBMIT_xxx) {
        create(job)
        arm(job->fence)     <- fence signalling section starts here
        queue(job)
    }

    ioctl(xxxx) {
        kmalloc(GFP_KERNEL) <- BOOM, can't do a blocking alloc in a
                               fence signalling path
    }

                                     job_done_work() {
                                         ...
                                         signal(job->fence) <- fence signalling
                                             section supposedly ends here, in a
                                             completely different thread than
                                             the one it was started in
                                     }
Unfortunately, I don't know how to translate that into Rust, but we
need a way to check if any code path does a DmaFence.signal(),
go back to the entry point (for a WorkItem, that would be
WorkItem::run() for instance), and make it a DmaFenceSignallingPath.
Not only that, but we need to know all the deps that make it so
this path can be called (if I take the WorkItem example, that would
be the path that leads to the WorkItem being scheduled).
Hopefully Alice and other rust experts can come up with some ideas
to handle that.
Regards,
Boris
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-05 8:57 ` Boris Brezillon
@ 2026-02-06 10:23 ` Danilo Krummrich
2026-02-09 8:19 ` Philipp Stanner
0 siblings, 1 reply; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-06 10:23 UTC (permalink / raw)
To: Boris Brezillon
Cc: Philipp Stanner, David Airlie, Simona Vetter, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Thu Feb 5, 2026 at 9:57 AM CET, Boris Brezillon wrote:
> On Tue, 3 Feb 2026 09:14:01 +0100
> Philipp Stanner <phasta@kernel.org> wrote:
> Unfortunately, I don't know how to translate that in rust, but we
> need a way to check if any path code path does a DmaFence.signal(),
> go back to the entry point (for a WorkItem, that would be
> WorkItem::run() for instance), and make it a DmaFenceSignallingPath.
> Not only that, but we need to know all the deps that make it so
> this path can be called (if I take the WorkItem example, that would
> be the path that leads to the WorkItem being scheduled).
I think we need a guard object for this that is not Send, just like for any
other lock.
Internally, those markers rely on lockdep, i.e. they just acquire and release a
"fake" lock.
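[Editor's note: such a non-`Send` guard could be sketched in plain Rust roughly as below. All names are invented; a thread-local counter stands in for lockdep's per-thread state, where the kernel version would call `dma_fence_begin_signalling()` / `dma_fence_end_signalling()` via bindings.]

```rust
use std::cell::Cell;
use std::marker::PhantomData;

thread_local! {
    // Stand-in for lockdep's per-thread signalling state.
    static SIGNALLING_DEPTH: Cell<usize> = Cell::new(0);
}

/// Hypothetical guard: holding one means "this thread is inside a fence
/// signalling critical section". `PhantomData<*mut ()>` makes the type
/// `!Send`, so the guard cannot cross thread boundaries -- matching the
/// fact that the begin/end cookie is only valid in a single thread.
struct FenceSignallingGuard {
    _not_send: PhantomData<*mut ()>,
}

impl FenceSignallingGuard {
    fn begin() -> Self {
        // The kernel version would call bindings::dma_fence_begin_signalling().
        SIGNALLING_DEPTH.with(|d| d.set(d.get() + 1));
        FenceSignallingGuard { _not_send: PhantomData }
    }
}

impl Drop for FenceSignallingGuard {
    fn drop(&mut self) {
        // The kernel version would call bindings::dma_fence_end_signalling(cookie).
        SIGNALLING_DEPTH.with(|d| d.set(d.get() - 1));
    }
}

fn in_signalling_section() -> bool {
    SIGNALLING_DEPTH.with(|d| d.get() > 0)
}

fn main() {
    assert!(!in_signalling_section());
    {
        let _guard = FenceSignallingGuard::begin();
        assert!(in_signalling_section());
    } // guard dropped here -> section ends
    assert!(!in_signalling_section());
}
```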
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-06 10:23 ` Danilo Krummrich
@ 2026-02-09 8:19 ` Philipp Stanner
2026-02-09 14:58 ` Boris Brezillon
0 siblings, 1 reply; 103+ messages in thread
From: Philipp Stanner @ 2026-02-09 8:19 UTC (permalink / raw)
To: Danilo Krummrich, Boris Brezillon
Cc: Philipp Stanner, David Airlie, Simona Vetter, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Fri, 2026-02-06 at 11:23 +0100, Danilo Krummrich wrote:
> On Thu Feb 5, 2026 at 9:57 AM CET, Boris Brezillon wrote:
> > On Tue, 3 Feb 2026 09:14:01 +0100
> > Philipp Stanner <phasta@kernel.org> wrote:
> > Unfortunately, I don't know how to translate that in rust, but we
> > need a way to check if any path code path does a DmaFence.signal(),
> > go back to the entry point (for a WorkItem, that would be
> > WorkItem::run() for instance), and make it a DmaFenceSignallingPath.
> > Not only that, but we need to know all the deps that make it so
> > this path can be called (if I take the WorkItem example, that would
> > be the path that leads to the WorkItem being scheduled).
>
> I think we need a guard object for this that is not Send, just like for any
> other lock.
>
> Internally, those markers rely on lockdep, i.e. they just acquire and release a
> "fake" lock.
The guard object would be created through fence.begin_signalling(), wouldn't it?
And when it drops you call dma_fence_end_signalling()?
How would that ensure that the driver actually marks the signalling region correctly?
P.
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-09 8:19 ` Philipp Stanner
@ 2026-02-09 14:58 ` Boris Brezillon
2026-02-10 8:16 ` Christian König
0 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-09 14:58 UTC (permalink / raw)
To: Philipp Stanner
Cc: phasta, Danilo Krummrich, David Airlie, Simona Vetter, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Mon, 09 Feb 2026 09:19:46 +0100
Philipp Stanner <phasta@mailbox.org> wrote:
> On Fri, 2026-02-06 at 11:23 +0100, Danilo Krummrich wrote:
> > On Thu Feb 5, 2026 at 9:57 AM CET, Boris Brezillon wrote:
> > > On Tue, 3 Feb 2026 09:14:01 +0100
> > > Philipp Stanner <phasta@kernel.org> wrote:
> > > Unfortunately, I don't know how to translate that in rust, but we
> > > need a way to check if any path code path does a DmaFence.signal(),
> > > go back to the entry point (for a WorkItem, that would be
> > > WorkItem::run() for instance), and make it a DmaFenceSignallingPath.
> > > Not only that, but we need to know all the deps that make it so
> > > this path can be called (if I take the WorkItem example, that would
> > > be the path that leads to the WorkItem being scheduled).
> >
> > I think we need a guard object for this that is not Send, just like for any
> > other lock.
> >
> > Internally, those markers rely on lockdep, i.e. they just acquire and release a
> > "fake" lock.
>
> The guard object would be created through fence.begin_signalling(), wouldn't it?
It shouldn't be a (&self)-method, because at the start of a DMA
signaling path, you don't necessarily know which fence you're going to
signal (you might actually signal several of them).
> And when it drops you call dma_fence_end_signalling()?
Yep, dma_fence_end_signalling() should be called when the guard is
dropped.
>
> How would that ensure that the driver actually marks the signalling region correctly?
Nothing, and that's a problem we have in C: you have no way of telling
which code section is going to be a DMA-signaling path. I can't think
of any way to make that safer in rust, unfortunately. The best I can
think of would be to
- Have a special DmaFenceSignalWorkItem (wrapping a WorkItem with extra
constraints) that's designed for DMA-fence signaling, and that takes
the DmaSignaling guard around the ::run() call.
- We would then need to ensure that any code path scheduling this work
item is also in a DMA-signaling path by taking a ref to the
DmaSignalingGuard. This of course doesn't guarantee that the section
is wide enough to prevent any non-authorized operations in any path
leading to this WorkItem scheduling, but it would at least force the
caller to consider the problem.
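[Editor's note: the first idea could be sketched in plain Rust roughly like this. All names are invented, and a thread-local flag stands in for the real lockdep annotation; the point is only that the signalling annotation brackets `run()`, so the implementor cannot forget it.]

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the per-thread "in a fence signalling path" state.
    static IN_SIGNALLING_PATH: Cell<bool> = Cell::new(false);
}

trait WorkItem {
    fn run(&self);
}

/// Hypothetical wrapper in the spirit of the proposed DmaFenceSignalWorkItem:
/// `execute()` takes the signalling annotation around the `run()` call, so
/// every body of such a work item runs inside the critical section.
struct FenceSignalWork<T: WorkItem>(T);

impl<T: WorkItem> FenceSignalWork<T> {
    fn execute(&self) {
        IN_SIGNALLING_PATH.with(|f| f.set(true)); // begin_signalling()
        self.0.run();
        IN_SIGNALLING_PATH.with(|f| f.set(false)); // end_signalling()
    }
}

struct SignalJobFence;

impl WorkItem for SignalJobFence {
    fn run(&self) {
        // The body can observe that it is inside the signalling section.
        assert!(IN_SIGNALLING_PATH.with(|f| f.get()));
    }
}

fn main() {
    let work = FenceSignalWork(SignalJobFence);
    work.execute();
    assert!(!IN_SIGNALLING_PATH.with(|f| f.get()));
}
```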
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-09 14:58 ` Boris Brezillon
@ 2026-02-10 8:16 ` Christian König
2026-02-10 8:38 ` Alice Ryhl
0 siblings, 1 reply; 103+ messages in thread
From: Christian König @ 2026-02-10 8:16 UTC (permalink / raw)
To: Boris Brezillon, Philipp Stanner
Cc: phasta, Danilo Krummrich, David Airlie, Simona Vetter, Alice Ryhl,
Gary Guo, Benno Lossin, Daniel Almeida, Joel Fernandes,
linux-kernel, dri-devel, rust-for-linux
On 2/9/26 15:58, Boris Brezillon wrote:
> On Mon, 09 Feb 2026 09:19:46 +0100
> Philipp Stanner <phasta@mailbox.org> wrote:
>
>> On Fri, 2026-02-06 at 11:23 +0100, Danilo Krummrich wrote:
>>> On Thu Feb 5, 2026 at 9:57 AM CET, Boris Brezillon wrote:
>>>> On Tue, 3 Feb 2026 09:14:01 +0100
>>>> Philipp Stanner <phasta@kernel.org> wrote:
>>>> Unfortunately, I don't know how to translate that in rust, but we
>>>> need a way to check if any path code path does a DmaFence.signal(),
>>>> go back to the entry point (for a WorkItem, that would be
>>>> WorkItem::run() for instance), and make it a DmaFenceSignallingPath.
>>>> Not only that, but we need to know all the deps that make it so
>>>> this path can be called (if I take the WorkItem example, that would
>>>> be the path that leads to the WorkItem being scheduled).
>>>
>>> I think we need a guard object for this that is not Send, just like for any
>>> other lock.
>>>
>>> Internally, those markers rely on lockdep, i.e. they just acquire and release a
>>> "fake" lock.
>>
>> The guard object would be created through fence.begin_signalling(), wouldn't it?
>
> It shouldn't be a (&self)-method, because at the start of a DMA
> signaling path, you don't necessarily know which fence you're going to
> signal (you might actually signal several of them).
>
>> And when it drops you call dma_fence_end_signalling()?
>
> Yep, dma_fence_end_signalling() should be called when the guard is
> dropped.
>
>>
>> How would that ensure that the driver actually marks the signalling region correctly?
>
> Nothing, and that's a problem we have in C: you have no way of telling
> which code section is going to be a DMA-signaling path. I can't think
> of any way to make that safer in rust, unfortunately. The best I can
> think of would be to
>
> - Have a special DmaFenceSignalWorkItem (wrapper a WorkItem with extra
> constraints) that's designed for DMA-fence signaling, and that takes
> the DmaSignaling guard around the ::run() call.
> - We would then need to ensure that any code path scheduling this work
> item is also in a DMA-signaling path by taking a ref to the
> DmaSignalingGuard. This of course doesn't guarantee that the section
> is wide enough to prevent any non-authorized operations in any path
> leading to this WorkItem scheduling, but it would at least force the
> caller to consider the problem.
On the C side I have a patch set which does something very similar.
It's basically a WARN_ON_ONCE() which triggers as soon as you try to signal a DMA fence from an IOCTL or, more specifically, from process context.
Signaling a DMA fence from interrupt context, a work item or a kernel thread is still allowed; there is just the hole that you can schedule a work item from process context as well.
The major problem with that patch set is that we have tons of very hacky signaling paths in drivers already, because we initially didn't know how much trouble getting this wrong causes.
I'm strongly in favor of getting this right for the Rust side from the beginning and enforcing strict rules for any code trying to implement a DMA fence.
Regards,
Christian.
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 8:16 ` Christian König
@ 2026-02-10 8:38 ` Alice Ryhl
2026-02-10 9:06 ` Philipp Stanner
` (2 more replies)
0 siblings, 3 replies; 103+ messages in thread
From: Alice Ryhl @ 2026-02-10 8:38 UTC (permalink / raw)
To: Christian König
Cc: Boris Brezillon, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 10, 2026 at 09:16:34AM +0100, Christian König wrote:
> On 2/9/26 15:58, Boris Brezillon wrote:
> > On Mon, 09 Feb 2026 09:19:46 +0100
> > Philipp Stanner <phasta@mailbox.org> wrote:
> >
> >> On Fri, 2026-02-06 at 11:23 +0100, Danilo Krummrich wrote:
> >>> On Thu Feb 5, 2026 at 9:57 AM CET, Boris Brezillon wrote:
> >>>> On Tue, 3 Feb 2026 09:14:01 +0100
> >>>> Philipp Stanner <phasta@kernel.org> wrote:
> >>>> Unfortunately, I don't know how to translate that in rust, but we
> >>>> need a way to check if any path code path does a DmaFence.signal(),
> >>>> go back to the entry point (for a WorkItem, that would be
> >>>> WorkItem::run() for instance), and make it a DmaFenceSignallingPath.
> >>>> Not only that, but we need to know all the deps that make it so
> >>>> this path can be called (if I take the WorkItem example, that would
> >>>> be the path that leads to the WorkItem being scheduled).
> >>>
> >>> I think we need a guard object for this that is not Send, just like for any
> >>> other lock.
> >>>
> >>> Internally, those markers rely on lockdep, i.e. they just acquire and release a
> >>> "fake" lock.
> >>
> >> The guard object would be created through fence.begin_signalling(), wouldn't it?
> >
> > It shouldn't be a (&self)-method, because at the start of a DMA
> > signaling path, you don't necessarily know which fence you're going to
> > signal (you might actually signal several of them).
> >
> >> And when it drops you call dma_fence_end_signalling()?
> >
> > Yep, dma_fence_end_signalling() should be called when the guard is
> > dropped.
> >
> >>
> >> How would that ensure that the driver actually marks the signalling region correctly?
> >
> > Nothing, and that's a problem we have in C: you have no way of telling
> > which code section is going to be a DMA-signaling path. I can't think
> > of any way to make that safer in rust, unfortunately. The best I can
> > think of would be to
> >
> > - Have a special DmaFenceSignalWorkItem (wrapper a WorkItem with extra
> > constraints) that's designed for DMA-fence signaling, and that takes
> > the DmaSignaling guard around the ::run() call.
> > - We would then need to ensure that any code path scheduling this work
> > item is also in a DMA-signaling path by taking a ref to the
> > DmaSignalingGuard. This of course doesn't guarantee that the section
> > is wide enough to prevent any non-authorized operations in any path
> > leading to this WorkItem scheduling, but it would at least force the
> > caller to consider the problem.
>
> On the C side I have a patch set which does something very similar.
>
> It's basically a WARN_ON_ONCE() which triggers as soon as you try to
> signal a DMA fence from an IOCTL, or more specific process context.
>
> Signaling a DMA fence from interrupt context, a work item or kernel
> thread is still allowed, there is just the hole that you can schedule
> a work item from process context as well.
>
> The major problem with that patch set is that we have tons of very
> hacky signaling paths in drivers already because we initially didn't
> knew how much trouble getting this wrong causes.
>
> I'm strongly in favor of getting this right for the rust side from the
> beginning and enforcing strict rules for every code trying to
> implement a DMA fence.
Hmm. Could you say a bit more about what the rules are? I just re-read
the comments in dma-fence.c, but I have some questions.
First, how does the signalling annotation work when the signalling path
crosses thread boundaries? For example, let's say I call an ioctl to
perform an async VM_BIND, then the dma fence signalling critical path
starts in the ioctl, but then it moves into a workqueue and finishes
there, right?
Second, it looks like we have the same challenge as with irq locks where
you must properly nest dma_fence_begin_signalling() regions, and can't
e.g. do this:
c1 = dma_fence_begin_signalling()
c2 = dma_fence_begin_signalling()
dma_fence_end_signalling(c1)
dma_fence_end_signalling(c2)
Alice
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 8:38 ` Alice Ryhl
@ 2026-02-10 9:06 ` Philipp Stanner
2026-02-10 9:54 ` Christian König
2026-02-10 9:15 ` Boris Brezillon
2026-02-10 9:26 ` Christian König
2 siblings, 1 reply; 103+ messages in thread
From: Philipp Stanner @ 2026-02-10 9:06 UTC (permalink / raw)
To: Alice Ryhl, Christian König
Cc: Boris Brezillon, phasta, Danilo Krummrich, David Airlie,
Simona Vetter, Gary Guo, Benno Lossin, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Tue, 2026-02-10 at 08:38 +0000, Alice Ryhl wrote:
> On Tue, Feb 10, 2026 at 09:16:34AM +0100, Christian König wrote:
> >
> >
> > On the C side I have a patch set which does something very similar.
> >
> > It's basically a WARN_ON_ONCE() which triggers as soon as you try to
> > signal a DMA fence from an IOCTL, or more specific process context.
> >
> > Signaling a DMA fence from interrupt context, a work item or kernel
> > thread is still allowed, there is just the hole that you can schedule
> > a work item from process context as well.
> >
> > The major problem with that patch set is that we have tons of very
> > hacky signaling paths in drivers already because we initially didn't
> > knew how much trouble getting this wrong causes.
> >
> > I'm strongly in favor of getting this right for the rust side from the
> > beginning and enforcing strict rules for every code trying to
> > implement a DMA fence.
>
> Hmm. Could you say a bit more about what the rules are? I just re-read
> the comments in dma-fence.c, but I have some questions.
The rules need to be written down. Elaborately and in detail, once and
for all.
We've been having those discussions about the mysterious "dma fence rules"
for years, but no one has ever seen them, despite knowing that they
exist.
They seem to live only in the heads of a small number of GPU developers
who figured out through long years of experience what works and what
doesn't.
There are reasons why the state writes its laws down on paper somewhere: so
that you can read them and have a source of authority.
That would end much of our endless misunderstandings and repetitive
discussions.
P.
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 9:06 ` Philipp Stanner
@ 2026-02-10 9:54 ` Christian König
0 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2026-02-10 9:54 UTC (permalink / raw)
To: phasta, Alice Ryhl
Cc: Boris Brezillon, Danilo Krummrich, David Airlie, Simona Vetter,
Gary Guo, Benno Lossin, Daniel Almeida, Joel Fernandes,
linux-kernel, dri-devel, rust-for-linux
On 2/10/26 10:06, Philipp Stanner wrote:
> On Tue, 2026-02-10 at 08:38 +0000, Alice Ryhl wrote:
>> On Tue, Feb 10, 2026 at 09:16:34AM +0100, Christian König wrote:
>>>
>>>
>>> On the C side I have a patch set which does something very similar.
>>>
>>> It's basically a WARN_ON_ONCE() which triggers as soon as you try to
>>> signal a DMA fence from an IOCTL, or more specific process context.
>>>
>>> Signaling a DMA fence from interrupt context, a work item or kernel
>>> thread is still allowed, there is just the hole that you can schedule
>>> a work item from process context as well.
>>>
>>> The major problem with that patch set is that we have tons of very
>>> hacky signaling paths in drivers already because we initially didn't
>>> knew how much trouble getting this wrong causes.
>>>
>>> I'm strongly in favor of getting this right for the rust side from the
>>> beginning and enforcing strict rules for every code trying to
>>> implement a DMA fence.
>>
>> Hmm. Could you say a bit more about what the rules are? I just re-read
>> the comments in dma-fence.c, but I have some questions.
>
> The rules need to be written down. Elaborately and in detail, once and
> for all.
>
> We're having those discussions about the mysterious "dma fence rules"
> for years, but no one has ever seen them, despite knowing that they
> exist.
Well we have this here: https://kernel.org/doc/html/v5.9/driver-api/dma-buf.html#indefinite-dma-fences
> They seem to live only in the heads of a small number of GPU developers
> who figured out through long years of experience what works and what
> doesn't.
It's not even experience you need for that. I've pointed out the problems we would have with this even before the original dma_fence patches were merged.
You just have a very, very small number of people who see the design and immediately realize what it means, while everybody else, even with documentation, doesn't seem to grasp the full consequences.
> There are reasons why the state writes the laws on paper somewhere. So
> that you can read them and have a source of authority..
Yeah, but you also have an executive who enforces those laws. At some point you just give up on explaining the background.
I think it would help massively if lockdep could point out even more problematic approaches.
> That would end much of our endless misunderstandings and repetitive
> discussions.
You won't believe how grateful I would be if we could somehow improve the situation.
I've literally had to explain to whole teams who spent man-years developing something that their approach is flawed and that they can throw away all their work because of these dma_fence limitations.
This is a repeating pattern which happens at least once or twice a year.
Regards,
Christian.
>
>
> P.
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 8:38 ` Alice Ryhl
2026-02-10 9:06 ` Philipp Stanner
@ 2026-02-10 9:15 ` Boris Brezillon
2026-02-10 10:15 ` Alice Ryhl
2026-02-10 9:26 ` Christian König
2 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 9:15 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, 10 Feb 2026 08:38:00 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Feb 10, 2026 at 09:16:34AM +0100, Christian König wrote:
> > On 2/9/26 15:58, Boris Brezillon wrote:
> > > On Mon, 09 Feb 2026 09:19:46 +0100
> > > Philipp Stanner <phasta@mailbox.org> wrote:
> > >
> > >> On Fri, 2026-02-06 at 11:23 +0100, Danilo Krummrich wrote:
> > >>> On Thu Feb 5, 2026 at 9:57 AM CET, Boris Brezillon wrote:
> > >>>> On Tue, 3 Feb 2026 09:14:01 +0100
> > >>>> Philipp Stanner <phasta@kernel.org> wrote:
> > >>>> Unfortunately, I don't know how to translate that in rust, but we
> > >>>> need a way to check if any path code path does a DmaFence.signal(),
> > >>>> go back to the entry point (for a WorkItem, that would be
> > >>>> WorkItem::run() for instance), and make it a DmaFenceSignallingPath.
> > >>>> Not only that, but we need to know all the deps that make it so
> > >>>> this path can be called (if I take the WorkItem example, that would
> > >>>> be the path that leads to the WorkItem being scheduled).
> > >>>
> > >>> I think we need a guard object for this that is not Send, just like for any
> > >>> other lock.
> > >>>
> > >>> Internally, those markers rely on lockdep, i.e. they just acquire and release a
> > >>> "fake" lock.
> > >>
> > >> The guard object would be created through fence.begin_signalling(), wouldn't it?
> > >
> > > It shouldn't be a (&self)-method, because at the start of a DMA
> > > signaling path, you don't necessarily know which fence you're going to
> > > signal (you might actually signal several of them).
> > >
> > >> And when it drops you call dma_fence_end_signalling()?
> > >
> > > Yep, dma_fence_end_signalling() should be called when the guard is
> > > dropped.
> > >
> > >>
> > >> How would that ensure that the driver actually marks the signalling region correctly?
> > >
> > > Nothing, and that's a problem we have in C: you have no way of telling
> > > which code section is going to be a DMA-signaling path. I can't think
> > > of any way to make that safer in rust, unfortunately. The best I can
> > > think of would be to
> > >
> > > - Have a special DmaFenceSignalWorkItem (wrapper a WorkItem with extra
> > > constraints) that's designed for DMA-fence signaling, and that takes
> > > the DmaSignaling guard around the ::run() call.
> > > - We would then need to ensure that any code path scheduling this work
> > > item is also in a DMA-signaling path by taking a ref to the
> > > DmaSignalingGuard. This of course doesn't guarantee that the section
> > > is wide enough to prevent any non-authorized operations in any path
> > > leading to this WorkItem scheduling, but it would at least force the
> > > caller to consider the problem.
> >
> > On the C side I have a patch set which does something very similar.
> >
> > It's basically a WARN_ON_ONCE() which triggers as soon as you try to
> > signal a DMA fence from an IOCTL, or more specific process context.
> >
> > Signaling a DMA fence from interrupt context, a work item or kernel
> > thread is still allowed, there is just the hole that you can schedule
> > a work item from process context as well.
> >
> > The major problem with that patch set is that we have tons of very
> > hacky signaling paths in drivers already because we initially didn't
> > knew how much trouble getting this wrong causes.
> >
> > I'm strongly in favor of getting this right for the rust side from the
> > beginning and enforcing strict rules for every code trying to
> > implement a DMA fence.
>
> Hmm. Could you say a bit more about what the rules are? I just re-read
> the comments in dma-fence.c, but I have some questions.
>
> First, how does the signalling annotation work when the signalling path
> crosses thread boundaries?
It's not supposed to cross the thread boundary at all. The annotation
is per-thread, and in that sense, it matches the lock guard model
perfectly.
> For example, let's say I call an ioctl to
> perform an async VM_BIND, then the dma fence signalling critical path
> starts in the ioctl, but then it moves into a workqueue and finishes
> there, right?
It's a bit trickier. The fence signalling path usually doesn't exist in
the submitting ioctl until the submission becomes effective and the
emitted fences are exposed to the outside world. That is, when:
- syncobjs are updated to point to this new fence
- fencefd pointing to this new fence is returned
- fence is added to the dma_resvs inside the gem/dma_buf objects
- ... (there might be other cases I forgot about)
In the submission path, what's important is that no blocking allocation
is done between the moment the fence is exposed, and the moment it's
queued. In practice what happens is that the job this fence is bound to
is queued even before the fences are exposed, so if anything, what we
should ensure is the ordering, and having a guarantee that a job being
queued means it's going to be dequeued and executed soon enough.
The second DMA signaling path exists in the context of the
workqueue/item dequeuing a job from the JobQueue (or drm_sched) and
pushing it to the HW. Then there's the IRQ handler being called to
inform us that the GPU is done executing this job, which might in some cases
lead to another work item being queued for further processing from
which the dma_fence is signaled. In other cases, the dma_fence is
signaled directly from the IRQ handler. All of these contexts are
considered being part of the DMA-signaling path. But it's not like the
fence signaling annotation is passed around, because the cookies
returned by dma_fence_begin_signalling() are only valid in a single
thread context, IIRC.
>
> Second, it looks like we have the same challenge as with irq locks where
> you must properly nest dma_fence_begin_signalling() regions, and can't
> e.g. do this:
>
> c1 = dma_fence_begin_signalling()
> c2 = dma_fence_begin_signalling()
> dma_fence_end_signalling(c1)
> dma_fence_end_signalling(c2)
I think that's the case, yes: you have to end them in the reverse order they were begun.
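[Editor's note: a toy model of that LIFO cookie discipline, in plain Rust. `CookieStack` is invented for illustration and does not reflect actual lockdep internals; it only shows why the interleaved end() pattern quoted above would be rejected.]

```rust
/// Toy model of cookie checking: begin() pushes a cookie, and end()
/// must pop the most recent one -- strict LIFO nesting, like irq locks.
struct CookieStack {
    stack: Vec<u64>,
    next: u64,
}

impl CookieStack {
    fn new() -> Self {
        CookieStack { stack: Vec::new(), next: 0 }
    }

    /// Models dma_fence_begin_signalling(): returns a fresh cookie.
    fn begin(&mut self) -> u64 {
        let c = self.next;
        self.next += 1;
        self.stack.push(c);
        c
    }

    /// Models dma_fence_end_signalling(): returns false on mis-nesting,
    /// the case lockdep would warn about.
    fn end(&mut self, cookie: u64) -> bool {
        if self.stack.last() == Some(&cookie) {
            self.stack.pop();
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut s = CookieStack::new();
    let c1 = s.begin();
    let c2 = s.begin();
    assert!(!s.end(c1)); // interleaved end: rejected
    assert!(s.end(c2)); // properly nested: inner first...
    assert!(s.end(c1)); // ...then outer
}
```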
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 9:15 ` Boris Brezillon
@ 2026-02-10 10:15 ` Alice Ryhl
2026-02-10 10:36 ` Danilo Krummrich
` (4 more replies)
0 siblings, 5 replies; 103+ messages in thread
From: Alice Ryhl @ 2026-02-10 10:15 UTC (permalink / raw)
To: Boris Brezillon
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 10, 2026 at 10:15:25AM +0100, Boris Brezillon wrote:
> On Tue, 10 Feb 2026 08:38:00 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > On Tue, Feb 10, 2026 at 09:16:34AM +0100, Christian König wrote:
> > > On 2/9/26 15:58, Boris Brezillon wrote:
> > > > On Mon, 09 Feb 2026 09:19:46 +0100
> > > > Philipp Stanner <phasta@mailbox.org> wrote:
> > > >
> > > >> On Fri, 2026-02-06 at 11:23 +0100, Danilo Krummrich wrote:
> > > >>> On Thu Feb 5, 2026 at 9:57 AM CET, Boris Brezillon wrote:
> > > >>>> On Tue, 3 Feb 2026 09:14:01 +0100
> > > >>>> Philipp Stanner <phasta@kernel.org> wrote:
> > > >>>> Unfortunately, I don't know how to translate that in rust, but we
> > > >>>> need a way to check if any path code path does a DmaFence.signal(),
> > > >>>> go back to the entry point (for a WorkItem, that would be
> > > >>>> WorkItem::run() for instance), and make it a DmaFenceSignallingPath.
> > > >>>> Not only that, but we need to know all the deps that make it so
> > > >>>> this path can be called (if I take the WorkItem example, that would
> > > >>>> be the path that leads to the WorkItem being scheduled).
> > > >>>
> > > >>> I think we need a guard object for this that is not Send, just like for any
> > > >>> other lock.
> > > >>>
> > > >>> Internally, those markers rely on lockdep, i.e. they just acquire and release a
> > > >>> "fake" lock.
> > > >>
> > > >> The guard object would be created through fence.begin_signalling(), wouldn't it?
> > > >
> > > > It shouldn't be a (&self)-method, because at the start of a DMA
> > > > signaling path, you don't necessarily know which fence you're going to
> > > > signal (you might actually signal several of them).
> > > >
> > > >> And when it drops you call dma_fence_end_signalling()?
> > > >
> > > > Yep, dma_fence_end_signalling() should be called when the guard is
> > > > dropped.
> > > >
> > > >>
> > > >> How would that ensure that the driver actually marks the signalling region correctly?
> > > >
> > > > Nothing, and that's a problem we have in C: you have no way of telling
> > > > which code section is going to be a DMA-signaling path. I can't think
> > > > of any way to make that safer in rust, unfortunately. The best I can
> > > > think of would be to
> > > >
> > > > - Have a special DmaFenceSignalWorkItem (wrapper a WorkItem with extra
> > > > constraints) that's designed for DMA-fence signaling, and that takes
> > > > the DmaSignaling guard around the ::run() call.
> > > > - We would then need to ensure that any code path scheduling this work
> > > > item is also in a DMA-signaling path by taking a ref to the
> > > > DmaSignalingGuard. This of course doesn't guarantee that the section
> > > > is wide enough to prevent any non-authorized operations in any path
> > > > leading to this WorkItem scheduling, but it would at least force the
> > > > caller to consider the problem.
> > >
> > > On the C side I have a patch set which does something very similar.
> > >
> > > It's basically a WARN_ON_ONCE() which triggers as soon as you try to
> > > signal a DMA fence from an IOCTL, or more specific process context.
> > >
> > > Signaling a DMA fence from interrupt context, a work item or kernel
> > > thread is still allowed, there is just the hole that you can schedule
> > > a work item from process context as well.
> > >
> > > The major problem with that patch set is that we have tons of very
> > > hacky signaling paths in drivers already because we initially didn't
> > > know how much trouble getting this wrong causes.
> > >
> > > I'm strongly in favor of getting this right for the rust side from the
> > > beginning and enforcing strict rules for every code trying to
> > > implement a DMA fence.
> >
> > Hmm. Could you say a bit more about what the rules are? I just re-read
> > the comments in dma-fence.c, but I have some questions.
> >
> > First, how does the signalling annotation work when the signalling path
> > crosses thread boundaries?
>
> It's not supposed to cross the thread boundary at all. The annotation
> is per-thread, and in that sense, it matches the lock guard model
> perfectly.
>
> > For example, let's say I call an ioctl to
> > perform an async VM_BIND, then the dma fence signalling critical path
> > starts in the ioctl, but then it moves into a workqueue and finishes
> > there, right?
>
> It's a bit trickier. The fence signalling path usually doesn't exist in
> the submitting ioctl until the submission becomes effective and the
> emitted fences are exposed to the outside world. That is, when:
> - syncobjs are updated to point to this new fence
> - fencefd pointing to this new fence is returned
> - fence is added to the dma_resvs inside the gem/dma_buf objects
> - ... (there might be other cases I forgot about)
>
> In the submission path, what's important is that no blocking allocation
> is done between the moment the fence is exposed, and the moment it's
> queued. In practice what happens is that the job this fence is bound to
> is queued even before the fences are exposed, so if anything, what we
> should ensure is the ordering, and having a guarantee that a job being
> queued means it's going to be dequeued and executed soon enough.
>
> The second DMA signaling path exists in the context of the
> workqueue/item dequeuing a job from the JobQueue (or drm_sched) and
> pushing it to the HW. Then there's the IRQ handler being called to
> inform the GPU is done executing this job, which might in some cases
> lead to another work item being queued for further processing from
> which the dma_fence is signaled. In other cases, the dma_fence is
> signaled directly from the IRQ handler. All of these contexts are
> considered being part of the DMA-signaling path. But it's not like the
> fence signaling annotation is passed around, because the cookies
> returned by dma_fence_begin_signalling() are only valid in a single
> thread context, IIRC.
Ok I understand what's going on now.
You talk about it as if there are two signalling paths, one in the ioctl
and another in the workqueue. But that sounds like a workaround to make
it work with how dma_fence_begin_signalling() is implemented, and not
the true situation. (Added: looks like Christian confirms this.)
One way you can see this is by looking at what we require of the
workqueue. For all this to work, it's pretty important that we never
schedule anything on the workqueue that's not signalling safe, since
otherwise you could have a deadlock where the workqueue executes some
random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
meaning that the VM_BIND job never gets scheduled since the workqueue
is never freed up. Deadlock.
And the correct way to model the above in lockdep is to have the DMA
fence lockdep key be nested *outside* the lockdep key of the workqueue.
Because there is a step in the signalling path where you wait for other
jobs to complete before the signalling path is able to continue.
And the meaning of such a lockdep dependency is exactly that the
critical region moves from one thread to another.
Perhaps we could have an API like this:
// Split the DriverDmaFence into two separate fence concepts:
struct PrivateFence { ... }
struct PublishedFence { ... }
/// The owner of this value must ensure that this fence is signalled.
struct MustBeSignalled<'fence> { ... }
/// Proof value indicating that the fence has either already been
/// signalled, or it will be. The lifetime ensures that you cannot mix
/// up the proof value.
struct WillBeSignalled<'fence> { ... }
/// Create a PublishedFence by entering a region that promises to signal
/// it.
///
/// The only way to return from the `region` closure is to construct a
/// WillBeSignalled value, and the only way to do that (see below) is to
/// signal it, or spawn a workqueue job that promises to signal it.
/// (Since the only way for that workqueue job to exit is to construct a
/// second WillBeSignalled value.)
fn dma_fence_begin_signalling(
fence: PrivateFence,
region: impl for<'f> FnOnce(MustBeSignalled<'f>) -> WillBeSignalled<'f>,
) -> PublishedFence {
let cookie = bindings::dma_fence_begin_signalling();
region(<create token here>);
bindings::dma_fence_end_signalling(cookie);
fence.i_promise_it_will_be_signalled()
}
impl MustBeSignalled<'_> {
/// Drivers generally should not use this one.
fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
/// One way to ensure the fence has been signalled is to signal it.
fn signal_fence(self) -> WillBeSignalled {
self.fence.signal();
self.i_promise_it_will_be_signalled()
}
/// Another way to ensure the fence will be signalled is to spawn a
/// workqueue item that promises to signal it.
fn transfer_to_wq(
self,
wq: &Workqueue,
item: impl DmaFenceWorkItem,
) -> WillBeSignalled {
// briefly obtain the lock class of the wq to indicate to
// lockdep that the signalling path "blocks" on arbitrary jobs
// from this wq completing
bindings::lock_acquire(&wq.key);
bindings::lock_release(&wq.key);
// enqueue the job
wq.enqueue(item);
// The signature of DmaFenceWorkItem::run() promises to arrange
// for it to be signalled.
self.i_promise_it_will_be_signalled()
}
}
trait DmaFenceWorkItem {
fn run<'f>(self, fence: MustBeSignalled<'f>) -> WillBeSignalled<'f>;
}
With this API, the ioctl can do this:
let published_fence = dma_fence_begin_signalling(|fence| {
fence.transfer_to_wq(my_wq, my_work_item)
});
somehow_publish_the_fence_to_userspace(published_fence);
And we're ensured that the fence is really signalled because the
signature of the work item closure is such that it can only return from
DmaFenceWorkItem::run() by signalling the fence (or transferring it to a
second workqueue, which in turn promises to signal it).
Of course, the signature only enforces that the fence is (or will be)
signalled if you return. One can still put an infinite loop inside
dma_fence_begin_signalling() and avoid signalling it, but there's
nothing we can do about that.
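For what it's worth, the proof-token shape sketched above can be tried
out in plain userspace Rust. Everything below (Fence, MustBeSignalled,
WillBeSignalled, begin_signalling) is an illustrative toy, not the
proposed kernel API; the kernel sketch uses a higher-ranked `for<'f>`
bound for stronger branding, but a plain lifetime parameter is enough
to show the shape:

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicBool, Ordering};

// Toy stand-in for a dma_fence.
struct Fence {
    signalled: AtomicBool,
}

/// Token handed to the signalling region; it must be consumed to
/// produce a WillBeSignalled proof.
struct MustBeSignalled<'f> {
    fence: &'f Fence,
}

/// Proof that the fence was signalled within this region.
struct WillBeSignalled<'f> {
    _brand: PhantomData<&'f ()>,
}

impl<'f> MustBeSignalled<'f> {
    /// The only way (in this sketch) to obtain the proof.
    fn signal_fence(self) -> WillBeSignalled<'f> {
        self.fence.signalled.store(true, Ordering::Release);
        WillBeSignalled { _brand: PhantomData }
    }
}

/// The region closure cannot return without producing the proof, i.e.
/// without having consumed its token by signalling the fence.
fn begin_signalling<'f>(
    fence: &'f Fence,
    region: impl FnOnce(MustBeSignalled<'f>) -> WillBeSignalled<'f>,
) {
    let _proof = region(MustBeSignalled { fence });
}

fn main() {
    let fence = Fence { signalled: AtomicBool::new(false) };
    begin_signalling(&fence, |token| token.signal_fence());
    assert!(fence.signalled.load(Ordering::Acquire));
    println!("fence signalled");
}
```

As in the mail above, this only enforces that the fence is signalled if
the closure returns; an infinite loop inside the region still defeats it.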
> > Second, it looks like we have the same challenge as with irq locks where
> > you must properly nest dma_fence_begin_signalling() regions, and can't
> > e.g. do this:
> >
> > c1 = dma_fence_begin_signalling()
> > c2 = dma_fence_begin_signalling()
> > dma_fence_end_signalling(c1)
> > dma_fence_end_signalling(c2)
>
> I think that's the case yes, you have to end them in reverse order.
Ok.
Alice
^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 10:15 ` Alice Ryhl
@ 2026-02-10 10:36 ` Danilo Krummrich
2026-02-10 10:46 ` Christian König
2026-02-10 10:46 ` Boris Brezillon
` (3 subsequent siblings)
4 siblings, 1 reply; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-10 10:36 UTC (permalink / raw)
To: Alice Ryhl
Cc: Boris Brezillon, Christian König, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
> One way you can see this is by looking at what we require of the
> workqueue. For all this to work, it's pretty important that we never
> schedule anything on the workqueue that's not signalling safe, since
> otherwise you could have a deadlock where the workqueue is executes some
> random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
> meaning that the VM_BIND job never gets scheduled since the workqueue
> is never freed up. Deadlock.
Yes, I also pointed this out multiple times in the past in the context of C GPU
scheduler discussions. It really depends on the workqueue and how it is used.
In the C GPU scheduler the driver can pass its own workqueue to the scheduler,
which means that the driver has to ensure that at least one out of the
wq->max_active works is free for the scheduler to make progress on the
scheduler's run and free job work.
Or in other words, there must be no more than wq->max_active - 1 works that
execute code violating the DMA fence signalling rules.
This is also why the JobQ needs its own workqueue and relying on the system WQ
is unsound.
In case of an ordered workqueue, it is always a potential deadlock to schedule
work that does non-atomic allocations or takes a lock that is used elsewhere for
non-atomic allocations of course.
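The max_active accounting described above can be modelled in userspace:
with a pool of N workers, at most N - 1 queued works may block on a
fence, or the work that would signal it can never run. The pool, the
toy fence, and all names below are an illustrative sketch of the rule,
not kernel code:

```rust
use std::sync::mpsc;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

type Work = Box<dyn FnOnce() + Send>;

// Toy workqueue: max_active worker threads pulling from one channel.
fn spawn_pool(max_active: usize) -> mpsc::Sender<Work> {
    let (tx, rx) = mpsc::channel::<Work>();
    let rx = Arc::new(Mutex::new(rx));
    for _ in 0..max_active {
        let rx = Arc::clone(&rx);
        thread::spawn(move || loop {
            // The guard is dropped at the end of this statement, so
            // other workers can receive while this one runs the work.
            let work = match rx.lock().unwrap().recv() {
                Ok(w) => w,
                Err(_) => return,
            };
            work();
        });
    }
    tx
}

fn main() {
    // Toy "fence": a bool under a mutex plus a condvar.
    let fence = Arc::new((Mutex::new(false), Condvar::new()));

    // max_active = 2: one slot may block on the fence, one slot stays
    // free for the signalling work. With max_active = 1 the blocking
    // work below would occupy the only worker forever: deadlock.
    let wq = spawn_pool(2);

    let f1 = Arc::clone(&fence);
    wq.send(Box::new(move || {
        // A work that (violating the signalling rules) waits on the fence.
        let (lock, cv) = &*f1;
        let mut signalled = lock.lock().unwrap();
        while !*signalled {
            signalled = cv.wait(signalled).unwrap();
        }
    }))
    .unwrap();

    let f2 = Arc::clone(&fence);
    let (done_tx, done_rx) = mpsc::channel();
    wq.send(Box::new(move || {
        // The signalling work: it only runs because a slot was left free.
        let (lock, cv) = &*f2;
        *lock.lock().unwrap() = true;
        cv.notify_all();
        done_tx.send(()).unwrap();
    }))
    .unwrap();

    done_rx.recv().unwrap();
    println!("fence signalled; no deadlock");
}
```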
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 10:36 ` Danilo Krummrich
@ 2026-02-10 10:46 ` Christian König
2026-02-10 11:40 ` Alice Ryhl
0 siblings, 1 reply; 103+ messages in thread
From: Christian König @ 2026-02-10 10:46 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl
Cc: Boris Brezillon, Philipp Stanner, phasta, David Airlie,
Simona Vetter, Gary Guo, Benno Lossin, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On 2/10/26 11:36, Danilo Krummrich wrote:
> On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
>> One way you can see this is by looking at what we require of the
>> workqueue. For all this to work, it's pretty important that we never
>> schedule anything on the workqueue that's not signalling safe, since
>> otherwise you could have a deadlock where the workqueue is executes some
>> random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
>> meaning that the VM_BIND job never gets scheduled since the workqueue
>> is never freed up. Deadlock.
>
> Yes, I also pointed this out multiple times in the past in the context of C GPU
> scheduler discussions. It really depends on the workqueue and how it is used.
>
> In the C GPU scheduler the driver can pass its own workqueue to the scheduler,
> which means that the driver has to ensure that at least one out of the
> wq->max_active works is free for the scheduler to make progress on the
> scheduler's run and free job work.
>
> Or in other words, there must be no more than wq->max_active - 1 works that
> execute code violating the DMA fence signalling rules.
*And* the workqueue must be created with WQ_MEM_RECLAIM so that work items can also start under memory pressure and not potentially cycle back into the memory management to wait for a dma_fence to signal.
But apart from that your explanation is perfectly correct, yes.
Thanks,
Christian.
> This is also why the JobQ needs its own workqueue and relying on the system WQ
> is unsound.
>
> In case of an ordered workqueue, it is always a potential deadlock to schedule
> work that does non-atomic allocations or takes a lock that is used elsewhere for
> non-atomic allocations of course.
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 10:46 ` Christian König
@ 2026-02-10 11:40 ` Alice Ryhl
2026-02-10 12:28 ` Boris Brezillon
2026-02-11 9:57 ` Danilo Krummrich
0 siblings, 2 replies; 103+ messages in thread
From: Alice Ryhl @ 2026-02-10 11:40 UTC (permalink / raw)
To: Christian König
Cc: Danilo Krummrich, Boris Brezillon, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 10, 2026 at 11:46:44AM +0100, Christian König wrote:
> On 2/10/26 11:36, Danilo Krummrich wrote:
> > On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
> >> One way you can see this is by looking at what we require of the
> >> workqueue. For all this to work, it's pretty important that we never
> >> schedule anything on the workqueue that's not signalling safe, since
> >> otherwise you could have a deadlock where the workqueue is executes some
> >> random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
> >> meaning that the VM_BIND job never gets scheduled since the workqueue
> >> is never freed up. Deadlock.
> >
> > Yes, I also pointed this out multiple times in the past in the context of C GPU
> > scheduler discussions. It really depends on the workqueue and how it is used.
> >
> > In the C GPU scheduler the driver can pass its own workqueue to the scheduler,
> > which means that the driver has to ensure that at least one out of the
> > wq->max_active works is free for the scheduler to make progress on the
> > scheduler's run and free job work.
> >
> > Or in other words, there must be no more than wq->max_active - 1 works that
> > execute code violating the DMA fence signalling rules.
Ouch, is that really the best way to do that? Why not two workqueues?
> *And* the workqueue must be created with WQ_MEM_RECLAIM so that work
> items can also start under memory pressure and not potentially cycle
> back into the memory management to wait for a dma_fence to signal.
>
> But apart from that your explanation is perfectly correct, yes.
Ah, interesting point.
Alice
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 11:40 ` Alice Ryhl
@ 2026-02-10 12:28 ` Boris Brezillon
2026-02-11 9:57 ` Danilo Krummrich
1 sibling, 0 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 12:28 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Danilo Krummrich, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, 10 Feb 2026 11:40:14 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Feb 10, 2026 at 11:46:44AM +0100, Christian König wrote:
> > On 2/10/26 11:36, Danilo Krummrich wrote:
> > > On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
> > >> One way you can see this is by looking at what we require of the
> > >> workqueue. For all this to work, it's pretty important that we never
> > >> schedule anything on the workqueue that's not signalling safe, since
> > >> otherwise you could have a deadlock where the workqueue is executes some
> > >> random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
> > >> meaning that the VM_BIND job never gets scheduled since the workqueue
> > >> is never freed up. Deadlock.
> > >
> > > Yes, I also pointed this out multiple times in the past in the context of C GPU
> > > scheduler discussions. It really depends on the workqueue and how it is used.
> > >
> > > In the C GPU scheduler the driver can pass its own workqueue to the scheduler,
> > > which means that the driver has to ensure that at least one out of the
> > > wq->max_active works is free for the scheduler to make progress on the
> > > scheduler's run and free job work.
> > >
> > > Or in other words, there must be no more than wq->max_active - 1 works that
> > > execute code violating the DMA fence signalling rules.
>
> Ouch, is that really the best way to do that? Why not two workqueues?
Honestly, I'm wondering if we're not better off adding the concept of
DmaFenceSignalingWorkqueue on which only DmaFenceSignalingWorkItem can
be scheduled, for our own sanity.
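A userspace sketch of what such a typed workqueue could look like: a
newtype whose enqueue() only accepts work items that opted in to the
signalling rules via a marker trait. The names follow the suggestion
above but are otherwise hypothetical, and the inline-executing queue is
a stand-in for a real kernel workqueue:

```rust
use std::collections::VecDeque;

/// Marker trait: implementing it is the author's promise that run()
/// obeys the DMA fence signalling rules (no blocking allocations, no
/// forbidden locks, ...). The compiler cannot check the promise, but
/// it can refuse work items that never made it.
trait DmaFenceSignalingWorkItem {
    fn run(self);
}

struct DmaFenceSignalingWorkqueue {
    // Toy queue drained inline; a real abstraction would wrap a kernel
    // workqueue created with WQ_MEM_RECLAIM.
    pending: VecDeque<Box<dyn FnOnce()>>,
}

impl DmaFenceSignalingWorkqueue {
    fn new() -> Self {
        Self { pending: VecDeque::new() }
    }

    /// Only DmaFenceSignalingWorkItem types can be enqueued; a plain
    /// closure or arbitrary work item is rejected at compile time.
    fn enqueue<W: DmaFenceSignalingWorkItem + 'static>(&mut self, item: W) {
        self.pending.push_back(Box::new(move || item.run()));
    }

    fn drain(&mut self) {
        while let Some(work) = self.pending.pop_front() {
            work();
        }
    }
}

struct SignalJobFence {
    msg: &'static str,
}

impl DmaFenceSignalingWorkItem for SignalJobFence {
    fn run(self) {
        println!("{}", self.msg);
    }
}

fn main() {
    let mut wq = DmaFenceSignalingWorkqueue::new();
    wq.enqueue(SignalJobFence { msg: "job fence signalled" });
    // wq.enqueue(|| ()); // does not compile: not a signalling work item
    wq.drain();
}
```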
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 11:40 ` Alice Ryhl
2026-02-10 12:28 ` Boris Brezillon
@ 2026-02-11 9:57 ` Danilo Krummrich
2026-02-11 10:08 ` Philipp Stanner
2026-02-11 10:20 ` Boris Brezillon
1 sibling, 2 replies; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-11 9:57 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Boris Brezillon, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux, lucas.demarchi, thomas.hellstrom, rodrigo.vivi
(Cc: Xe maintainers)
On Tue Feb 10, 2026 at 12:40 PM CET, Alice Ryhl wrote:
> On Tue, Feb 10, 2026 at 11:46:44AM +0100, Christian König wrote:
>> On 2/10/26 11:36, Danilo Krummrich wrote:
>> > On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
>> >> One way you can see this is by looking at what we require of the
>> >> workqueue. For all this to work, it's pretty important that we never
>> >> schedule anything on the workqueue that's not signalling safe, since
>> >> otherwise you could have a deadlock where the workqueue is executes some
>> >> random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
>> >> meaning that the VM_BIND job never gets scheduled since the workqueue
>> >> is never freed up. Deadlock.
>> >
>> > Yes, I also pointed this out multiple times in the past in the context of C GPU
>> > scheduler discussions. It really depends on the workqueue and how it is used.
>> >
>> > In the C GPU scheduler the driver can pass its own workqueue to the scheduler,
>> > which means that the driver has to ensure that at least one out of the
>> > wq->max_active works is free for the scheduler to make progress on the
>> > scheduler's run and free job work.
>> >
>> > Or in other words, there must be no more than wq->max_active - 1 works that
>> > execute code violating the DMA fence signalling rules.
>
> Ouch, is that really the best way to do that? Why not two workqueues?
Most drivers making use of this re-use the same workqueue for multiple GPU
scheduler instances in firmware scheduling mode (i.e. 1:1 relationship between
scheduler and entity). This is equivalent to the JobQ use-case.
Note that we will have one JobQ instance per userspace queue, so sharing the
workqueue between JobQ instances can make sense.
Besides that, IIRC Xe was re-using the workqueue for something else, but that
doesn't seem to be the case anymore. I can only find [1], which seems more
like some custom GPU scheduler extension [2] to me...
[1] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/gpu/drm/xe/xe_gpu_scheduler.c#L40
[2] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h#L28
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 9:57 ` Danilo Krummrich
@ 2026-02-11 10:08 ` Philipp Stanner
2026-02-11 10:28 ` Boris Brezillon
2026-02-11 10:20 ` Boris Brezillon
1 sibling, 1 reply; 103+ messages in thread
From: Philipp Stanner @ 2026-02-11 10:08 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl
Cc: Christian König, Boris Brezillon, phasta, David Airlie,
Simona Vetter, Gary Guo, Benno Lossin, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux,
lucas.demarchi, thomas.hellstrom, rodrigo.vivi
On Wed, 2026-02-11 at 10:57 +0100, Danilo Krummrich wrote:
> (Cc: Xe maintainers)
>
> On Tue Feb 10, 2026 at 12:40 PM CET, Alice Ryhl wrote:
> > On Tue, Feb 10, 2026 at 11:46:44AM +0100, Christian König wrote:
> > > On 2/10/26 11:36, Danilo Krummrich wrote:
> > > > On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
> > > > >
[…]
> > > >
> > > > Or in other words, there must be no more than wq->max_active - 1 works that
> > > > execute code violating the DMA fence signalling rules.
> >
> > Ouch, is that really the best way to do that? Why not two workqueues?
>
> Most drivers making use of this re-use the same workqueue for multiple GPU
> scheduler instances in firmware scheduling mode (i.e. 1:1 relationship between
> scheduler and entity). This is equivalent to the JobQ use-case.
>
> Note that we will have one JobQ instance per userspace queue, so sharing the
> workqueue between JobQ instances can make sense.
Why, what for?
P.
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 10:08 ` Philipp Stanner
@ 2026-02-11 10:28 ` Boris Brezillon
0 siblings, 0 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 10:28 UTC (permalink / raw)
To: Philipp Stanner
Cc: phasta, Danilo Krummrich, Alice Ryhl, Christian König,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux, lucas.demarchi, thomas.hellstrom, rodrigo.vivi
On Wed, 11 Feb 2026 11:08:55 +0100
Philipp Stanner <phasta@mailbox.org> wrote:
> On Wed, 2026-02-11 at 10:57 +0100, Danilo Krummrich wrote:
> > (Cc: Xe maintainers)
> >
> > On Tue Feb 10, 2026 at 12:40 PM CET, Alice Ryhl wrote:
> > > On Tue, Feb 10, 2026 at 11:46:44AM +0100, Christian König wrote:
> > > > On 2/10/26 11:36, Danilo Krummrich wrote:
> > > > > On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
> > > > > >
>
> […]
>
> > > > >
> > > > > Or in other words, there must be no more than wq->max_active - 1 works that
> > > > > execute code violating the DMA fence signalling rules.
> > >
> > > Ouch, is that really the best way to do that? Why not two workqueues?
> >
> > Most drivers making use of this re-use the same workqueue for multiple GPU
> > scheduler instances in firmware scheduling mode (i.e. 1:1 relationship between
> > scheduler and entity). This is equivalent to the JobQ use-case.
> >
> > Note that we will have one JobQ instance per userspace queue, so sharing the
> > workqueue between JobQ instances can make sense.
>
> Why, what for?
Because, even if it's not necessarily a 1:N relationship between queues
and threads these days (with the concept of shared worker pools), each
new workqueue usually implies the creation of new threads/resources, and
we usually don't need to have this level of parallelization (especially
if the communication channel with the FW can't be accessed
concurrently).
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 9:57 ` Danilo Krummrich
2026-02-11 10:08 ` Philipp Stanner
@ 2026-02-11 10:20 ` Boris Brezillon
2026-02-11 11:00 ` Danilo Krummrich
1 sibling, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 10:20 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Alice Ryhl, Christian König, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux, lucas.demarchi, thomas.hellstrom, rodrigo.vivi
On Wed, 11 Feb 2026 10:57:27 +0100
"Danilo Krummrich" <dakr@kernel.org> wrote:
> (Cc: Xe maintainers)
>
> On Tue Feb 10, 2026 at 12:40 PM CET, Alice Ryhl wrote:
> > On Tue, Feb 10, 2026 at 11:46:44AM +0100, Christian König wrote:
> >> On 2/10/26 11:36, Danilo Krummrich wrote:
> >> > On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
> >> >> One way you can see this is by looking at what we require of the
> >> >> workqueue. For all this to work, it's pretty important that we never
> >> >> schedule anything on the workqueue that's not signalling safe, since
> >> >> otherwise you could have a deadlock where the workqueue is executes some
> >> >> random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
> >> >> meaning that the VM_BIND job never gets scheduled since the workqueue
> >> >> is never freed up. Deadlock.
> >> >
> >> > Yes, I also pointed this out multiple times in the past in the context of C GPU
> >> > scheduler discussions. It really depends on the workqueue and how it is used.
> >> >
> >> > In the C GPU scheduler the driver can pass its own workqueue to the scheduler,
> >> > which means that the driver has to ensure that at least one out of the
> >> > wq->max_active works is free for the scheduler to make progress on the
> >> > scheduler's run and free job work.
> >> >
> >> > Or in other words, there must be no more than wq->max_active - 1 works that
> >> > execute code violating the DMA fence signalling rules.
> >
> > Ouch, is that really the best way to do that? Why not two workqueues?
>
> Most drivers making use of this re-use the same workqueue for multiple GPU
> scheduler instances in firmware scheduling mode (i.e. 1:1 relationship between
> scheduler and entity). This is equivalent to the JobQ use-case.
>
> Note that we will have one JobQ instance per userspace queue, so sharing the
> workqueue between JobQ instances can make sense.
Definitely, but I think that's orthogonal to allowing this common
workqueue to be used for work items that don't comply with the
dma-fence signalling rules, isn't it?
>
> Besides that, IIRC Xe was re-using the workqueue for something else, but that
> doesn't seem to be the case anymore. I can only find [1], which more seems like
> some custom GPU scheduler extention [2] to me...
Yep, I think it can be the problematic case. It doesn't mean we can't
schedule work items that don't signal fences, but I think it'd be
simpler if we were forcing those to follow the same rules (no blocking
alloc, no locks taken that are also taken in other paths where blocking
allocs happen, etc) regardless of this wq->max_active value.
>
> [1] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/gpu/drm/xe/xe_gpu_scheduler.c#L40
> [2] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h#L28
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 10:20 ` Boris Brezillon
@ 2026-02-11 11:00 ` Danilo Krummrich
2026-02-11 11:12 ` Boris Brezillon
0 siblings, 1 reply; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-11 11:00 UTC (permalink / raw)
To: Boris Brezillon
Cc: Alice Ryhl, Christian König, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux, lucas.demarchi, thomas.hellstrom, rodrigo.vivi
On Wed Feb 11, 2026 at 11:20 AM CET, Boris Brezillon wrote:
> On Wed, 11 Feb 2026 10:57:27 +0100
> "Danilo Krummrich" <dakr@kernel.org> wrote:
>
>> (Cc: Xe maintainers)
>>
>> On Tue Feb 10, 2026 at 12:40 PM CET, Alice Ryhl wrote:
>> > On Tue, Feb 10, 2026 at 11:46:44AM +0100, Christian König wrote:
>> >> On 2/10/26 11:36, Danilo Krummrich wrote:
>> >> > On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
>> >> >> One way you can see this is by looking at what we require of the
>> >> >> workqueue. For all this to work, it's pretty important that we never
>> >> >> schedule anything on the workqueue that's not signalling safe, since
>> >> >> otherwise you could have a deadlock where the workqueue is executes some
>> >> >> random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
>> >> >> meaning that the VM_BIND job never gets scheduled since the workqueue
>> >> >> is never freed up. Deadlock.
>> >> >
>> >> > Yes, I also pointed this out multiple times in the past in the context of C GPU
>> >> > scheduler discussions. It really depends on the workqueue and how it is used.
>> >> >
>> >> > In the C GPU scheduler the driver can pass its own workqueue to the scheduler,
>> >> > which means that the driver has to ensure that at least one out of the
>> >> > wq->max_active works is free for the scheduler to make progress on the
>> >> > scheduler's run and free job work.
>> >> >
>> >> > Or in other words, there must be no more than wq->max_active - 1 works that
>> >> > execute code violating the DMA fence signalling rules.
>> >
>> > Ouch, is that really the best way to do that? Why not two workqueues?
>>
>> Most drivers making use of this re-use the same workqueue for multiple GPU
>> scheduler instances in firmware scheduling mode (i.e. 1:1 relationship between
>> scheduler and entity). This is equivalent to the JobQ use-case.
>>
>> Note that we will have one JobQ instance per userspace queue, so sharing the
>> workqueue between JobQ instances can make sense.
>
> Definitely, but I think that's orthogonal to allowing this common
> workqueue to be used for work items that don't comply with the
> dma-fence signalling rules, isn't it?
Yes and no. If we allow passing around shared WQs without a corresponding type
abstraction we open the door for drivers to abuse it to schedule their own
work.
I.e. sharing a workqueue between JobQs is fine, but we have to ensure they can't
be used for anything else.
>> Besides that, IIRC Xe was re-using the workqueue for something else, but that
>> doesn't seem to be the case anymore. I can only find [1], which more seems like
>> some custom GPU scheduler extention [2] to me...
>
> Yep, I think it can be the problematic case. It doesn't mean we can't
> schedule work items that don't signal fences, but I think it'd be
> simpler if we were forcing those to follow the same rules (no blocking
> alloc, no locks taken that are also taken in other paths were blocking
> allocs happen, etc) regardless of this wq->max_active value.
>
>>
>> [1] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/gpu/drm/xe/xe_gpu_scheduler.c#L40
>> [2] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h#L28
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 11:00 ` Danilo Krummrich
@ 2026-02-11 11:12 ` Boris Brezillon
2026-02-11 14:38 ` Danilo Krummrich
0 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 11:12 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Alice Ryhl, Christian König, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux, lucas.demarchi, thomas.hellstrom, rodrigo.vivi
On Wed, 11 Feb 2026 12:00:30 +0100
"Danilo Krummrich" <dakr@kernel.org> wrote:
> On Wed Feb 11, 2026 at 11:20 AM CET, Boris Brezillon wrote:
> > On Wed, 11 Feb 2026 10:57:27 +0100
> > "Danilo Krummrich" <dakr@kernel.org> wrote:
> >
> >> (Cc: Xe maintainers)
> >>
> >> On Tue Feb 10, 2026 at 12:40 PM CET, Alice Ryhl wrote:
> >> > On Tue, Feb 10, 2026 at 11:46:44AM +0100, Christian König wrote:
> >> >> On 2/10/26 11:36, Danilo Krummrich wrote:
> >> >> > On Tue Feb 10, 2026 at 11:15 AM CET, Alice Ryhl wrote:
> >> >> >> One way you can see this is by looking at what we require of the
> >> >> >> workqueue. For all this to work, it's pretty important that we never
> >> >> >> schedule anything on the workqueue that's not signalling safe, since
> >> >> >> otherwise you could have a deadlock where the workqueue is executes some
> >> >> >> random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
> >> >> >> meaning that the VM_BIND job never gets scheduled since the workqueue
> >> >> >> is never freed up. Deadlock.
> >> >> >
> >> >> > Yes, I also pointed this out multiple times in the past in the context of C GPU
> >> >> > scheduler discussions. It really depends on the workqueue and how it is used.
> >> >> >
> >> >> > In the C GPU scheduler the driver can pass its own workqueue to the scheduler,
> >> >> > which means that the driver has to ensure that at least one out of the
> >> >> > wq->max_active works is free for the scheduler to make progress on the
> >> >> > scheduler's run and free job work.
> >> >> >
> >> >> > Or in other words, there must be no more than wq->max_active - 1 works that
> >> >> > execute code violating the DMA fence signalling rules.
> >> >
> >> > Ouch, is that really the best way to do that? Why not two workqueues?
> >>
> >> Most drivers making use of this re-use the same workqueue for multiple GPU
> >> scheduler instances in firmware scheduling mode (i.e. 1:1 relationship between
> >> scheduler and entity). This is equivalent to the JobQ use-case.
> >>
> >> Note that we will have one JobQ instance per userspace queue, so sharing the
> >> workqueue between JobQ instances can make sense.
> >
> > Definitely, but I think that's orthogonal to allowing this common
> > workqueue to be used for work items that don't comply with the
> > dma-fence signalling rules, isn't it?
>
> Yes and no. If we allow passing around shared WQs without a corresponding type
abstraction we open the door for drivers to abuse it to schedule their own
> work.
>
> I.e. sharing a workqueue between JobQs is fine, but we have to ensure they can't
> be used for anything else.
Totally agree with that, and that's where I was going with this special
DmaFenceWorkqueue wrapper/abstraction, which would only accept
MaySignalDmaFencesWorkItem objects for scheduling.
>
> >> Besides that, IIRC Xe was re-using the workqueue for something else, but that
> >> doesn't seem to be the case anymore. I can only find [1], which more seems like
> >> some custom GPU scheduler extension [2] to me...
> >
> > Yep, I think it can be the problematic case. It doesn't mean we can't
> > schedule work items that don't signal fences, but I think it'd be
> > simpler if we were forcing those to follow the same rules (no blocking
> > alloc, no locks taken that are also taken in other paths where blocking
> > allocs happen, etc) regardless of this wq->max_active value.
> >
> >>
> >> [1] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/gpu/drm/xe/xe_gpu_scheduler.c#L40
> >> [2] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/gpu/drm/xe/xe_gpu_scheduler_types.h#L28
>
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 11:12 ` Boris Brezillon
@ 2026-02-11 14:38 ` Danilo Krummrich
2026-02-11 15:00 ` Boris Brezillon
0 siblings, 1 reply; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-11 14:38 UTC (permalink / raw)
To: Boris Brezillon
Cc: Alice Ryhl, Christian König, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux, lucas.demarchi, thomas.hellstrom, rodrigo.vivi
On Wed Feb 11, 2026 at 12:12 PM CET, Boris Brezillon wrote:
> On Wed, 11 Feb 2026 12:00:30 +0100
> "Danilo Krummrich" <dakr@kernel.org> wrote:
>> I.e. sharing a workqueue between JobQs is fine, but we have to ensure they can't
>> be used for anything else.
>
> Totally agree with that, and that's where I was going with this special
> DmaFenceWorkqueue wrapper/abstract, that would only accept
> scheduling MaySignalDmaFencesWorkItem objects.
Not sure if it has to be that complicated (for a first shot). At least for the
JobQ it would probably be enough to have a helper to create a new, let's say,
struct JobQueueWorker that encapsulates a (reference counted) workqueue, but
does not give access to it outside of jobq.rs.
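Roughly, a std-Rust sketch of that encapsulation idea (JobQueueWorker, JobQueue and the closure-based queue are illustrative stand-ins invented for this sketch, not the real kernel workqueue API): the worker's scheduling entry point is module-private, so only jobq-internal code can put work on it.

```rust
mod jobq {
    use std::collections::VecDeque;

    /// Stand-in for a (refcounted) workqueue; deliberately does NOT
    /// expose a generic enqueue operation outside this module.
    pub struct JobQueueWorker {
        queue: VecDeque<Box<dyn FnOnce()>>, // private field
    }

    impl JobQueueWorker {
        pub fn new() -> Self {
            JobQueueWorker { queue: VecDeque::new() }
        }

        // Private: only code inside `jobq` can schedule work here.
        fn schedule(&mut self, work: Box<dyn FnOnce()>) {
            self.queue.push_back(work);
        }

        /// Drain and run queued work (the real thing would be a kworker).
        pub fn run_all(&mut self) {
            while let Some(w) = self.queue.pop_front() {
                w();
            }
        }
    }

    pub struct JobQueue;

    impl JobQueue {
        /// The only public path that puts work on the worker.
        pub fn submit(&self, worker: &mut JobQueueWorker, job: impl FnOnce() + 'static) {
            worker.schedule(Box::new(job));
        }
    }
}

fn main() {
    use std::{cell::Cell, rc::Rc};

    let mut worker = jobq::JobQueueWorker::new();
    let jq = jobq::JobQueue;
    let ran = Rc::new(Cell::new(false));
    let r2 = ran.clone();
    // Outside jobq, there is no way to enqueue except via JobQueue.
    jq.submit(&mut worker, move || r2.set(true));
    worker.run_all();
    assert!(ran.get());
}
```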
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 14:38 ` Danilo Krummrich
@ 2026-02-11 15:00 ` Boris Brezillon
2026-02-11 15:05 ` Danilo Krummrich
2026-03-13 17:27 ` Matthew Brost
0 siblings, 2 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 15:00 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Alice Ryhl, Christian König, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux, lucas.demarchi, thomas.hellstrom, rodrigo.vivi
On Wed, 11 Feb 2026 15:38:32 +0100
"Danilo Krummrich" <dakr@kernel.org> wrote:
> On Wed Feb 11, 2026 at 12:12 PM CET, Boris Brezillon wrote:
> > On Wed, 11 Feb 2026 12:00:30 +0100
> > "Danilo Krummrich" <dakr@kernel.org> wrote:
> >> I.e. sharing a workqueue between JobQs is fine, but we have to ensure they can't
> >> be used for anything else.
> >
> > Totally agree with that, and that's where I was going with this special
> > DmaFenceWorkqueue wrapper/abstract, that would only accept
> > scheduling MaySignalDmaFencesWorkItem objects.
>
> Not sure if it has to be that complicated (for a first shot). At least for the
> JobQ it would probably be enough to have a helper to create a new, let's say,
> struct JobQueueWorker that encapsulates a (reference counted) workqueue, but
> does not give access to it outside of jobq.rs.
Except we need to schedule some work items that are in the
DMA-signaling path but not directly controlled by the jobq.rs
implementation (see [1] for the post-execution work we schedule in
panthor).
The two options I can think of are:
1. Add an unsafe interface to schedule work items on the wq attached
to JobQ. Safety requirements in that case being compliance with the
DMA-fence signalling rules.
2. The thing I was describing before, where we add the concept of
DmaFenceWorkqueue that can only take MaySignalDmaFencesWorkItem. We
can then have a DmaFenceWorkqueue that's global, and pass it to the
JobQueue so it can use it for its own work item.
We could start with option 1, sure, but since we're going to need to
schedule post-execution work items that have to be considered part of
the DMA-signalling path, I'd rather have these concepts clearly defined
from the start.
Mind if I give this DmaFenceWorkqueue/MaySignalDmaFencesWorkItem a try
to see what it looks like and get the discussion going from there
(hopefully it's just a thin wrapper around a regular
Workqueue/WorkItem, with an extra dma_fence_signalling annotation in
the WorkItem::run() path), or are you completely against the idea?
[1]https://elixir.bootlin.com/linux/v6.19-rc5/source/drivers/gpu/drm/panthor/panthor_sched.c#L1913
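To make option 2 concrete, a minimal std-Rust sketch (DmaFenceWorkqueue and MaySignalDmaFencesWorkItem are the made-up names from this thread, and the closure-free trait/queue here is a stand-in, not the kernel workqueue API): the type system rejects anything that doesn't implement the marker trait, which is the whole point.

```rust
use std::collections::VecDeque;

/// Marker trait: implementing it is a promise that `run()` complies
/// with the DMA-fence signalling rules (no blocking allocs, no
/// forbidden locks, etc.).
trait MaySignalDmaFencesWorkItem {
    fn run(&mut self);
}

/// A workqueue that only accepts signalling-safe work items.
struct DmaFenceWorkqueue {
    items: VecDeque<Box<dyn MaySignalDmaFencesWorkItem>>,
}

impl DmaFenceWorkqueue {
    fn new() -> Self {
        Self { items: VecDeque::new() }
    }

    /// Arbitrary work items don't satisfy the bound, so they can't
    /// be queued here; this is checked at compile time.
    fn queue(&mut self, item: Box<dyn MaySignalDmaFencesWorkItem>) {
        self.items.push_back(item);
    }

    fn run_all(&mut self) {
        while let Some(mut item) = self.items.pop_front() {
            // The real abstraction would wrap this call in
            // dma_fence_begin/end_signalling() lockdep annotations.
            item.run();
        }
    }
}

struct SignalJob {
    done: std::rc::Rc<std::cell::Cell<bool>>,
}

impl MaySignalDmaFencesWorkItem for SignalJob {
    fn run(&mut self) {
        self.done.set(true); // stand-in for dma_fence_signal()
    }
}

fn main() {
    let mut wq = DmaFenceWorkqueue::new();
    let done = std::rc::Rc::new(std::cell::Cell::new(false));
    wq.queue(Box::new(SignalJob { done: done.clone() }));
    wq.run_all();
    assert!(done.get());
}
```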
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 15:00 ` Boris Brezillon
@ 2026-02-11 15:05 ` Danilo Krummrich
2026-02-11 15:14 ` Boris Brezillon
2026-03-13 17:27 ` Matthew Brost
1 sibling, 1 reply; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-11 15:05 UTC (permalink / raw)
To: Boris Brezillon
Cc: Alice Ryhl, Christian König, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux, lucas.demarchi, thomas.hellstrom, rodrigo.vivi
On Wed Feb 11, 2026 at 4:00 PM CET, Boris Brezillon wrote:
> On Wed, 11 Feb 2026 15:38:32 +0100
> "Danilo Krummrich" <dakr@kernel.org> wrote:
>
>> On Wed Feb 11, 2026 at 12:12 PM CET, Boris Brezillon wrote:
>> > On Wed, 11 Feb 2026 12:00:30 +0100
>> > "Danilo Krummrich" <dakr@kernel.org> wrote:
>> >> I.e. sharing a workqueue between JobQs is fine, but we have to ensure they can't
>> >> be used for anything else.
>> >
>> > Totally agree with that, and that's where I was going with this special
>> > DmaFenceWorkqueue wrapper/abstract, that would only accept
>> > scheduling MaySignalDmaFencesWorkItem objects.
>>
>> Not sure if it has to be that complicated (for a first shot). At least for the
>> JobQ it would probably be enough to have a helper to create a new, let's say,
>> struct JobQueueWorker that encapsulates a (reference counted) workqueue, but
>> does not give access to it outside of jobq.rs.
>
> Except we need to schedule some work items that are in the
> DMA-signaling path but not directly controlled by the jobq.rs
> implementation (see [1] for the post-execution work we schedule in
> panthor).
>
> The two options I can think of are:
>
> 1. Add an unsafe interface to schedule work items on the wq attached
> to JobQ. Safety requirements in that case being compliance with the
> DMA-fence signalling rules.
> 2. The thing I was describing before, where we add the concept of
> DmaFenceWorkqueue that can only take MaySignalDmaFencesWorkItem. We
> can then have a DmaFenceWorkqueue that's global, and pass it to the
> JobQueue so it can use it for its own work item.
>
> We could start with option 1, sure, but since we're going to need to
> schedule post-execution work items that have to be considered part of
> the DMA-signalling path, I'd rather have these concepts clearly defined
> from the start.
>
> Mind if I give this DmaFenceWorkqueue/MaySignalDmaFencesWorkItem a try
> to see what it looks like and get the discussion going from there
> (hopefully it's just a thin wrapper around a regular
> Workqueue/WorkItem, with an extra dma_fence_signalling annotation in
> the WorkItem::run() path), or are you completely against the idea?
Not at all, I think it's a good generalization.
But I'm very skeptical about the "we allow drivers to schedule arbitrary work on
the (shared) JobQueue workqueue" part. I think drivers can just have a separate
workqueue for such use-cases.
> [1]https://elixir.bootlin.com/linux/v6.19-rc5/source/drivers/gpu/drm/panthor/panthor_sched.c#L1913
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 15:05 ` Danilo Krummrich
@ 2026-02-11 15:14 ` Boris Brezillon
2026-02-11 15:16 ` Danilo Krummrich
0 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 15:14 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Alice Ryhl, Christian König, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux, lucas.demarchi, thomas.hellstrom, rodrigo.vivi
On Wed, 11 Feb 2026 16:05:48 +0100
"Danilo Krummrich" <dakr@kernel.org> wrote:
> On Wed Feb 11, 2026 at 4:00 PM CET, Boris Brezillon wrote:
> > On Wed, 11 Feb 2026 15:38:32 +0100
> > "Danilo Krummrich" <dakr@kernel.org> wrote:
> >
> >> On Wed Feb 11, 2026 at 12:12 PM CET, Boris Brezillon wrote:
> >> > On Wed, 11 Feb 2026 12:00:30 +0100
> >> > "Danilo Krummrich" <dakr@kernel.org> wrote:
> >> >> I.e. sharing a workqueue between JobQs is fine, but we have to ensure they can't
> >> >> be used for anything else.
> >> >
> >> > Totally agree with that, and that's where I was going with this special
> >> > DmaFenceWorkqueue wrapper/abstract, that would only accept
> >> > scheduling MaySignalDmaFencesWorkItem objects.
> >>
> >> Not sure if it has to be that complicated (for a first shot). At least for the
> >> JobQ it would probably be enough to have a helper to create a new, let's say,
> >> struct JobQueueWorker that encapsulates a (reference counted) workqueue, but
> >> does not give access to it outside of jobq.rs.
> >
> > Except we need to schedule some work items that are in the
> > DMA-signaling path but not directly controlled by the jobq.rs
> > implementation (see [1] for the post-execution work we schedule in
> > panthor).
> >
> > The two options I can think of are:
> >
> > 1. Add an unsafe interface to schedule work items on the wq attached
> > to JobQ. Safety requirements in that case being compliance with the
> > DMA-fence signalling rules.
> > 2. The thing I was describing before, where we add the concept of
> > DmaFenceWorkqueue that can only take MaySignalDmaFencesWorkItem. We
> > can then have a DmaFenceWorkqueue that's global, and pass it to the
> > JobQueue so it can use it for its own work item.
> >
> > We could start with option 1, sure, but since we're going to need to
> > schedule post-execution work items that have to be considered part of
> > the DMA-signalling path, I'd rather have these concepts clearly defined
> > from the start.
> >
> > Mind if I give this DmaFenceWorkqueue/MaySignalDmaFencesWorkItem a try
> > to see what it looks like and get the discussion going from there
> > (hopefully it's just a thin wrapper around a regular
> > Workqueue/WorkItem, with an extra dma_fence_signalling annotation in
> > the WorkItem::run() path), or are you completely against the idea?
>
> Not at all, I think it's a good generalization.
>
> But I'm very skeptical about the "we allow drivers to schedule arbitrary work on
> the (shared) JobQueue workqueue" part. I think drivers can just have a separate
> workqueue for such use-cases.
Okay, that would be one DmaFenceWorkqueue only used for the driver
JobQueue instances (or one per-instance if the driver wants that)
wrapped into some object that doesn't expose it as a generic workqueue,
so only JobQueue instances can use it. And then drivers are free to
instantiate their own DmaFenceWorkqueue for anything else that's
still in the DMA-signalling path, but not directly related to
JobQueues. I think I'd be fine with that.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 15:14 ` Boris Brezillon
@ 2026-02-11 15:16 ` Danilo Krummrich
0 siblings, 0 replies; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-11 15:16 UTC (permalink / raw)
To: Boris Brezillon
Cc: Alice Ryhl, Christian König, Philipp Stanner, phasta,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux, lucas.demarchi, thomas.hellstrom, rodrigo.vivi
On Wed Feb 11, 2026 at 4:14 PM CET, Boris Brezillon wrote:
> Okay, that would be one DmaFenceWorkqueue only used for the driver
> JobQueue instances (or one per-instance if the driver wants that)
> wrapped into some object that doesn't expose it as a generic workqueue,
> so only JobQueue instances can use it. And then drivers are free to
> instantiate their own DmaFenceWorkqueue for anything else that's
> still in the DMA-signalling path, but not directly related to
> JobQueues. I think I'd be fine with that.
Yes, that sounds good to me.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 15:00 ` Boris Brezillon
2026-02-11 15:05 ` Danilo Krummrich
@ 2026-03-13 17:27 ` Matthew Brost
1 sibling, 0 replies; 103+ messages in thread
From: Matthew Brost @ 2026-03-13 17:27 UTC (permalink / raw)
To: Boris Brezillon
Cc: Danilo Krummrich, Alice Ryhl, Christian König,
Philipp Stanner, phasta, David Airlie, Simona Vetter, Gary Guo,
Benno Lossin, Daniel Almeida, Joel Fernandes, linux-kernel,
dri-devel, rust-for-linux, lucas.demarchi, thomas.hellstrom,
rodrigo.vivi
On Wed, Feb 11, 2026 at 04:00:59PM +0100, Boris Brezillon wrote:
Jumping in here as I was tagged in this thread… there's a lot to get
through. Randomly picking a point to reply.
> On Wed, 11 Feb 2026 15:38:32 +0100
> "Danilo Krummrich" <dakr@kernel.org> wrote:
>
> > On Wed Feb 11, 2026 at 12:12 PM CET, Boris Brezillon wrote:
> > > On Wed, 11 Feb 2026 12:00:30 +0100
> > > "Danilo Krummrich" <dakr@kernel.org> wrote:
> > >> I.e. sharing a workqueue between JobQs is fine, but we have to ensure they can't
> > >> be used for anything else.
> > >
> > > Totally agree with that, and that's where I was going with this special
> > > DmaFenceWorkqueue wrapper/abstract, that would only accept
> > > scheduling MaySignalDmaFencesWorkItem objects.
> >
> > Not sure if it has to be that complicated (for a first shot). At least for the
> > JobQ it would probably be enough to have a helper to create a new, let's say,
> > struct JobQueueWorker that encapsulates a (reference counted) workqueue, but
> > does not give access to it outside of jobq.rs.
>
> Except we need to schedule some work items that are in the
> DMA-signaling path but not directly controlled by the jobq.rs
> implementation (see [1] for the post-execution work we schedule in
> panthor).
>
> The two options I can think of are:
>
> 1. Add an unsafe interface to schedule work items on the wq attached
> to JobQ. Safety requirements in that case being compliance with the
> DMA-fence signalling rules.
For (1), use lockdep to enforce these rules. I have a patch for this
[1]. Something like this is probably what everyone needs—jobqueue can
either create a workqueue with this annotation or enforce that the one
being passed in already has it. I turned this on for all Xe workqueues
in the signaling path and immediately found a few bugs, and I know the
dma-fence rules pretty well, so this is clearly useful.
I think users scheduling work on the submit workqueue is valid. The
primary case in Xe is control-plane messages (e.g., queue
suspend/resume, teardown, toggling queue priority in firmware, etc.).
You don’t want to race with submission while manipulating queue state,
so you order this work on the workqueue. Could you do this with a lock?
Probably. But then you’d have to audit every point that issues a
control-plane message to make sure you can take that lock.
There’s also the hazard where a control message is issued in IRQ context
but you need a mutex to manipulate the queue (in Xe this is the mutex to
send firmware commands). For example, I’ve implemented fence deadlines
in Xe [2], which fire control-plane messages in IRQ context. Another
example is a job dropping a ref to the queue in IRQ context, and that
being the final reference that triggers teardown. I don't do the latter
yet in Xe, but it should be possible to drop your last queue ref when a
dma-fence signals (i.e., no free_job work — just a put in the dma-fence
signaling IRQ handler) if jobqueue is designed correctly.
I’m also not sure how timeouts are supposed to work in jobqueue, but if you
need to stop/start the jobqueue to ensure you have full control over
your queue (e.g., new jobqueues aren’t racing), then you likely need a
second workqueue so you can stop the submit one, or you might be able to
get away with a mutex. This also applies to other workqueue operations
users schedule here—such as global resets or migrating a VF—which stop
all jobqueue instances to perform fixups. These global events can’t race
with jobs timing out either, since multiple entities can’t be
stopping/starting jobqueue instances at the same time without breaking
things.
This is why, in Xe, all job timeouts and all global events are scheduled
on a single workqueue instance shared among all DRM sched instances.
This has worked quite well, so I’d strongly recommend carrying this part
of DRM sched forward into whatever succeeds it.
I have a fairly detailed write-up of the Xe scheduler design [3] — it’s
a little stale, but it should describe how a subset of DRM sched works
very well to implement complex driver-side scheduling requirements. A
whole other subset of DRM sched is horrid, so I’d recommend taking the
good ideas from DRM sched (queue stop/start, workqueue-based ordering,
finished fences, job tracking to completion) and using those in
jobqueue, while dropping the bad ones (no real object-lifetime rules, no
ownership rules, no refcounting, wild teardown flows, wild dma-fence
callback manipulation, etc.) and not carrying those forward. Some of DRM
sched’s very bad ideas appear to be in jobqueue as well. I’d reconsider
those, but I won’t harp on the design at this point.
Matt
[1] https://patchwork.freedesktop.org/patch/682491/?series=156283&rev=1
[2] https://patchwork.freedesktop.org/patch/696820/?series=159479&rev=2
[3] https://patchwork.freedesktop.org/patch/669007/?series=153000&rev=3
> 2. The thing I was describing before, where we add the concept of
> DmaFenceWorkqueue that can only take MaySignalDmaFencesWorkItem. We
> can then have a DmaFenceWorkqueue that's global, and pass it to the
> JobQueue so it can use it for its own work item.
>
> We could start with option 1, sure, but since we're going to need to
> schedule post-execution work items that have to be considered part of
> the DMA-signalling path, I'd rather have these concepts clearly defined
> from the start.
>
> Mind if I give this DmaFenceWorkqueue/MaySignalDmaFencesWorkItem a try
> to see what it looks like and get the discussion going from there
> (hopefully it's just a thin wrapper around a regular
> Workqueue/WorkItem, with an extra dma_fence_signalling annotation in
> the WorkItem::run() path), or are you completely against the idea?
>
> [1]https://elixir.bootlin.com/linux/v6.19-rc5/source/drivers/gpu/drm/panthor/panthor_sched.c#L1913
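The runtime-enforcement idea from [1] can be modeled in plain std Rust with a thread-local flag standing in for lockdep's fake lock (FenceSignallingGuard and might_block() are names invented for this sketch, not real kernel APIs): an RAII guard marks the signalling section, and a would-be blocking operation checks the flag, which is roughly what the lockdep annotation does with real lock dependency tracking.

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the lockdep "fake lock" that
    // dma_fence_begin_signalling() acquires per thread.
    static IN_SIGNALLING: Cell<bool> = Cell::new(false);
}

/// RAII guard marking a DMA-fence signalling critical section.
struct FenceSignallingGuard;

impl FenceSignallingGuard {
    fn begin() -> Self {
        IN_SIGNALLING.with(|f| f.set(true));
        FenceSignallingGuard
    }
}

impl Drop for FenceSignallingGuard {
    fn drop(&mut self) {
        // Real version: dma_fence_end_signalling(cookie); this
        // simple flag doesn't support nesting, unlike lockdep.
        IN_SIGNALLING.with(|f| f.set(false));
    }
}

/// Stand-in for a might_alloc(GFP_KERNEL)-style check: blocking
/// operations are forbidden inside a signalling section.
fn might_block() -> Result<(), &'static str> {
    if IN_SIGNALLING.with(|f| f.get()) {
        Err("blocking operation in fence signalling section")
    } else {
        Ok(())
    }
}

fn main() {
    assert!(might_block().is_ok());
    {
        let _guard = FenceSignallingGuard::begin();
        // Caught at runtime, like the lockdep annotation would.
        assert!(might_block().is_err());
    }
    assert!(might_block().is_ok()); // guard dropped, section ended
}
```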
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 10:15 ` Alice Ryhl
2026-02-10 10:36 ` Danilo Krummrich
@ 2026-02-10 10:46 ` Boris Brezillon
2026-02-10 11:34 ` Boris Brezillon
` (2 subsequent siblings)
4 siblings, 0 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 10:46 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, 10 Feb 2026 10:15:04 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Feb 10, 2026 at 10:15:25AM +0100, Boris Brezillon wrote:
> > On Tue, 10 Feb 2026 08:38:00 +0000
> > Alice Ryhl <aliceryhl@google.com> wrote:
> >
> > > On Tue, Feb 10, 2026 at 09:16:34AM +0100, Christian König wrote:
> > > > On 2/9/26 15:58, Boris Brezillon wrote:
> > > > > On Mon, 09 Feb 2026 09:19:46 +0100
> > > > > Philipp Stanner <phasta@mailbox.org> wrote:
> > > > >
> > > > >> On Fri, 2026-02-06 at 11:23 +0100, Danilo Krummrich wrote:
> > > > >>> On Thu Feb 5, 2026 at 9:57 AM CET, Boris Brezillon wrote:
> > > > >>>> On Tue, 3 Feb 2026 09:14:01 +0100
> > > > >>>> Philipp Stanner <phasta@kernel.org> wrote:
> > > > >>>> Unfortunately, I don't know how to translate that in rust, but we
> > > > >> >> need a way to check if any code path does a DmaFence.signal(),
> > > > >>>> go back to the entry point (for a WorkItem, that would be
> > > > >>>> WorkItem::run() for instance), and make it a DmaFenceSignallingPath.
> > > > >>>> Not only that, but we need to know all the deps that make it so
> > > > >>>> this path can be called (if I take the WorkItem example, that would
> > > > >>>> be the path that leads to the WorkItem being scheduled).
> > > > >>>
> > > > >>> I think we need a guard object for this that is not Send, just like for any
> > > > >>> other lock.
> > > > >>>
> > > > >>> Internally, those markers rely on lockdep, i.e. they just acquire and release a
> > > > >>> "fake" lock.
> > > > >>
> > > > >> The guard object would be created through fence.begin_signalling(), wouldn't it?
> > > > >
> > > > > It shouldn't be a (&self)-method, because at the start of a DMA
> > > > > signaling path, you don't necessarily know which fence you're going to
> > > > > signal (you might actually signal several of them).
> > > > >
> > > > >> And when it drops you call dma_fence_end_signalling()?
> > > > >
> > > > > Yep, dma_fence_end_signalling() should be called when the guard is
> > > > > dropped.
> > > > >
> > > > >>
> > > > >> How would that ensure that the driver actually marks the signalling region correctly?
> > > > >
> > > > > Nothing, and that's a problem we have in C: you have no way of telling
> > > > > which code section is going to be a DMA-signaling path. I can't think
> > > > > of any way to make that safer in rust, unfortunately. The best I can
> > > > > think of would be to
> > > > >
> > > > > - Have a special DmaFenceSignalWorkItem (wrapping a WorkItem with extra
> > > > > constraints) that's designed for DMA-fence signaling, and that takes
> > > > > the DmaSignaling guard around the ::run() call.
> > > > > - We would then need to ensure that any code path scheduling this work
> > > > > item is also in a DMA-signaling path by taking a ref to the
> > > > > DmaSignalingGuard. This of course doesn't guarantee that the section
> > > > > is wide enough to prevent any non-authorized operations in any path
> > > > > leading to this WorkItem scheduling, but it would at least force the
> > > > > caller to consider the problem.
> > > >
> > > > On the C side I have a patch set which does something very similar.
> > > >
> > > > It's basically a WARN_ON_ONCE() which triggers as soon as you try to
> > > > signal a DMA fence from an IOCTL, or, more specifically, from process context.
> > > >
> > > > Signaling a DMA fence from interrupt context, a work item or kernel
> > > > thread is still allowed, there is just the hole that you can schedule
> > > > a work item from process context as well.
> > > >
> > > > The major problem with that patch set is that we have tons of very
> > > > hacky signaling paths in drivers already because we initially didn't
> > > > know how much trouble getting this wrong causes.
> > > >
> > > > I'm strongly in favor of getting this right for the rust side from the
> > > > beginning and enforcing strict rules for every code trying to
> > > > implement a DMA fence.
> > >
> > > Hmm. Could you say a bit more about what the rules are? I just re-read
> > > the comments in dma-fence.c, but I have some questions.
> > >
> > > First, how does the signalling annotation work when the signalling path
> > > crosses thread boundaries?
> >
> > It's not supposed to cross the thread boundary at all. The annotation
> > is per-thread, and in that sense, it matches the lock guard model
> > perfectly.
> >
> > > For example, let's say I call an ioctl to
> > > perform an async VM_BIND, then the dma fence signalling critical path
> > > starts in the ioctl, but then it moves into a workqueue and finishes
> > > there, right?
> >
> > It's a bit trickier. The fence signalling path usually doesn't exist in
> > the submitting ioctl until the submission becomes effective and the
> > emitted fences are exposed to the outside world. That is, when:
> > - syncobjs are updated to point to this new fence
> > - fencefd pointing to this new fence is returned
> > - fence is added to the dma_resvs inside the gem/dma_buf objects
> > - ... (there might be other cases I forgot about)
> >
> > In the submission path, what's important is that no blocking allocation
> > is done between the moment the fence is exposed, and the moment it's
> > queued. In practice what happens is that the job this fence is bound to
> > is queued even before the fences are exposed, so if anything, what we
> > should ensure is the ordering, and having a guarantee that a job being
> > queued means it's going to be dequeued and executed soon enough.
> >
> > The second DMA signaling path exists in the context of the
> > workqueue/item dequeuing a job from the JobQueue (or drm_sched) and
> > pushing it to the HW. Then there's the IRQ handler being called to
> > inform us that the GPU is done executing this job, which might in some cases
> > lead to another work item being queued for further processing from
> > which the dma_fence is signaled. In other cases, the dma_fence is
> > signaled directly from the IRQ handler. All of these contexts are
> > considered being part of the DMA-signaling path. But it's not like the
> > fence signaling annotation is passed around, because the cookies
> > returned by dma_fence_begin_signalling() are only valid in a single
> > thread context, IIRC.
>
> Ok I understand what's going on now.
>
> You talk about it as-if there are two signalling paths, one in the ioctl
> and another in the workqueue. But that sounds like a workaround to make
> it work with how dma_fence_begin_signalling() is implemented, and not
> the true situation. (Added: looks like Christian confirms this.)
>
> One way you can see this is by looking at what we require of the
> workqueue. For all this to work, it's pretty important that we never
> schedule anything on the workqueue that's not signalling safe, since
> otherwise you could have a deadlock where the workqueue executes some
> random job calling kmalloc(GFP_KERNEL) and then blocks on our fence,
> meaning that the VM_BIND job never gets scheduled since the workqueue
> is never freed up. Deadlock.
That's correct. I mean, I don't remember all the details around
workqueues, and it might be that a workqueue can be backed by multiple
threads, with new threads being spawned if things take too long. But
one of the reasons most DRM drivers create their own workqueue is also
that they want to control what runs on it.
>
> And the correct way to model the above in lockdep is to have the DMA
> fence lockdep key be nested *outside* the lockdep key of the workqueue.
> Because there is a step in the signalling path where you wait for other
> jobs to complete before the signalling path is able to continue.
Yep, if the workqueue is shared, and non-DMA-signaling work items get
queued, they de-facto impact the DMA-signaling work items, and they
should be considered part of the DMA-signaling path too.
>
> And the meaning of such a lockdep dependency is exactly that the
> critical region moves from one thread to another.
We can see it like that. I was more referring to how this translates into
lockdep terms: that is, begin/end_signaling() need to be called in the
same thread, and any new thread which we know is going to help progress
on fence signaling needs to have its own begin/end_signaling(), no
matter how it's done (hopefully it's all automatic in rust).
>
>
> Perhaps we could have an API like this:
>
> // Split the DriverDmaFence into two separate fence concepts:
> struct PrivateFence { ... }
> struct PublishedFence { ... }
>
> /// The owner of this value must ensure that this fence is signalled.
> struct MustBeSignalled<'fence> { ... }
> /// Proof value indicating that the fence has either already been
> /// signalled, or it will be. The lifetime ensures that you cannot mix
> /// up the proof value.
> struct WillBeSignalled<'fence> { ... }
>
> /// Create a PublishedFence by entering a region that promises to signal
> /// it.
> ///
> /// The only way to return from the `region` closure it to construct a
> /// WillBeSignalled value, and the only way to do that (see below) is to
> /// signal it, or spawn a workqueue job that promises to signal it.
> /// (Since the only way for that workqueue job to exit is to construct a
> /// second WillBeSignalled value.)
> fn dma_fence_begin_signalling(
> fence: PrivateFence,
> region: impl for<'f> FnOnce(MustBeSignalled<'f>) -> WillBeSignalled<'f>,
> ) -> PublishedFence {
> let cookie = bindings::dma_fence_begin_signalling();
>
> region(<create token here>);
>
> bindings::dma_fence_end_signalling(cookie);
>
> fence.i_promise_it_will_be_signalled();
> }
>
> impl MustBeSignalled<'_> {
> /// Drivers generally should not use this one.
> fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
>
> /// One way to ensure the fence has been signalled is to signal it.
> fn signal_fence(self) -> WillBeSignalled {
> self.fence.signal();
> self.i_promise_it_will_be_signalled()
> }
>
> /// Another way to ensure the fence will be signalled is to spawn a
> /// workqueue item that promises to signal it.
> fn transfer_to_wq(
> self,
> wq: &Workqueue,
> item: impl DmaFenceWorkItem,
> ) -> WillBeSignalled {
> // briefly obtain the lock class of the wq to indicate to
> // lockdep that the signalling path "blocks" on arbitrary jobs
> // from this wq completing
> bindings::lock_acquire(&wq->key);
> bindings::lock_release(&wq->key);
>
> // enqueue the job
> wq.enqueue(item);
>
> // The signature of DmaFenceWorkItem::run() promises to arrange
> // for it to be signalled.
> self.i_promise_it_will_be_signalled()
> }
> }
>
> trait DmaFenceWorkItem {
> fn run<'f>(self, fence: MustBeSignalled<'f>) -> WillBeSignalled<'f>;
> }
>
>
> with this API, the ioctl can do this:
>
> let published_fence = dma_fence_begin_signalling(|fence| {
> fence.transfer_to_wq(my_wq, my_work_item)
> });
> somehow_publish_the_fence_to_userspace(published_fence);
>
> And we're ensured that the fence is really signalled because the
> signature of the work item closure is such that it can only return from
> DmaFenceWorkItem::run() by signalling the fence (or spawning it to a
> second workqueue, which in turn promises to signal it).
>
> Of course, the signature only enforces that the fence is (or will be)
> signalled if you return. One can still put an infinite loop inside
> dma_fence_begin_signalling() and avoid signalling it, but there's
> nothing we can do about that.
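The signature trick above can be modelled in plain userspace Rust. Everything below (`Fence`, `MustBeSignalled`, `WillBeSignalled`, `arrange`) is an illustrative stand-in for the proposed kernel types, not actual kernel code:

```rust
use std::cell::Cell;
use std::marker::PhantomData;

// Toy fence: just a signalled flag.
struct Fence {
    signalled: Cell<bool>,
}

// Token proving we are inside the signalling section and still owe a
// signal. Not Clone/Copy, so it must be consumed exactly once.
struct MustBeSignalled<'f> {
    fence: &'f Fence,
}

// Token proving the signal has been arranged. Only constructible by
// consuming a MustBeSignalled, never directly by the caller.
struct WillBeSignalled<'f> {
    _marker: PhantomData<&'f Fence>,
}

impl<'f> MustBeSignalled<'f> {
    // The only way in this model to obtain a WillBeSignalled.
    fn signal_fence(self) -> WillBeSignalled<'f> {
        self.fence.signalled.set(true);
        WillBeSignalled { _marker: PhantomData }
    }
}

// The region must return a WillBeSignalled for the *same* lifetime 'f,
// so it cannot smuggle in a token belonging to some other region.
fn begin_signalling(
    fence: &Fence,
    region: impl for<'f> FnOnce(MustBeSignalled<'f>) -> WillBeSignalled<'f>,
) {
    let _done = region(MustBeSignalled { fence });
}

// Stand-in for a driver's signalling region.
fn arrange<'f>(token: MustBeSignalled<'f>) -> WillBeSignalled<'f> {
    token.signal_fence()
}

fn main() {
    let fence = Fence { signalled: Cell::new(false) };
    begin_signalling(&fence, arrange);
    assert!(fence.signalled.get());
    println!("fence was signalled before begin_signalling returned");
}
```

The `for<'f>` bound is what prevents a driver from returning a stashed `WillBeSignalled` from an earlier call: the returned token is branded with the lifetime of the one it was handed.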
At first glance it seems to enforce what we want to enforce, but I'll
have a closer look to try and understand all the subtleties.
Thanks a lot for bringing your expertise to this discussion; that's
super helpful.
^ permalink raw reply [flat|nested] 103+ messages in thread* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 10:15 ` Alice Ryhl
2026-02-10 10:36 ` Danilo Krummrich
2026-02-10 10:46 ` Boris Brezillon
@ 2026-02-10 11:34 ` Boris Brezillon
2026-02-10 11:45 ` Alice Ryhl
2026-02-10 12:36 ` Boris Brezillon
2026-02-10 12:49 ` Boris Brezillon
4 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 11:34 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, 10 Feb 2026 10:15:04 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> impl MustBeSignalled<'_> {
> /// Drivers generally should not use this one.
> fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
>
> /// One way to ensure the fence has been signalled is to signal it.
> fn signal_fence(self) -> WillBeSignalled {
> self.fence.signal();
> self.i_promise_it_will_be_signalled()
> }
>
> /// Another way to ensure the fence will be signalled is to spawn a
> /// workqueue item that promises to signal it.
> fn transfer_to_wq(
> self,
> wq: &Workqueue,
> item: impl DmaFenceWorkItem,
> ) -> WillBeSignalled {
> // briefly obtain the lock class of the wq to indicate to
> // lockdep that the signalling path "blocks" on arbitrary jobs
> // from this wq completing
> bindings::lock_acquire(&wq->key);
> bindings::lock_release(&wq->key);
Sorry, I'm still trying to connect the dots here. I get that the intent
is to ensure the pseudo-lock ordering is always:
-> dma_fence_lockdep_map
-> wq->lockdep_map
but how can this order be the same in the WorkItem execution path? My
interpretation of process_one_work() makes me think we'll end up with
-> wq->lockdep_map
-> work->run()
-> WorkItem::run()
-> dma_fence_lockdep_map
-> DmaFenceSignalingWorkItem::run()
...
Am I missing something? Is there a way you can insert the
dma_fence_lockdep_map acquisition before the wq->lockdep_map one in the
execution path?
>
> // enqueue the job
> wq.enqueue(item, wq);
>
> // The signature of DmaFenceWorkItem::run() promises to arrange
> // for it to be signalled.
> self.i_promise_it_will_be_signalled()
> }
> }
^ permalink raw reply [flat|nested] 103+ messages in thread* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 11:34 ` Boris Brezillon
@ 2026-02-10 11:45 ` Alice Ryhl
2026-02-10 12:21 ` Boris Brezillon
0 siblings, 1 reply; 103+ messages in thread
From: Alice Ryhl @ 2026-02-10 11:45 UTC (permalink / raw)
To: Boris Brezillon
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 10, 2026 at 12:34:32PM +0100, Boris Brezillon wrote:
> On Tue, 10 Feb 2026 10:15:04 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > impl MustBeSignalled<'_> {
> > /// Drivers generally should not use this one.
> > fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
> >
> > /// One way to ensure the fence has been signalled is to signal it.
> > fn signal_fence(self) -> WillBeSignalled {
> > self.fence.signal();
> > self.i_promise_it_will_be_signalled()
> > }
> >
> > /// Another way to ensure the fence will be signalled is to spawn a
> > /// workqueue item that promises to signal it.
> > fn transfer_to_wq(
> > self,
> > wq: &Workqueue,
> > item: impl DmaFenceWorkItem,
> > ) -> WillBeSignalled {
> > // briefly obtain the lock class of the wq to indicate to
> > // lockdep that the signalling path "blocks" on arbitrary jobs
> > // from this wq completing
> > bindings::lock_acquire(&wq->key);
> > bindings::lock_release(&wq->key);
>
> Sorry, I'm still trying to connect the dots here. I get that the intent
> is to ensure the pseudo-lock ordering is always:
>
> -> dma_fence_lockdep_map
> -> wq->lockdep_map
>
> but how can this order be the same in the WorkItem execution path? My
> interpretation of process_one_work() makes me think we'll end up with
>
> -> wq->lockdep_map
> -> work->run()
> -> WorkItem::run()
> -> dma_fence_lockdep_map
> -> DmaFenceSignalingWorkItem::run()
> ...
>
> Am I missing something? Is there a way you can insert the
> dma_fence_lockdep_map acquisition before the wq->lockdep_map one in the
> execution path?
Conceptually, the dma_fence_lockdep_map is already taken by the time you
get to WorkItem::run() because it was taken all the way back in the
ioctl, so WorkItem::run() does not need to reacquire it.
Now, of course that does not translate cleanly to how lockdep does
things, so in lockdep we do have to re-acquire it in WorkItem::run().
You can do that by setting the trylock bit when calling lock_acquire()
on dma_fence_lockdep_map. This has the correct semantics because trylock
does not create an edge from wq->lockdep_map to dma_fence_lockdep_map.
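To make that concrete, here is a toy userspace model of lockdep's dependency graph (the graph logic and lock-class names are illustrative, not lockdep's actual implementation): a regular acquire adds an edge from every held lock, a trylock adds none, and a cycle is what lockdep would report as a deadlock.

```rust
use std::collections::{HashMap, HashSet};

// Toy lockdep: acquiring `class` normally records an edge held -> class
// for every currently-held lock; a trylock records no edges, because a
// trylock can never block and so can never complete a deadlock cycle.
#[derive(Default)]
struct Lockdep {
    held: Vec<&'static str>,
    edges: HashMap<&'static str, HashSet<&'static str>>,
}

impl Lockdep {
    fn acquire(&mut self, class: &'static str, trylock: bool) {
        if !trylock {
            for &h in &self.held {
                self.edges.entry(h).or_default().insert(class);
            }
        }
        self.held.push(class);
    }

    fn release(&mut self, class: &'static str) {
        self.held.retain(|&h| h != class);
    }

    // A cycle in the dependency graph is a potential deadlock.
    fn has_cycle(&self) -> bool {
        fn dfs(
            node: &'static str,
            edges: &HashMap<&'static str, HashSet<&'static str>>,
            path: &mut Vec<&'static str>,
        ) -> bool {
            if path.contains(&node) {
                return true;
            }
            path.push(node);
            let found = edges
                .get(node)
                .map_or(false, |next| next.iter().any(|&n| dfs(n, edges, path)));
            path.pop();
            found
        }
        self.edges.keys().any(|&k| dfs(k, &self.edges, &mut Vec::new()))
    }
}

fn main() {
    let mut ld = Lockdep::default();

    // ioctl path: begin_signalling records a trylock on the fence map,
    // then transfer_to_wq briefly takes the wq key for real.
    ld.acquire("dma_fence_map", true);
    ld.acquire("wq_key", false); // edge: dma_fence_map -> wq_key
    ld.release("wq_key");
    ld.release("dma_fence_map");

    // worker path: wq key is held, the fence map is re-acquired with
    // the trylock bit, so no wq_key -> dma_fence_map edge appears.
    ld.acquire("wq_key", false);
    ld.acquire("dma_fence_map", true);
    assert!(!ld.has_cycle());
    ld.release("dma_fence_map");

    // Had the worker used a regular acquire instead, the reverse edge
    // would close the cycle and lockdep would complain.
    ld.acquire("dma_fence_map", false);
    assert!(ld.has_cycle());
    println!("trylock in the worker avoids the deadlock report");
}
```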
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 11:45 ` Alice Ryhl
@ 2026-02-10 12:21 ` Boris Brezillon
2026-02-10 13:34 ` Alice Ryhl
0 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 12:21 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, 10 Feb 2026 11:45:36 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Feb 10, 2026 at 12:34:32PM +0100, Boris Brezillon wrote:
> > On Tue, 10 Feb 2026 10:15:04 +0000
> > Alice Ryhl <aliceryhl@google.com> wrote:
> >
> > > impl MustBeSignalled<'_> {
> > > /// Drivers generally should not use this one.
> > > fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
> > >
> > > /// One way to ensure the fence has been signalled is to signal it.
> > > fn signal_fence(self) -> WillBeSignalled {
> > > self.fence.signal();
> > > self.i_promise_it_will_be_signalled()
> > > }
> > >
> > > /// Another way to ensure the fence will be signalled is to spawn a
> > > /// workqueue item that promises to signal it.
> > > fn transfer_to_wq(
> > > self,
> > > wq: &Workqueue,
> > > item: impl DmaFenceWorkItem,
> > > ) -> WillBeSignalled {
> > > // briefly obtain the lock class of the wq to indicate to
> > > // lockdep that the signalling path "blocks" on arbitrary jobs
> > > // from this wq completing
> > > bindings::lock_acquire(&wq->key);
> > > bindings::lock_release(&wq->key);
> >
> > Sorry, I'm still trying to connect the dots here. I get that the intent
> > is to ensure the pseudo-lock ordering is always:
> >
> > -> dma_fence_lockdep_map
> > -> wq->lockdep_map
> >
> > but how can this order be the same in the WorkItem execution path? My
> > interpretation of process_one_work() makes me think we'll end up with
> >
> > -> wq->lockdep_map
> > -> work->run()
> > -> WorkItem::run()
> > -> dma_fence_lockdep_map
> > -> DmaFenceSignalingWorkItem::run()
> > ...
> >
> > Am I missing something? Is there a way you can insert the
> > dma_fence_lockdep_map acquisition before the wq->lockdep_map one in the
> > execution path?
>
> Conceptually, the dma_fence_lockdep_map is already taken by the time you
> get to WorkItem::run() because it was taken all the way back in the
> ioctl, so WorkItem::run() does not need to reacquire it.
>
> Now, of course that does not translate cleanly to how lockdep does
> things, so in lockdep we do have to re-acquire it in WorkItem::run().
> You can do that by setting the trylock bit when calling lock_acquire()
> on dma_fence_lockdep_map. This has the correct semantics because trylock
> does not create an edge from wq->lockdep_map to dma_fence_lockdep_map.
Ah, I never noticed dma_fence_begin_signalling() was recording a
trylock, not a regular lock. I guess that would do, then.
^ permalink raw reply [flat|nested] 103+ messages in thread* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 12:21 ` Boris Brezillon
@ 2026-02-10 13:34 ` Alice Ryhl
0 siblings, 0 replies; 103+ messages in thread
From: Alice Ryhl @ 2026-02-10 13:34 UTC (permalink / raw)
To: Boris Brezillon
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 10, 2026 at 01:21:47PM +0100, Boris Brezillon wrote:
> On Tue, 10 Feb 2026 11:45:36 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > On Tue, Feb 10, 2026 at 12:34:32PM +0100, Boris Brezillon wrote:
> > > On Tue, 10 Feb 2026 10:15:04 +0000
> > > Alice Ryhl <aliceryhl@google.com> wrote:
> > >
> > > > impl MustBeSignalled<'_> {
> > > > /// Drivers generally should not use this one.
> > > > fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
> > > >
> > > > /// One way to ensure the fence has been signalled is to signal it.
> > > > fn signal_fence(self) -> WillBeSignalled {
> > > > self.fence.signal();
> > > > self.i_promise_it_will_be_signalled()
> > > > }
> > > >
> > > > /// Another way to ensure the fence will be signalled is to spawn a
> > > > /// workqueue item that promises to signal it.
> > > > fn transfer_to_wq(
> > > > self,
> > > > wq: &Workqueue,
> > > > item: impl DmaFenceWorkItem,
> > > > ) -> WillBeSignalled {
> > > > // briefly obtain the lock class of the wq to indicate to
> > > > // lockdep that the signalling path "blocks" on arbitrary jobs
> > > > // from this wq completing
> > > > bindings::lock_acquire(&wq->key);
> > > > bindings::lock_release(&wq->key);
> > >
> > > Sorry, I'm still trying to connect the dots here. I get that the intent
> > > is to ensure the pseudo-lock ordering is always:
> > >
> > > -> dma_fence_lockdep_map
> > > -> wq->lockdep_map
> > >
> > > but how can this order be the same in the WorkItem execution path? My
> > > interpretation of process_one_work() makes me think we'll end up with
> > >
> > > -> wq->lockdep_map
> > > -> work->run()
> > > -> WorkItem::run()
> > > -> dma_fence_lockdep_map
> > > -> DmaFenceSignalingWorkItem::run()
> > > ...
> > >
> > > Am I missing something? Is there a way you can insert the
> > > dma_fence_lockdep_map acquisition before the wq->lockdep_map one in the
> > > execution path?
> >
> > Conceptually, the dma_fence_lockdep_map is already taken by the time you
> > get to WorkItem::run() because it was taken all the way back in the
> > ioctl, so WorkItem::run() does not need to reacquire it.
> >
> > Now, of course that does not translate cleanly to how lockdep does
> > things, so in lockdep we do have to re-acquire it in WorkItem::run().
> > You can do that by setting the trylock bit when calling lock_acquire()
> > on dma_fence_lockdep_map. This has the correct semantics because trylock
> > does not create an edge from wq->lockdep_map to dma_fence_lockdep_map.
>
> Ah, I never noticed dma_fence_begin_signalling() was recording a
> try_lock not a regular lock. I guess it would do then.
Calling dma_fence_begin_signalling() never blocks, so 'trylock' is the
right option.
Actually, that raises one question for me. Right now it's implemented
like this:
/* explicitly nesting ... */
if (lock_is_held_type(&dma_fence_lockdep_map, 1))
return true;
/* ... and non-recursive successful read_trylock */
lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _RET_IP_);
but why not drop the explicit nest check and pass `2` for read instead?
lock_acquire(&dma_fence_lockdep_map, 0, 1, 2, 1, NULL, _RET_IP_);
Note that passing 2 means that you're taking a readlock with
same-instance recursion allowed. This way you could get rid of the
cookie entirely because you're just taking the lock multiple times, and
lockdep will count how many times it's taken for you.
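The cookie-vs-recursive-read distinction can be sketched as two toy bookkeeping schemes (illustrative userspace Rust, not the actual lockdep or dma-fence API):

```rust
use std::cell::Cell;

// Current scheme: the caller carries a cookie that remembers whether
// this begin was the outermost one, i.e. whether end must release.
struct CookieMap {
    held: Cell<bool>,
}

impl CookieMap {
    fn begin(&self) -> bool {
        if self.held.get() {
            return true; // already inside: nested no-op cookie
        }
        self.held.set(true);
        false
    }

    fn end(&self, cookie: bool) {
        if !cookie {
            self.held.set(false);
        }
    }
}

// Recursive-read scheme: with a same-instance-recursive read acquire
// (read == 2), lockdep counts the nesting itself, so begin/end become
// plain increments/decrements and no cookie is threaded through.
struct RecursiveMap {
    depth: Cell<u32>,
}

impl RecursiveMap {
    fn begin(&self) {
        self.depth.set(self.depth.get() + 1);
    }

    fn end(&self) {
        self.depth.set(self.depth.get() - 1);
    }

    fn held(&self) -> bool {
        self.depth.get() > 0
    }
}

fn main() {
    // Cookie version: the caller must keep the cookies straight.
    let c = CookieMap { held: Cell::new(false) };
    let outer = c.begin();
    let inner = c.begin();
    c.end(inner);
    assert!(c.held.get()); // outer section still active
    c.end(outer);
    assert!(!c.held.get());

    // Recursive version: the map counts for us.
    let r = RecursiveMap { depth: Cell::new(0) };
    r.begin();
    r.begin();
    r.end();
    assert!(r.held()); // outer section still active
    r.end();
    assert!(!r.held());
    println!("both schemes handle nesting; only one needs a cookie");
}
```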
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 10:15 ` Alice Ryhl
` (2 preceding siblings ...)
2026-02-10 11:34 ` Boris Brezillon
@ 2026-02-10 12:36 ` Boris Brezillon
2026-02-10 13:15 ` Alice Ryhl
2026-02-10 12:49 ` Boris Brezillon
4 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 12:36 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, 10 Feb 2026 10:15:04 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> impl MustBeSignalled<'_> {
> /// Drivers generally should not use this one.
> fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
>
> /// One way to ensure the fence has been signalled is to signal it.
> fn signal_fence(self) -> WillBeSignalled {
> self.fence.signal();
> self.i_promise_it_will_be_signalled()
> }
>
> /// Another way to ensure the fence will be signalled is to spawn a
> /// workqueue item that promises to signal it.
> fn transfer_to_wq(
> self,
> wq: &Workqueue,
> item: impl DmaFenceWorkItem,
> ) -> WillBeSignalled {
> // briefly obtain the lock class of the wq to indicate to
> // lockdep that the signalling path "blocks" on arbitrary jobs
> // from this wq completing
> bindings::lock_acquire(&wq->key);
> bindings::lock_release(&wq->key);
>
> // enqueue the job
> wq.enqueue(item, wq);
>
> // The signature of DmaFenceWorkItem::run() promises to arrange
> // for it to be signalled.
> self.i_promise_it_will_be_signalled()
> }
I guess what's still missing is some sort of `transfer_to_hw()`
function and a way to flag the IRQ handler taking over the fence
signaling token.
> }
^ permalink raw reply [flat|nested] 103+ messages in thread* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 12:36 ` Boris Brezillon
@ 2026-02-10 13:15 ` Alice Ryhl
2026-02-10 13:26 ` Boris Brezillon
0 siblings, 1 reply; 103+ messages in thread
From: Alice Ryhl @ 2026-02-10 13:15 UTC (permalink / raw)
To: Boris Brezillon
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
> On Tue, 10 Feb 2026 10:15:04 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > impl MustBeSignalled<'_> {
> > /// Drivers generally should not use this one.
> > fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
> >
> > /// One way to ensure the fence has been signalled is to signal it.
> > fn signal_fence(self) -> WillBeSignalled {
> > self.fence.signal();
> > self.i_promise_it_will_be_signalled()
> > }
> >
> > /// Another way to ensure the fence will be signalled is to spawn a
> > /// workqueue item that promises to signal it.
> > fn transfer_to_wq(
> > self,
> > wq: &Workqueue,
> > item: impl DmaFenceWorkItem,
> > ) -> WillBeSignalled {
> > // briefly obtain the lock class of the wq to indicate to
> > // lockdep that the signalling path "blocks" on arbitrary jobs
> > // from this wq completing
> > bindings::lock_acquire(&wq->key);
> > bindings::lock_release(&wq->key);
> >
> > // enqueue the job
> > wq.enqueue(item, wq);
> >
> > // The signature of DmaFenceWorkItem::run() promises to arrange
> > // for it to be signalled.
> > self.i_promise_it_will_be_signalled()
> > }
>
> I guess what's still missing is some sort of `transfer_to_hw()`
> function and way to flag the IRQ handler taking over the fence
> signaling token.
Yes, transfer to hardware needs to be another piece of logic similar to
transfer to wq. And I imagine there are many ways such a transfer to
hardware could work.
Unless you have a timeout on it, in which case the WillBeSignalled is
satisfied by the fact you have a timeout alone, and the signalling that
happens from the irq is just an opportunistic signal from outside the
dma fence signalling critical path.
From dma-fence.c:
* * The only exception are fast paths and opportunistic signalling code, which
* calls dma_fence_signal() purely as an optimization, but is not required to
* guarantee completion of a &dma_fence. The usual example is a wait IOCTL
* which calls dma_fence_signal(), while the mandatory completion path goes
* through a hardware interrupt and possible job completion worker.
Well ... unless triggering timeouts can block on GFP_KERNEL
allocations...
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 13:15 ` Alice Ryhl
@ 2026-02-10 13:26 ` Boris Brezillon
2026-02-10 13:49 ` Alice Ryhl
0 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 13:26 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, 10 Feb 2026 13:15:31 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
> > On Tue, 10 Feb 2026 10:15:04 +0000
> > Alice Ryhl <aliceryhl@google.com> wrote:
> >
> > > impl MustBeSignalled<'_> {
> > > /// Drivers generally should not use this one.
> > > fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
> > >
> > > /// One way to ensure the fence has been signalled is to signal it.
> > > fn signal_fence(self) -> WillBeSignalled {
> > > self.fence.signal();
> > > self.i_promise_it_will_be_signalled()
> > > }
> > >
> > > /// Another way to ensure the fence will be signalled is to spawn a
> > > /// workqueue item that promises to signal it.
> > > fn transfer_to_wq(
> > > self,
> > > wq: &Workqueue,
> > > item: impl DmaFenceWorkItem,
> > > ) -> WillBeSignalled {
> > > // briefly obtain the lock class of the wq to indicate to
> > > // lockdep that the signalling path "blocks" on arbitrary jobs
> > > // from this wq completing
> > > bindings::lock_acquire(&wq->key);
> > > bindings::lock_release(&wq->key);
> > >
> > > // enqueue the job
> > > wq.enqueue(item, wq);
> > >
> > > // The signature of DmaFenceWorkItem::run() promises to arrange
> > > // for it to be signalled.
> > > self.i_promise_it_will_be_signalled()
> > > }
> >
> > I guess what's still missing is some sort of `transfer_to_hw()`
> > function and way to flag the IRQ handler taking over the fence
> > signaling token.
>
> Yes, transfer to hardware needs to be another piece of logic similar to
> transfer to wq. And I imagine there are many ways such a transfer to
> hardware could work.
>
> Unless you have a timeout on it, in which case the WillBeSignalled is
> satisfied by the fact you have a timeout alone, and the signalling that
> happens from the irq is just an opportunistic signal from outside the
> dma fence signalling critical path.
Yes and no. If it deadlocks in the completion WorkItem because of
allocations (or any of the forbidden use cases), I think we want to
catch that, because that's a sign fences are likely to end up with
timeouts when they should have otherwise been signaled properly.
>
> From dma-fence.c:
>
> * * The only exception are fast paths and opportunistic signalling code, which
> * calls dma_fence_signal() purely as an optimization, but is not required to
> * guarantee completion of a &dma_fence. The usual example is a wait IOCTL
> * which calls dma_fence_signal(), while the mandatory completion path goes
> * through a hardware interrupt and possible job completion worker.
In this example, the fast-signaling path is not in the IRQ handler or
the job completion work item; it's directly in the IOCTL.
Unfortunately, I don't know exactly what would cause dma_fence_signal()
to be called opportunistically in that case, because that's not part of
the description :D. I can tell you there's no such thing in panthor.
>
> Well ... unless triggering timeouts can block on GFP_KERNEL
> allocations...
I mean, the timeout handler should also be considered a DMA-signalling
path, and the same rules should apply to it.
^ permalink raw reply [flat|nested] 103+ messages in thread* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 13:26 ` Boris Brezillon
@ 2026-02-10 13:49 ` Alice Ryhl
2026-02-10 13:56 ` Christian König
0 siblings, 1 reply; 103+ messages in thread
From: Alice Ryhl @ 2026-02-10 13:49 UTC (permalink / raw)
To: Boris Brezillon
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
> On Tue, 10 Feb 2026 13:15:31 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
> > > On Tue, 10 Feb 2026 10:15:04 +0000
> > > Alice Ryhl <aliceryhl@google.com> wrote:
> > >
> > > > impl MustBeSignalled<'_> {
> > > > /// Drivers generally should not use this one.
> > > > fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
> > > >
> > > > /// One way to ensure the fence has been signalled is to signal it.
> > > > fn signal_fence(self) -> WillBeSignalled {
> > > > self.fence.signal();
> > > > self.i_promise_it_will_be_signalled()
> > > > }
> > > >
> > > > /// Another way to ensure the fence will be signalled is to spawn a
> > > > /// workqueue item that promises to signal it.
> > > > fn transfer_to_wq(
> > > > self,
> > > > wq: &Workqueue,
> > > > item: impl DmaFenceWorkItem,
> > > > ) -> WillBeSignalled {
> > > > // briefly obtain the lock class of the wq to indicate to
> > > > // lockdep that the signalling path "blocks" on arbitrary jobs
> > > > // from this wq completing
> > > > bindings::lock_acquire(&wq->key);
> > > > bindings::lock_release(&wq->key);
> > > >
> > > > // enqueue the job
> > > > wq.enqueue(item, wq);
> > > >
> > > > // The signature of DmaFenceWorkItem::run() promises to arrange
> > > > // for it to be signalled.
> > > > self.i_promise_it_will_be_signalled()
> > > > }
> > >
> > > I guess what's still missing is some sort of `transfer_to_hw()`
> > > function and way to flag the IRQ handler taking over the fence
> > > signaling token.
> >
> > Yes, transfer to hardware needs to be another piece of logic similar to
> > transfer to wq. And I imagine there are many ways such a transfer to
> > hardware could work.
> >
> > Unless you have a timeout on it, in which case the WillBeSignalled is
> > satisfied by the fact you have a timeout alone, and the signalling that
> > happens from the irq is just an opportunistic signal from outside the
> > dma fence signalling critical path.
>
> Yes and no. If it deadlocks in the completion WorkItem because of
> allocations (or any of the forbidden use cases), I think we want to
> catch that, because that's a sign fences are likely to end up with
> timeouts when they should have otherwise been signaled properly.
>
> > Well ... unless triggering timeouts can block on GFP_KERNEL
> > allocations...
>
> I mean, the timeout handler should also be considered a DMA-signalling
> path, and the same rules should apply to it.
I guess that's fair. Even with a timeout you want both to be
signalling paths.
I guess more generally, if a fence is signalled by mechanism A or B,
whichever happens first, you have the choice between:
1. A in signalling path, B is not
2. B in signalling path, A is not
3. A and B both in signalling path
But the downside of choosing (1.) or (2.) is that if you declare that
event B is not in the signalling path, then B can kmalloc(GFP_KERNEL),
which may deadlock on itself until event A happens, and if A is a
timeout that could be a long time, so this scenario is undesirable even
if technically it's not a deadlock because it eventually unblocks
itself.
So we should choose option (3.) and declare that both timeout and hw irq
codepaths are signalling paths.
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 13:49 ` Alice Ryhl
@ 2026-02-10 13:56 ` Christian König
2026-02-10 14:00 ` Philipp Stanner
2026-02-10 15:07 ` Alice Ryhl
0 siblings, 2 replies; 103+ messages in thread
From: Christian König @ 2026-02-10 13:56 UTC (permalink / raw)
To: Alice Ryhl, Boris Brezillon
Cc: Philipp Stanner, phasta, Danilo Krummrich, David Airlie,
Simona Vetter, Gary Guo, Benno Lossin, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On 2/10/26 14:49, Alice Ryhl wrote:
> On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
>> On Tue, 10 Feb 2026 13:15:31 +0000
>> Alice Ryhl <aliceryhl@google.com> wrote:
>>
>>> On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
>>>> On Tue, 10 Feb 2026 10:15:04 +0000
>>>> Alice Ryhl <aliceryhl@google.com> wrote:
>>>>
>>>>> impl MustBeSignalled<'_> {
>>>>> /// Drivers generally should not use this one.
>>>>> fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
>>>>>
>>>>> /// One way to ensure the fence has been signalled is to signal it.
>>>>> fn signal_fence(self) -> WillBeSignalled {
>>>>> self.fence.signal();
>>>>> self.i_promise_it_will_be_signalled()
>>>>> }
>>>>>
>>>>> /// Another way to ensure the fence will be signalled is to spawn a
>>>>> /// workqueue item that promises to signal it.
>>>>> fn transfer_to_wq(
>>>>> self,
>>>>> wq: &Workqueue,
>>>>> item: impl DmaFenceWorkItem,
>>>>> ) -> WillBeSignalled {
>>>>> // briefly obtain the lock class of the wq to indicate to
>>>>> // lockdep that the signalling path "blocks" on arbitrary jobs
>>>>> // from this wq completing
>>>>> bindings::lock_acquire(&wq->key);
>>>>> bindings::lock_release(&wq->key);
>>>>>
>>>>> // enqueue the job
>>>>> wq.enqueue(item, wq);
>>>>>
>>>>> // The signature of DmaFenceWorkItem::run() promises to arrange
>>>>> // for it to be signalled.
>>>>> self.i_promise_it_will_be_signalled()
>>>>> }
>>>>
>>>> I guess what's still missing is some sort of `transfer_to_hw()`
>>>> function and way to flag the IRQ handler taking over the fence
>>>> signaling token.
>>>
>>> Yes, transfer to hardware needs to be another piece of logic similar to
>>> transfer to wq. And I imagine there are many ways such a transfer to
>>> hardware could work.
>>>
>>> Unless you have a timeout on it, in which case the WillBeSignalled is
>>> satisfied by the fact you have a timeout alone, and the signalling that
>>> happens from the irq is just an opportunistic signal from outside the
>>> dma fence signalling critical path.
>>
>> Yes and no. If it deadlocks in the completion WorkItem because of
>> allocations (or any of the forbidden use cases), I think we want to
>> catch that, because that's a sign fences are likely to end up with
>> timeouts when they should have otherwise been signaled properly.
>>
>>> Well ... unless triggering timeouts can block on GFP_KERNEL
>>> allocations...
>>
>> I mean, the timeout handler should also be considered a DMA-signalling
>> path, and the same rules should apply to it.
>
> I guess that's fair. Even with a timeout you want both to be signalling
> path.
>
> I guess more generally, if a fence is signalled by mechanism A or B,
> whichever happens first, you have the choice between:
That doesn't happen in practice.
For each fence you only have one signaling path you need to guarantee forward progress for.
All other signaling paths are just opportunistic optimizations which *can* signal the fence, but there is no guarantee that they will.
We used to have some exceptions to that, especially around aborting submissions, but those turned out to be a really bad idea as well.
Thinking more about it, you should probably enforce that there is only one signaling path for each fence.
Regards,
Christian.
>
> 1. A in signalling path, B is not
> 2. B in signalling path, A is not
> 3. A and B both in signalling path
>
> But the downside of choosing (1.) or (2.) is that if you declare that
> event B is not in the signalling path, then B can kmalloc(GFP_KERNEL),
> which may deadlock on itself until event A happens, and if A is a
> timeout that could be a long time, so this scenario is undesirable even
> if technically it's not a deadlock because it eventually unblocks
> itself.
>
> So we should choose option (3.) and declare that both timeout and hw irq
> codepaths are signalling paths.
>
> Alice
^ permalink raw reply [flat|nested] 103+ messages in thread* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 13:56 ` Christian König
@ 2026-02-10 14:00 ` Philipp Stanner
2026-02-10 14:06 ` Christian König
2026-02-10 15:07 ` Alice Ryhl
1 sibling, 1 reply; 103+ messages in thread
From: Philipp Stanner @ 2026-02-10 14:00 UTC (permalink / raw)
To: Christian König, Alice Ryhl, Boris Brezillon
Cc: phasta, Danilo Krummrich, David Airlie, Simona Vetter, Gary Guo,
Benno Lossin, Daniel Almeida, Joel Fernandes, linux-kernel,
dri-devel, rust-for-linux
On Tue, 2026-02-10 at 14:56 +0100, Christian König wrote:
> On 2/10/26 14:49, Alice Ryhl wrote:
> > On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
> > > On Tue, 10 Feb 2026 13:15:31 +0000
> > > Alice Ryhl <aliceryhl@google.com> wrote:
> > >
> > >
> > >
[…]
> > > I mean, the timeout handler should also be considered a DMA-signalling
> > > path, and the same rules should apply to it.
> >
> > I guess that's fair. Even with a timeout you want both to be signalling
> > path.
> >
> > I guess more generally, if a fence is signalled by mechanism A or B,
> > whichever happens first, you have the choice between:
>
> That doesn't happen in practice.
>
> For each fence you only have one signaling path you need to guarantee forward progress for.
>
> All other signaling paths are just opportunistic optimizations which *can* signal the fence, but there is no guarantee that they will.
Are you now referring to the fast-path callbacks like
fence->ops->is_signaled()? Or are you talking about different reference
holders which might want to signal?
>
> We used to have some exceptions to that, especially around aborting submissions, but those turned out to be a really bad idea as well.
>
> Thinking more about it you should probably enforce that there is only one signaling path for each fence signaling.
An idea that is floating around is to move the entire fence signaling
functionality into the dma fence context. That would have exclusive
access, and could also finally guarantee that fences are signaled
in order.
P.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 14:00 ` Philipp Stanner
@ 2026-02-10 14:06 ` Christian König
2026-02-10 15:32 ` Philipp Stanner
0 siblings, 1 reply; 103+ messages in thread
From: Christian König @ 2026-02-10 14:06 UTC (permalink / raw)
To: phasta, Alice Ryhl, Boris Brezillon
Cc: Danilo Krummrich, David Airlie, Simona Vetter, Gary Guo,
Benno Lossin, Daniel Almeida, Joel Fernandes, linux-kernel,
dri-devel, rust-for-linux
On 2/10/26 15:00, Philipp Stanner wrote:
> On Tue, 2026-02-10 at 14:56 +0100, Christian König wrote:
>> On 2/10/26 14:49, Alice Ryhl wrote:
>>> On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
>>>> On Tue, 10 Feb 2026 13:15:31 +0000
>>>> Alice Ryhl <aliceryhl@google.com> wrote:
>>>>
>>>>
>>>>
>
> […]
>
>>>> I mean, the timeout handler should also be considered a DMA-signalling
>>>> path, and the same rules should apply to it.
>>>
>>> I guess that's fair. Even with a timeout you want both to be signalling
>>> path.
>>>
>>> I guess more generally, if a fence is signalled by mechanism A or B,
>>> whichever happens first, you have the choice between:
>>
>> That doesn't happen in practice.
>>
>> For each fence you only have one signaling path you need to guarantee forward progress for.
>>
>> All other signaling paths are just opportunistically optimizations which *can* signal the fence, but there is no guarantee that they will.
>
> Are you now referring to the fast-path callbacks like
> fence->ops->is_signaled()? Or are you talking about different reference
> holders which might want to signal?
Yes, I'm referring to the is_signaled() callback.
When you have multiple reference holders which can all signal the fence by calling dma_fence_signal(), then there is clearly something going wrong.
What can happen is that you have something like a fallback timer for buggy HW which swallows IRQs (e.g. like radeon or amdgpu have), but then you also have something like a high-level spinlock which makes sure that your IRQ handler is not re-entrant and that multiple instances are not running at the same time on different CPUs.
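The pattern described here can be modelled in a few lines of plain std Rust (hypothetical names; this is a sketch of the concept, not a kernel API): the IRQ handler and the fallback timer are both allowed to signal, but only one of them performs the actual unsignalled-to-signalled transition, so racing opportunistic paths are harmless.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical sketch: a fence whose signalled flag flips exactly once,
// regardless of how many signaling paths race on it.
struct Fence {
    signalled: AtomicBool,
}

impl Fence {
    // Returns true only for the path that actually performed the signal.
    fn try_signal(&self) -> bool {
        self.signalled
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}

// "IRQ handler" and "fallback timer" racing to signal the same fence;
// returns how many of them won the race.
fn race_paths() -> usize {
    let fence = Arc::new(Fence { signalled: AtomicBool::new(false) });
    let mut handles = Vec::new();
    for _ in 0..2 {
        let f = Arc::clone(&fence);
        handles.push(thread::spawn(move || f.try_signal() as usize));
    }
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Exactly one path ever observes a successful transition, which is what makes extra opportunistic signalers safe as long as one path guarantees forward progress.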
>>
>> We used to have some exceptions to that, especially around aborting submissions, but those turned out to be a really bad idea as well.
>>
>> Thinking more about it you should probably enforce that there is only one signaling path for each fence signaling.
>
> An idea that is floating around is to move the entire fence signaling
> functionality into the dma fence context. That would have exclusive
> access, and could also finally guarantee that fences must be signaled
> in order.
That sounds like a sane idea to me.
Regards,
Christian.
>
>
> P.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 14:06 ` Christian König
@ 2026-02-10 15:32 ` Philipp Stanner
2026-02-10 15:50 ` Christian König
0 siblings, 1 reply; 103+ messages in thread
From: Philipp Stanner @ 2026-02-10 15:32 UTC (permalink / raw)
To: Christian König, phasta, Alice Ryhl, Boris Brezillon
Cc: Danilo Krummrich, David Airlie, Simona Vetter, Gary Guo,
Benno Lossin, Daniel Almeida, Joel Fernandes, linux-kernel,
dri-devel, rust-for-linux
On Tue, 2026-02-10 at 15:06 +0100, Christian König wrote:
> On 2/10/26 15:00, Philipp Stanner wrote:
> > On Tue, 2026-02-10 at 14:56 +0100, Christian König wrote:
> > > On 2/10/26 14:49, Alice Ryhl wrote:
> > > > On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
> > > > > On Tue, 10 Feb 2026 13:15:31 +0000
> > > > > Alice Ryhl <aliceryhl@google.com> wrote:
> > > > >
> > > > >
> > > > >
> >
> > […]
> >
> > > > > I mean, the timeout handler should also be considered a DMA-signalling
> > > > > path, and the same rules should apply to it.
> > > >
> > > > I guess that's fair. Even with a timeout you want both to be signalling
> > > > path.
> > > >
> > > > I guess more generally, if a fence is signalled by mechanism A or B,
> > > > whichever happens first, you have the choice between:
> > >
> > > That doesn't happen in practice.
> > >
> > > For each fence you only have one signaling path you need to guarantee forward progress for.
> > >
> > > All other signaling paths are just opportunistically optimizations which *can* signal the fence, but there is no guarantee that they will.
> >
> > Are you now referring to the fast-path callbacks like
> > fence->ops->is_signaled()? Or are you talking about different reference
> > holders which might want to signal?
>
> Yes, I'm referring to the is_signaled() callback.
>
> When you have multiple reference holders which can all signal the fence by calling dma_fence_signal then there is clearly something going wrong.
From our previous discussions it always seemed to me that there is
already something wrong when is_signaled() fastpaths are being
utilized. Remember the mess we have in Nouveau because of that?
I agree that it sounds very sane to have just one party that can signal
a fence.
However, that also implies that the fastpath is wrong, because
a) a consumer of a fence can use it to poll on the fence, and
b) it breaks good naming patterns, since a check whether a fence is
signaled can result in the fence being signaled.
So I would kill that fast path for good.
In the past you mentioned that you had users who were wondering why
their fences are not signalling (like, yeah, if you don't call
dma_fence_signal(), then your fence will not signal). Whatever the
confusion was, this time we have the chance to set it straight.
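The naming problem being criticized can be made concrete with a small model (hypothetical names, plain std Rust; this mirrors the structure of dma_fence_is_signaled() calling the driver's ops callback, not the actual implementation): the query latches the fastpath result, so checking whether a fence is signaled can itself signal it.

```rust
use std::cell::Cell;

// Hypothetical model of the fastpath: the driver callback polls hardware
// state, and the core-side check latches the result as a side effect.
struct HwFence {
    seqno: u64,
    hw_counter: Cell<u64>, // stands in for a hardware completion counter
    signalled: Cell<bool>,
}

impl HwFence {
    // The driver-provided fastpath: just compare against hardware state.
    fn ops_is_signaled(&self) -> bool {
        self.hw_counter.get() >= self.seqno
    }

    // Core-side check: if the fastpath says "done", the fence is marked
    // signalled as a side effect -- the *check* caused the signal.
    fn is_signaled(&self) -> bool {
        if self.signalled.get() {
            return true;
        }
        if self.ops_is_signaled() {
            self.signalled.set(true);
            return true;
        }
        false
    }
}
```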
P.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 15:32 ` Philipp Stanner
@ 2026-02-10 15:50 ` Christian König
0 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2026-02-10 15:50 UTC (permalink / raw)
To: phasta, Alice Ryhl, Boris Brezillon
Cc: Danilo Krummrich, David Airlie, Simona Vetter, Gary Guo,
Benno Lossin, Daniel Almeida, Joel Fernandes, linux-kernel,
dri-devel, rust-for-linux
On 2/10/26 16:32, Philipp Stanner wrote:
> On Tue, 2026-02-10 at 15:06 +0100, Christian König wrote:
>> On 2/10/26 15:00, Philipp Stanner wrote:
>>> On Tue, 2026-02-10 at 14:56 +0100, Christian König wrote:
>>>> On 2/10/26 14:49, Alice Ryhl wrote:
>>>>> On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
>>>>>> On Tue, 10 Feb 2026 13:15:31 +0000
>>>>>> Alice Ryhl <aliceryhl@google.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>
>>> […]
>>>
>>>>>> I mean, the timeout handler should also be considered a DMA-signalling
>>>>>> path, and the same rules should apply to it.
>>>>>
>>>>> I guess that's fair. Even with a timeout you want both to be signalling
>>>>> path.
>>>>>
>>>>> I guess more generally, if a fence is signalled by mechanism A or B,
>>>>> whichever happens first, you have the choice between:
>>>>
>>>> That doesn't happen in practice.
>>>>
>>>> For each fence you only have one signaling path you need to guarantee forward progress for.
>>>>
>>>> All other signaling paths are just opportunistically optimizations which *can* signal the fence, but there is no guarantee that they will.
>>>
>>> Are you now referring to the fast-path callbacks like
>>> fence->ops->is_signaled()? Or are you talking about different reference
>>> holders which might want to signal?
>>
>> Yes, I'm referring to the is_signaled() callback.
>>
>> When you have multiple reference holders which can all signal the fence by calling dma_fence_signal then there is clearly something going wrong.
>
> From our previous discussions it always seemed to me that there is
> already something wrong when is_signaled() fastpaths are being
> utilized. Remember the mess we have in Nouveau because of that?
Yeah, but that is Nouveau-specific. In other words, Nouveau has a problem with it because it doesn't like to have already-signaled fences on its pending list but still wants to use the fast path.
> I agree that it sounds very sane to just have 1 party that can signal a
> fence.
>
> However, that also implies that the fastpath is wrong because
>
> a) a consumer of a fence can use that to poll on the fence and
> b) because it breaks good naming pattern, where a check whether a fence
> is signaled can result in the fence being signaled.
>
> So I would kill that fast path for good.
Yeah, that is not something we can do; some use cases completely rely on it.
E.g. on mobile devices it is normal not to enable fence interrupts and to rely on repeating interrupts or timeouts to signal the fences.
> In the past you mentioned that you had users who were wondering why
> there fences are not signalling (like yeah, if you don't call
> dma_fence_signal(), then your fence will not signal). What ever the
> confusion was, this time we have the chance to set it straight.
Not sure what you mean by that.
Christian.
>
> P.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 13:56 ` Christian König
2026-02-10 14:00 ` Philipp Stanner
@ 2026-02-10 15:07 ` Alice Ryhl
2026-02-10 15:45 ` Christian König
1 sibling, 1 reply; 103+ messages in thread
From: Alice Ryhl @ 2026-02-10 15:07 UTC (permalink / raw)
To: Christian König
Cc: Boris Brezillon, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 10, 2026 at 02:56:52PM +0100, Christian König wrote:
> On 2/10/26 14:49, Alice Ryhl wrote:
> > On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
> >> On Tue, 10 Feb 2026 13:15:31 +0000
> >> Alice Ryhl <aliceryhl@google.com> wrote:
> >>
> >>> On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
> >>>> On Tue, 10 Feb 2026 10:15:04 +0000
> >>>> Alice Ryhl <aliceryhl@google.com> wrote:
> >>>>
> >>>>> impl MustBeSignalled<'_> {
> >>>>> /// Drivers generally should not use this one.
> >>>>> fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
> >>>>>
> >>>>> /// One way to ensure the fence has been signalled is to signal it.
> >>>>> fn signal_fence(self) -> WillBeSignalled {
> >>>>> self.fence.signal();
> >>>>> self.i_promise_it_will_be_signalled()
> >>>>> }
> >>>>>
> >>>>> /// Another way to ensure the fence will be signalled is to spawn a
> >>>>> /// workqueue item that promises to signal it.
> >>>>> fn transfer_to_wq(
> >>>>> self,
> >>>>> wq: &Workqueue,
> >>>>> item: impl DmaFenceWorkItem,
> >>>>> ) -> WillBeSignalled {
> >>>>> // briefly obtain the lock class of the wq to indicate to
> >>>>> // lockdep that the signalling path "blocks" on arbitrary jobs
> >>>>> // from this wq completing
> >>>>> bindings::lock_acquire(&wq->key);
> >>>>> bindings::lock_release(&wq->key);
> >>>>>
> >>>>> // enqueue the job
> >>>>> wq.enqueue(item, wq);
> >>>>>
> >>>>> // The signature of DmaFenceWorkItem::run() promises to arrange
> >>>>> // for it to be signalled.
> >>>>> self.i_promise_it_will_be_signalled()
> >>>>> }
> >>>>
> >>>> I guess what's still missing is some sort of `transfer_to_hw()`
> >>>> function and way to flag the IRQ handler taking over the fence
> >>>> signaling token.
> >>>
> >>> Yes, transfer to hardware needs to be another piece of logic similar to
> >>> transfer to wq. And I imagine there are many ways such a transfer to
> >>> hardware could work.
> >>>
> >>> Unless you have a timeout on it, in which case the WillBeSignalled is
> >>> satisfied by the fact you have a timeout alone, and the signalling that
> >>> happens from the irq is just an opportunistic signal from outside the
> >>> dma fence signalling critical path.
> >>
> >> Yes and no. If it deadlocks in the completion WorkItem because of
> >> allocations (or any of the forbidden use cases), I think we want to
> >> catch that, because that's a sign fences are likely to end up with
> >> timeouts when they should have otherwise been signaled properly.
> >>
> >>> Well ... unless triggering timeouts can block on GFP_KERNEL
> >>> allocations...
> >>
> >> I mean, the timeout handler should also be considered a DMA-signalling
> >> path, and the same rules should apply to it.
> >
> > I guess that's fair. Even with a timeout you want both to be signalling
> > path.
> >
> > I guess more generally, if a fence is signalled by mechanism A or B,
> > whichever happens first, you have the choice between:
>
> That doesn't happen in practice.
>
> For each fence you only have one signaling path you need to guarantee
> forward progress for.
>
> All other signaling paths are just opportunistically optimizations
> which *can* signal the fence, but there is no guarantee that they
> will.
>
> We used to have some exceptions to that, especially around aborting
> submissions, but those turned out to be a really bad idea as well.
>
> Thinking more about it you should probably enforce that there is only
> one signaling path for each fence signaling.
I'm not really convinced by this.
First, the timeout path must be a fence signalling path because the
reason you have a timeout in the first place is because the hw might
never signal the fence. So if the timeout path deadlocks on a
kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
Second, for the reasons I mentioned you also want the signal-from-irq
path to be a fence signalling critical path, because if we allow you to
kmalloc(GFP_KERNEL) on the path from getting notification from hardware
to signalling the fence, then you may deadlock until the timeout
triggers ... even if the deadlock is only temporary, we should still
avoid such cases IMO. Thus, the hw signal path should also be a fence
signalling critical path.
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 15:07 ` Alice Ryhl
@ 2026-02-10 15:45 ` Christian König
2026-02-11 8:16 ` Philipp Stanner
2026-02-17 14:03 ` Philipp Stanner
0 siblings, 2 replies; 103+ messages in thread
From: Christian König @ 2026-02-10 15:45 UTC (permalink / raw)
To: Alice Ryhl
Cc: Boris Brezillon, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On 2/10/26 16:07, Alice Ryhl wrote:
> On Tue, Feb 10, 2026 at 02:56:52PM +0100, Christian König wrote:
>> On 2/10/26 14:49, Alice Ryhl wrote:
>>> On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
>>>> On Tue, 10 Feb 2026 13:15:31 +0000
>>>> Alice Ryhl <aliceryhl@google.com> wrote:
>>>>
>>>>> On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
>>>>>> On Tue, 10 Feb 2026 10:15:04 +0000
>>>>>> Alice Ryhl <aliceryhl@google.com> wrote:
>>>>>>
>>>>>>> impl MustBeSignalled<'_> {
>>>>>>> /// Drivers generally should not use this one.
>>>>>>> fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
>>>>>>>
>>>>>>> /// One way to ensure the fence has been signalled is to signal it.
>>>>>>> fn signal_fence(self) -> WillBeSignalled {
>>>>>>> self.fence.signal();
>>>>>>> self.i_promise_it_will_be_signalled()
>>>>>>> }
>>>>>>>
>>>>>>> /// Another way to ensure the fence will be signalled is to spawn a
>>>>>>> /// workqueue item that promises to signal it.
>>>>>>> fn transfer_to_wq(
>>>>>>> self,
>>>>>>> wq: &Workqueue,
>>>>>>> item: impl DmaFenceWorkItem,
>>>>>>> ) -> WillBeSignalled {
>>>>>>> // briefly obtain the lock class of the wq to indicate to
>>>>>>> // lockdep that the signalling path "blocks" on arbitrary jobs
>>>>>>> // from this wq completing
>>>>>>> bindings::lock_acquire(&wq->key);
>>>>>>> bindings::lock_release(&wq->key);
>>>>>>>
>>>>>>> // enqueue the job
>>>>>>> wq.enqueue(item, wq);
>>>>>>>
>>>>>>> // The signature of DmaFenceWorkItem::run() promises to arrange
>>>>>>> // for it to be signalled.
>>>>>>> self.i_promise_it_will_be_signalled()
>>>>>>> }
>>>>>>
>>>>>> I guess what's still missing is some sort of `transfer_to_hw()`
>>>>>> function and way to flag the IRQ handler taking over the fence
>>>>>> signaling token.
>>>>>
>>>>> Yes, transfer to hardware needs to be another piece of logic similar to
>>>>> transfer to wq. And I imagine there are many ways such a transfer to
>>>>> hardware could work.
>>>>>
>>>>> Unless you have a timeout on it, in which case the WillBeSignalled is
>>>>> satisfied by the fact you have a timeout alone, and the signalling that
>>>>> happens from the irq is just an opportunistic signal from outside the
>>>>> dma fence signalling critical path.
>>>>
>>>> Yes and no. If it deadlocks in the completion WorkItem because of
>>>> allocations (or any of the forbidden use cases), I think we want to
>>>> catch that, because that's a sign fences are likely to end up with
>>>> timeouts when they should have otherwise been signaled properly.
>>>>
>>>>> Well ... unless triggering timeouts can block on GFP_KERNEL
>>>>> allocations...
>>>>
>>>> I mean, the timeout handler should also be considered a DMA-signalling
>>>> path, and the same rules should apply to it.
>>>
>>> I guess that's fair. Even with a timeout you want both to be signalling
>>> path.
>>>
>>> I guess more generally, if a fence is signalled by mechanism A or B,
>>> whichever happens first, you have the choice between:
>>
>> That doesn't happen in practice.
>>
>> For each fence you only have one signaling path you need to guarantee
>> forward progress for.
>>
>> All other signaling paths are just opportunistically optimizations
>> which *can* signal the fence, but there is no guarantee that they
>> will.
>>
>> We used to have some exceptions to that, especially around aborting
>> submissions, but those turned out to be a really bad idea as well.
>>
>> Thinking more about it you should probably enforce that there is only
>> one signaling path for each fence signaling.
>
> I'm not really convinced by this.
>
> First, the timeout path must be a fence signalling path because the
> reason you have a timeout in the first place is because the hw might
> never signal the fence. So if the timeout path deadlocks on a
> kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
Mhm, good point. On the other hand the timeout handling should probably be considered part of the normal signaling path.
In other words, the timeout handler either disables the normal signaling path (e.g. by disabling the interrupt) and then resets the HW, or it tells the HW to force-signal some work and observes the result.
So it can be that the timeout handler finishes only after the fence is signaled from the normal signaling paths.
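That flow could be sketched roughly as follows (hypothetical names, plain Rust; a model of the described protocol, not driver code): the handler first quiesces the normal signaling path, then either observes that the fence got signalled anyway or force-signals it after a reset.

```rust
// Hypothetical sketch of a timeout handler that is part of the normal
// signaling path rather than an independent one.
struct Engine {
    irq_enabled: bool,
    fence_signalled: bool,
}

#[derive(Debug, PartialEq)]
enum TimeoutResult {
    AlreadySignalled, // the normal path won the race
    ForcedSignal,     // the handler had to reset and signal itself
}

fn timeout_handler(engine: &mut Engine) -> TimeoutResult {
    // Step 1: disable the normal signaling path so it cannot race with us.
    engine.irq_enabled = false;
    // Step 2: if the fence was signalled in the meantime, nothing to do.
    if engine.fence_signalled {
        return TimeoutResult::AlreadySignalled;
    }
    // Step 3: reset the hardware and force-signal the fence ourselves.
    engine.fence_signalled = true;
    TimeoutResult::ForcedSignal
}
```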
> Second, for the reasons I mentioned you also want the signal-from-irq
> path to be a fence signalling critical path, because if we allow you to
> kmalloc(GFP_KERNEL) on the path from getting notification from hardware
> to signalling the fence, then you may deadlock until the timeout
> triggers ... even if the deadlock is only temporary, we should still
> avoid such cases IMO. Thus, the hw signal path should also be a fence
> signalling critical path.
As far as I remember we didn't have any such cases.
You can't call kmalloc(GFP_KERNEL) from an interrupt handler, so you would need something like irq->work item->kmalloc(GFP_KERNEL)->signaling and I think that's unlikely to be implemented this way.
But yeah, it is still something which should be prevented somehow.
Christian.
>
> Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 15:45 ` Christian König
@ 2026-02-11 8:16 ` Philipp Stanner
2026-02-17 14:03 ` Philipp Stanner
1 sibling, 0 replies; 103+ messages in thread
From: Philipp Stanner @ 2026-02-11 8:16 UTC (permalink / raw)
To: Christian König, Alice Ryhl
Cc: Boris Brezillon, phasta, Danilo Krummrich, David Airlie,
Simona Vetter, Gary Guo, Benno Lossin, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Tue, 2026-02-10 at 16:45 +0100, Christian König wrote:
> On 2/10/26 16:07, Alice Ryhl wrote:
> > >
[…]
> > > That doesn't happen in practice.
> > >
> > > For each fence you only have one signaling path you need to guarantee
> > > forward progress for.
> > >
> > > All other signaling paths are just opportunistically optimizations
> > > which *can* signal the fence, but there is no guarantee that they
> > > will.
> > >
> > > We used to have some exceptions to that, especially around aborting
> > > submissions, but those turned out to be a really bad idea as well.
> > >
> > > Thinking more about it you should probably enforce that there is only
> > > one signaling path for each fence signaling.
> >
> > I'm not really convinced by this.
> >
> > First, the timeout path must be a fence signalling path because the
> > reason you have a timeout in the first place is because the hw might
> > never signal the fence. So if the timeout path deadlocks on a
> > kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
>
> Mhm, good point. On the other hand the timeout handling should probably be considered part of the normal signaling path.
>
> In other words the timeout handler either disables the normal signaling path (e.g. by disabling the interrupt) and then reset the HW or it tells the HW to force signal some work and observes the result.
>
> So it can be that the timeout handler finishes only after the fence is signaled from the normal signaling paths.
I would say since we are designing all this (for now) for modern and
future hardware, the timeout handling regarding GPUs can be considered
trivial?
A timeout event, as far as JobQueue is concerned, is a mere instruction
to drop the entire queue and close the ring. Further signaling should
either not occur at all anymore (because the ring is blocked by a
broken shader) – or, if a racy job still finishes while a timeout is
firing, the ring shall still be terminated. It would then result in
that last blocking job being completed for userspace, and the
subsequent ones being signalled with -ECANCELED.
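As a rough model of that teardown (names and the exact error attached to the hung job are assumptions, not the JobQueue API): on timeout the queue is closed, the hung job's fence is signalled with an error, and every job behind it is signalled with -ECANCELED, in submission order.

```rust
// Hypothetical sketch of dropping a jobqueue on timeout. Error values are
// Linux errno numbers used as plain ints for illustration.
const ETIMEDOUT: i32 = 110;
const ECANCELED: i32 = 125;

struct Job {
    id: u32,
}

// Returns (id, result) for every pending job after the queue is dropped.
fn drop_queue(hung: Job, pending: Vec<Job>) -> Vec<(u32, Result<(), i32>)> {
    let mut out = Vec::new();
    // The job that triggered the timeout still completes for userspace,
    // just with an error attached to its fence (ETIMEDOUT is assumed here).
    out.push((hung.id, Err(ETIMEDOUT)));
    // Everything queued behind it is cancelled, in submission order.
    for job in pending {
        out.push((job.id, Err(ECANCELED)));
    }
    out
}
```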
In a timeout handler, a driver would just drop its jobqueue, resulting
in all access being revoked and the JQ deregistering its events from
all fences. Deadlock is accounted for by RCU.
So no problem here, or am I missing something?
>
> > Second, for the reasons I mentioned you also want the signal-from-irq
> > path to be a fence signalling critical path, because if we allow you to
> > kmalloc(GFP_KERNEL) on the path from getting notification from hardware
> > to signalling the fence, then you may deadlock until the timeout
> > triggers ... even if the deadlock is only temporary, we should still
> > avoid such cases IMO. Thus, the hw signal path should also be a fence
> > signalling critical path.
>
> As far as I remember we didn't had any of such cases.
>
> You can't call kmalloc(GFP_KERNEL) from an interrupt handler, so you would need something like irq->work item->kmalloc(GFP_KERNEL)->signaling and I think that's unlikely to be implemented this way.
>
> But yeah, it is still something which should be prevented somehow.
Just as a side note, we want to ask ourselves what kinds of potential
problems we want to make impossible. Covering 100% might get really
work-intensive. I'm in general a fan of the 80/20 rule, so I'd like to
know what the most severe and most common misuses of dma_fences are.
P.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 15:45 ` Christian König
2026-02-11 8:16 ` Philipp Stanner
@ 2026-02-17 14:03 ` Philipp Stanner
2026-02-17 14:09 ` Alice Ryhl
1 sibling, 1 reply; 103+ messages in thread
From: Philipp Stanner @ 2026-02-17 14:03 UTC (permalink / raw)
To: Christian König, Alice Ryhl
Cc: Boris Brezillon, phasta, Danilo Krummrich, David Airlie,
Simona Vetter, Gary Guo, Benno Lossin, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Tue, 2026-02-10 at 16:45 +0100, Christian König wrote:
> On 2/10/26 16:07, Alice Ryhl wrote:
> > On Tue, Feb 10, 2026 at 02:56:52PM +0100, Christian König wrote:
> > > On 2/10/26 14:49, Alice Ryhl wrote:
> > > > On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
> > > > > On Tue, 10 Feb 2026 13:15:31 +0000
> > > > > Alice Ryhl <aliceryhl@google.com> wrote:
> > > > >
> > > > > > On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
> > > > > > > On Tue, 10 Feb 2026 10:15:04 +0000
> > > > > > > Alice Ryhl <aliceryhl@google.com> wrote:
> > > > > > >
> > > > > > > > impl MustBeSignalled<'_> {
> > > > > > > > /// Drivers generally should not use this one.
> > > > > > > > fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
> > > > > > > >
> > > > > > > > /// One way to ensure the fence has been signalled is to signal it.
> > > > > > > > fn signal_fence(self) -> WillBeSignalled {
> > > > > > > > self.fence.signal();
> > > > > > > > self.i_promise_it_will_be_signalled()
> > > > > > > > }
> > > > > > > >
> > > > > > > > /// Another way to ensure the fence will be signalled is to spawn a
> > > > > > > > /// workqueue item that promises to signal it.
> > > > > > > > fn transfer_to_wq(
> > > > > > > > self,
> > > > > > > > wq: &Workqueue,
> > > > > > > > item: impl DmaFenceWorkItem,
> > > > > > > > ) -> WillBeSignalled {
> > > > > > > > // briefly obtain the lock class of the wq to indicate to
> > > > > > > > // lockdep that the signalling path "blocks" on arbitrary jobs
> > > > > > > > // from this wq completing
> > > > > > > > bindings::lock_acquire(&wq->key);
> > > > > > > > bindings::lock_release(&wq->key);
> > > > > > > >
> > > > > > > > // enqueue the job
> > > > > > > > wq.enqueue(item, wq);
> > > > > > > >
> > > > > > > > // The signature of DmaFenceWorkItem::run() promises to arrange
> > > > > > > > // for it to be signalled.
> > > > > > > > self.i_promise_it_will_be_signalled()
> > > > > > > > }
> > > > > > >
> > > > > > > I guess what's still missing is some sort of `transfer_to_hw()`
> > > > > > > function and way to flag the IRQ handler taking over the fence
> > > > > > > signaling token.
> > > > > >
> > > > > > Yes, transfer to hardware needs to be another piece of logic similar to
> > > > > > transfer to wq. And I imagine there are many ways such a transfer to
> > > > > > hardware could work.
> > > > > >
> > > > > > Unless you have a timeout on it, in which case the WillBeSignalled is
> > > > > > satisfied by the fact you have a timeout alone, and the signalling that
> > > > > > happens from the irq is just an opportunistic signal from outside the
> > > > > > dma fence signalling critical path.
> > > > >
> > > > > Yes and no. If it deadlocks in the completion WorkItem because of
> > > > > allocations (or any of the forbidden use cases), I think we want to
> > > > > catch that, because that's a sign fences are likely to end up with
> > > > > timeouts when they should have otherwise been signaled properly.
> > > > >
> > > > > > Well ... unless triggering timeouts can block on GFP_KERNEL
> > > > > > allocations...
> > > > >
> > > > > I mean, the timeout handler should also be considered a DMA-signalling
> > > > > path, and the same rules should apply to it.
> > > >
> > > > I guess that's fair. Even with a timeout you want both to be signalling
> > > > path.
> > > >
> > > > I guess more generally, if a fence is signalled by mechanism A or B,
> > > > whichever happens first, you have the choice between:
> > >
> > > That doesn't happen in practice.
> > >
> > > For each fence you only have one signaling path you need to guarantee
> > > forward progress for.
> > >
> > > All other signaling paths are just opportunistically optimizations
> > > which *can* signal the fence, but there is no guarantee that they
> > > will.
> > >
> > > We used to have some exceptions to that, especially around aborting
> > > submissions, but those turned out to be a really bad idea as well.
> > >
> > > Thinking more about it you should probably enforce that there is only
> > > one signaling path for each fence signaling.
> >
> > I'm not really convinced by this.
> >
> > First, the timeout path must be a fence signalling path because the
> > reason you have a timeout in the first place is because the hw might
> > never signal the fence. So if the timeout path deadlocks on a
> > kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
>
> Mhm, good point. On the other hand the timeout handling should probably be considered part of the normal signaling path.
Why would anyone want to allocate in a timeout path in the first place – especially for jobqueue?
Timeout -> close the associated ring. Done.
JobQueue will signal the done_fences with -ECANCELED.
What would the driver want to allocate in its timeout path, i.e., in the timeout callback?
P.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-17 14:03 ` Philipp Stanner
@ 2026-02-17 14:09 ` Alice Ryhl
2026-02-17 14:22 ` Christian König
0 siblings, 1 reply; 103+ messages in thread
From: Alice Ryhl @ 2026-02-17 14:09 UTC (permalink / raw)
To: phasta
Cc: Christian König, Boris Brezillon, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <phasta@mailbox.org> wrote:
>
> On Tue, 2026-02-10 at 16:45 +0100, Christian König wrote:
> > On 2/10/26 16:07, Alice Ryhl wrote:
> > > On Tue, Feb 10, 2026 at 02:56:52PM +0100, Christian König wrote:
> > > > On 2/10/26 14:49, Alice Ryhl wrote:
> > > > > On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
> > > > > > On Tue, 10 Feb 2026 13:15:31 +0000
> > > > > > Alice Ryhl <aliceryhl@google.com> wrote:
> > > > > >
> > > > > > > On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
> > > > > > > > On Tue, 10 Feb 2026 10:15:04 +0000
> > > > > > > > Alice Ryhl <aliceryhl@google.com> wrote:
> > > > > > > >
> > > > > > > > > impl MustBeSignalled<'_> {
> > > > > > > > > /// Drivers generally should not use this one.
> > > > > > > > > fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
> > > > > > > > >
> > > > > > > > > /// One way to ensure the fence has been signalled is to signal it.
> > > > > > > > > fn signal_fence(self) -> WillBeSignalled {
> > > > > > > > > self.fence.signal();
> > > > > > > > > self.i_promise_it_will_be_signalled()
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > /// Another way to ensure the fence will be signalled is to spawn a
> > > > > > > > > /// workqueue item that promises to signal it.
> > > > > > > > > fn transfer_to_wq(
> > > > > > > > > self,
> > > > > > > > > wq: &Workqueue,
> > > > > > > > > item: impl DmaFenceWorkItem,
> > > > > > > > > ) -> WillBeSignalled {
> > > > > > > > > // briefly obtain the lock class of the wq to indicate to
> > > > > > > > > // lockdep that the signalling path "blocks" on arbitrary jobs
> > > > > > > > > // from this wq completing
> > > > > > > > > bindings::lock_acquire(&wq->key);
> > > > > > > > > bindings::lock_release(&wq->key);
> > > > > > > > >
> > > > > > > > > // enqueue the job
> > > > > > > > > wq.enqueue(item, wq);
> > > > > > > > >
> > > > > > > > > // The signature of DmaFenceWorkItem::run() promises to arrange
> > > > > > > > > // for it to be signalled.
> > > > > > > > > self.i_promise_it_will_be_signalled()
> > > > > > > > > }
> > > > > > > >
> > > > > > > > I guess what's still missing is some sort of `transfer_to_hw()`
> > > > > > > > function and way to flag the IRQ handler taking over the fence
> > > > > > > > signaling token.
> > > > > > >
> > > > > > > Yes, transfer to hardware needs to be another piece of logic similar to
> > > > > > > transfer to wq. And I imagine there are many ways such a transfer to
> > > > > > > hardware could work.
> > > > > > >
> > > > > > > Unless you have a timeout on it, in which case the WillBeSignalled is
> > > > > > > satisfied by the fact you have a timeout alone, and the signalling that
> > > > > > > happens from the irq is just an opportunistic signal from outside the
> > > > > > > dma fence signalling critical path.
> > > > > >
> > > > > > Yes and no. If it deadlocks in the completion WorkItem because of
> > > > > > allocations (or any of the forbidden use cases), I think we want to
> > > > > > catch that, because that's a sign fences are likely to end up with
> > > > > > timeouts when they should have otherwise been signaled properly.
> > > > > >
> > > > > > > Well ... unless triggering timeouts can block on GFP_KERNEL
> > > > > > > allocations...
> > > > > >
> > > > > > I mean, the timeout handler should also be considered a DMA-signalling
> > > > > > path, and the same rules should apply to it.
> > > > >
> > > > > I guess that's fair. Even with a timeout you want both to be signalling
> > > > > path.
> > > > >
> > > > > I guess more generally, if a fence is signalled by mechanism A or B,
> > > > > whichever happens first, you have the choice between:
> > > >
> > > > That doesn't happen in practice.
> > > >
> > > > For each fence you only have one signaling path you need to guarantee
> > > > forward progress for.
> > > >
> > > > All other signaling paths are just opportunistic optimizations
> > > > which *can* signal the fence, but there is no guarantee that they
> > > > will.
> > > >
> > > > We used to have some exceptions to that, especially around aborting
> > > > submissions, but those turned out to be a really bad idea as well.
> > > >
> > > > Thinking more about it you should probably enforce that there is only
> > > > one signaling path for each fence signaling.
> > >
> > > I'm not really convinced by this.
> > >
> > > First, the timeout path must be a fence signalling path because the
> > > reason you have a timeout in the first place is because the hw might
> > > never signal the fence. So if the timeout path deadlocks on a
> > > kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
> >
> > Mhm, good point. On the other hand the timeout handling should probably be considered part of the normal signaling path.
>
>
> Why would anyone want to allocate in a timeout path in the first place – especially for jobqueue?
>
> Timeout -> close the associated ring. Done.
> JobQueue will signal the done_fences with -ECANCELED.
>
> What would the driver want to allocate in its timeout path, i.e. in its timeout callback?
Maybe you need an allocation to hold the struct delayed_work_struct
field that you use to enqueue the timeout?
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-17 14:09 ` Alice Ryhl
@ 2026-02-17 14:22 ` Christian König
2026-02-17 14:28 ` Philipp Stanner
0 siblings, 1 reply; 103+ messages in thread
From: Christian König @ 2026-02-17 14:22 UTC (permalink / raw)
To: Alice Ryhl, phasta
Cc: Boris Brezillon, Danilo Krummrich, David Airlie, Simona Vetter,
Gary Guo, Benno Lossin, Daniel Almeida, Joel Fernandes,
linux-kernel, dri-devel, rust-for-linux
On 2/17/26 15:09, Alice Ryhl wrote:
> On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <phasta@mailbox.org> wrote:
>>
>> On Tue, 2026-02-10 at 16:45 +0100, Christian König wrote:
>>> On 2/10/26 16:07, Alice Ryhl wrote:
>>>> On Tue, Feb 10, 2026 at 02:56:52PM +0100, Christian König wrote:
>>>>> On 2/10/26 14:49, Alice Ryhl wrote:
>>>>>> On Tue, Feb 10, 2026 at 02:26:31PM +0100, Boris Brezillon wrote:
>>>>>>> On Tue, 10 Feb 2026 13:15:31 +0000
>>>>>>> Alice Ryhl <aliceryhl@google.com> wrote:
>>>>>>>
>>>>>>>> On Tue, Feb 10, 2026 at 01:36:17PM +0100, Boris Brezillon wrote:
>>>>>>>>> On Tue, 10 Feb 2026 10:15:04 +0000
>>>>>>>>> Alice Ryhl <aliceryhl@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> impl MustBeSignalled<'_> {
>>>>>>>>>> /// Drivers generally should not use this one.
>>>>>>>>>> fn i_promise_it_will_be_signalled(self) -> WillBeSignalled { ... }
>>>>>>>>>>
>>>>>>>>>> /// One way to ensure the fence has been signalled is to signal it.
>>>>>>>>>> fn signal_fence(self) -> WillBeSignalled {
>>>>>>>>>> self.fence.signal();
>>>>>>>>>> self.i_promise_it_will_be_signalled()
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> /// Another way to ensure the fence will be signalled is to spawn a
>>>>>>>>>> /// workqueue item that promises to signal it.
>>>>>>>>>> fn transfer_to_wq(
>>>>>>>>>> self,
>>>>>>>>>> wq: &Workqueue,
>>>>>>>>>> item: impl DmaFenceWorkItem,
>>>>>>>>>> ) -> WillBeSignalled {
>>>>>>>>>> // briefly obtain the lock class of the wq to indicate to
>>>>>>>>>> // lockdep that the signalling path "blocks" on arbitrary jobs
>>>>>>>>>> // from this wq completing
>>>>>>>>>> bindings::lock_acquire(&wq->key);
>>>>>>>>>> bindings::lock_release(&wq->key);
>>>>>>>>>>
>>>>>>>>>> // enqueue the job
>>>>>>>>>> wq.enqueue(item, wq);
>>>>>>>>>>
>>>>>>>>>> // The signature of DmaFenceWorkItem::run() promises to arrange
>>>>>>>>>> // for it to be signalled.
>>>>>>>>>> self.i_promise_it_will_be_signalled()
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> I guess what's still missing is some sort of `transfer_to_hw()`
>>>>>>>>> function and way to flag the IRQ handler taking over the fence
>>>>>>>>> signaling token.
>>>>>>>>
>>>>>>>> Yes, transfer to hardware needs to be another piece of logic similar to
>>>>>>>> transfer to wq. And I imagine there are many ways such a transfer to
>>>>>>>> hardware could work.
>>>>>>>>
>>>>>>>> Unless you have a timeout on it, in which case the WillBeSignalled is
>>>>>>>> satisfied by the fact you have a timeout alone, and the signalling that
>>>>>>>> happens from the irq is just an opportunistic signal from outside the
>>>>>>>> dma fence signalling critical path.
>>>>>>>
>>>>>>> Yes and no. If it deadlocks in the completion WorkItem because of
>>>>>>> allocations (or any of the forbidden use cases), I think we want to
>>>>>>> catch that, because that's a sign fences are likely to end up with
>>>>>>> timeouts when they should have otherwise been signaled properly.
>>>>>>>
>>>>>>>> Well ... unless triggering timeouts can block on GFP_KERNEL
>>>>>>>> allocations...
>>>>>>>
>>>>>>> I mean, the timeout handler should also be considered a DMA-signalling
>>>>>>> path, and the same rules should apply to it.
>>>>>>
>>>>>> I guess that's fair. Even with a timeout you want both to be signalling
>>>>>> paths.
>>>>>>
>>>>>> I guess more generally, if a fence is signalled by mechanism A or B,
>>>>>> whichever happens first, you have the choice between:
>>>>>
>>>>> That doesn't happen in practice.
>>>>>
>>>>> For each fence you only have one signaling path you need to guarantee
>>>>> forward progress for.
>>>>>
>>>>> All other signaling paths are just opportunistic optimizations
>>>>> which *can* signal the fence, but there is no guarantee that they
>>>>> will.
>>>>>
>>>>> We used to have some exceptions to that, especially around aborting
>>>>> submissions, but those turned out to be a really bad idea as well.
>>>>>
>>>>> Thinking more about it you should probably enforce that there is only
>>>>> one signaling path for each fence signaling.
>>>>
>>>> I'm not really convinced by this.
>>>>
>>>> First, the timeout path must be a fence signalling path because the
>>>> reason you have a timeout in the first place is because the hw might
>>>> never signal the fence. So if the timeout path deadlocks on a
>>>> kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
>>>
>>> Mhm, good point. On the other hand the timeout handling should probably be considered part of the normal signaling path.
>>
>>
>> Why would anyone want to allocate in a timeout path in the first place – especially for jobqueue?
>>
>> Timeout -> close the associated ring. Done.
>> JobQueue will signal the done_fences with -ECANCELED.
>>
>> What would the driver want to allocate in its timeout path, i.e. in its timeout callback?
>
> Maybe you need an allocation to hold the struct delayed_work_struct
> field that you use to enqueue the timeout?
And the workqueue you schedule the delayed_work on must have the reclaim bit (WQ_MEM_RECLAIM) set.
Otherwise it can happen that the workqueue finds all kthreads busy and tries to start a new one, e.g. by allocating a task structure...
You also potentially want device core dumps. Those usually use GFP_NOWAIT so that they can't cycle back and wait for some fence. The down side is that they can trivially fail under even light memory pressure.
Regards,
Christian.
>
> Alice
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-17 14:22 ` Christian König
@ 2026-02-17 14:28 ` Philipp Stanner
2026-02-17 14:44 ` Danilo Krummrich
` (2 more replies)
0 siblings, 3 replies; 103+ messages in thread
From: Philipp Stanner @ 2026-02-17 14:28 UTC (permalink / raw)
To: Christian König, Alice Ryhl, phasta
Cc: Boris Brezillon, Danilo Krummrich, David Airlie, Simona Vetter,
Gary Guo, Benno Lossin, Daniel Almeida, Joel Fernandes,
linux-kernel, dri-devel, rust-for-linux
On Tue, 2026-02-17 at 15:22 +0100, Christian König wrote:
> On 2/17/26 15:09, Alice Ryhl wrote:
> > On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <phasta@mailbox.org> wrote:
> > > > > >
> > > > > >
[…]
> > > > > > Thinking more about it you should probably enforce that there is only
> > > > > > one signaling path for each fence signaling.
> > > > >
> > > > > I'm not really convinced by this.
> > > > >
> > > > > First, the timeout path must be a fence signalling path because the
> > > > > reason you have a timeout in the first place is because the hw might
> > > > > never signal the fence. So if the timeout path deadlocks on a
> > > > > kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
> > > >
> > > > Mhm, good point. On the other hand the timeout handling should probably be considered part of the normal signaling path.
> > >
> > >
> > > Why would anyone want to allocate in a timeout path in the first place – especially for jobqueue?
> > >
> > > Timeout -> close the associated ring. Done.
> > > JobQueue will signal the done_fences with -ECANCELED.
> > >
> > > What would the driver want to allocate in its timeout path, i.e. in its timeout callback?
> >
> > Maybe you need an allocation to hold the struct delayed_work_struct
> > field that you use to enqueue the timeout?
>
> And the workqueue you schedule the delayed_work on must have the reclaim bit (WQ_MEM_RECLAIM) set.
>
> Otherwise it can happen that the workqueue finds all kthreads busy and tries to start a new one, e.g. by allocating a task structure...
OK, maybe I'm lost, but what delayed_work?
The jobqueue's delayed work item gets either created on JQ::new() or in
jq.submit_job(). Why would anyone – that is: any driver – implement a
delayed work in its timeout callback?
That doesn't make sense.
JQ notifies the driver from its delayed_work through
timeout_callback(), and in that callback the driver closes the
associated firmware ring.
And it drops the JQ. So it is gone. A new JQ will get a new timeout
work item.
That's basically all the driver must ever do. Maybe some logging and
stuff.
With firmware scheduling it should really be that simple.
And signalling / notifying userspace gets done by jobqueue.
Right?
>
> You also potentially want device core dumps. Those usually use GFP_NOWAIT so that they can't cycle back and wait for some fence. The down side is that they can trivially fail under even light memory pressure.
Simply logging into dmesg should do the trick, shouldn't it?
P.
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-17 14:28 ` Philipp Stanner
@ 2026-02-17 14:44 ` Danilo Krummrich
2026-03-13 23:20 ` Matthew Brost
2026-02-17 15:01 ` Christian König
2026-02-18 9:50 ` Alice Ryhl
2 siblings, 1 reply; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-17 14:44 UTC (permalink / raw)
To: Philipp Stanner
Cc: phasta, Christian König, Alice Ryhl, Boris Brezillon,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue Feb 17, 2026 at 3:28 PM CET, Philipp Stanner wrote:
> OK, maybe I'm lost, but what delayed_work?
>
> The jobqueue's delayed work item gets either created on JQ::new() or in
> jq.submit_job(). Why would anyone – that is: any driver – implement a
> delayed work in its timeout callback?
>
> That doesn't make sense.
>
> JQ notifies the driver from its delayed_work through
> timeout_callback(), and in that callback the driver closes the
> associated firmware ring.
>
> And it drops the JQ. So it is gone. A new JQ will get a new timeout
> work item.
>
> That's basically all the driver must ever do. Maybe some logging and
> stuff.
>
> With firmware scheduling it should really be that simple.
>
> And signalling / notifying userspace gets done by jobqueue.
>
> Right?
Well, the timeout path is part of the fence signaling critical section until all
fences have been signaled.
But if I, for instance, just kick off another work from the timeout handler and
subsequently signal all fences by dropping the JQ, this other work no longer has
to play by DMA fence signaling rules and is free to do whatever (maybe even
take a device coredump without needing GFP_NOWAIT).
Xe does this with xe_devcoredump_deferred_snap_work for instance.
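A std-only sketch of that handoff, just to illustrate the shape (all names here are hypothetical, nothing below is the Xe or kernel API): the timeout handler signals its fences before queuing the deferred dump work, so only the first part runs under fence-signalling constraints.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for a DMA fence: just a "signalled" flag here.
struct Fence {
    signalled: AtomicBool,
}

impl Fence {
    fn signal(&self) {
        self.signalled.store(true, Ordering::Release);
    }
    fn is_signalled(&self) -> bool {
        self.signalled.load(Ordering::Acquire)
    }
}

/// Model of a timeout handler: signal every pending fence first (this part
/// is on the signalling path, so no GFP_KERNEL-style blocking allowed),
/// then hand back deferred work that may allocate freely because it runs
/// only after all fences are already signalled.
fn timeout_handler(fences: Vec<Arc<Fence>>) -> impl FnOnce() -> String {
    for f in &fences {
        f.signal();
    }
    // Deferred "coredump" work: no longer on the signalling path.
    move || format!("dumped state for {} fences", fences.len())
}

fn main() {
    let fences: Vec<Arc<Fence>> = (0..3)
        .map(|_| Arc::new(Fence { signalled: AtomicBool::new(false) }))
        .collect();
    let deferred = timeout_handler(fences.clone());
    // All fences are signalled before the deferred work even starts.
    assert!(fences.iter().all(|f| f.is_signalled()));
    assert_eq!(deferred(), "dumped state for 3 fences");
}
```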
>> You also potentially want device core dumps. Those usually use GFP_NOWAIT so
>> that they can't cycle back and wait for some fence. The down side is that
>> they can trivially fail under even light memory pressure.
>
> Simply logging into dmesg should do the trick, shouldn't it?
You can't "log" a device coredump into dmesg. :)
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-17 14:44 ` Danilo Krummrich
@ 2026-03-13 23:20 ` Matthew Brost
0 siblings, 0 replies; 103+ messages in thread
From: Matthew Brost @ 2026-03-13 23:20 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Philipp Stanner, phasta, Christian König, Alice Ryhl,
Boris Brezillon, David Airlie, Simona Vetter, Gary Guo,
Benno Lossin, Daniel Almeida, Joel Fernandes, linux-kernel,
dri-devel, rust-for-linux
On Tue, Feb 17, 2026 at 03:44:06PM +0100, Danilo Krummrich wrote:
> On Tue Feb 17, 2026 at 3:28 PM CET, Philipp Stanner wrote:
> > OK, maybe I'm lost, but what delayed_work?
> >
> > The jobqueue's delayed work item gets either created on JQ::new() or in
> > jq.submit_job(). Why would anyone – that is: any driver – implement a
> > delayed work in its timeout callback?
> >
> > That doesn't make sense.
> >
> > JQ notifies the driver from its delayed_work through
> > timeout_callback(), and in that callback the driver closes the
> > associated firmware ring.
> >
> > And it drops the JQ. So it is gone. A new JQ will get a new timeout
> > work item.
> >
> > That's basically all the driver must ever do. Maybe some logging and
> > stuff.
> >
> > With firmware scheduling it should really be that simple.
> >
> > And signalling / notifying userspace gets done by jobqueue.
> >
> > Right?
>
> Well, the timeout path is part of the fence signaling critical section until all
> fences have been signaled.
>
> But, if I, for instance, just kick off another work from the timeout handler and
> subsequently signal all fences by dropping the JQ, this other work must not play
> after DMA fence signaling rules anymore and is free to do whatever (maybe even
> take a device coredump without needing GFP_NOWAIT).
>
> Xe does this with xe_devcoredump_deferred_snap_work for instance.
>
Yes.
> >> You also potentially want device core dumps. Those usually use GFP_NOWAIT so
> >> that they can't cycle back and wait for some fence. The down side is that
> >> they can trivially fail under even light memory pressure.
> >
> > Simply logging into dmesg should do the trick, shouldn't it?
>
The trick is to make devcoredump a multi-step process. In the TDR,
allocate as little memory as possible using NOWAIT—for example, record
the parts of objects that might disappear, or take references and store
them in the allocated “snap” object so they remain stable. This is the
snap step.
Next, kick a worker that looks at the snap and allocates the memory
needed for the capture step. Here you can safely save off BO contents,
for example.
After that comes the print step, which converts the captured data into
human-readable output for the devcoredump.
This is actually a simplified view: the capture step can exceed the
kvmalloc size limit (default 2GB), so your print step may need to
trigger additional capture phases. Multiple capture phases also mean you
must hold onto the snap for a longer period.
There is also a time-complexity pitfall in the capture printer: I added
support for offset-based reads of the print data to prevent repeated
reads from becoming insanely expensive.
If you have any questions, feel free to ping me, or refer to
xe_devcoredump.c for the implementation.
Matt
> You can't "log" a device coredump into dmesg. :)
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-17 14:28 ` Philipp Stanner
2026-02-17 14:44 ` Danilo Krummrich
@ 2026-02-17 15:01 ` Christian König
2026-02-18 9:50 ` Alice Ryhl
2 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2026-02-17 15:01 UTC (permalink / raw)
To: phasta, Alice Ryhl
Cc: Boris Brezillon, Danilo Krummrich, David Airlie, Simona Vetter,
Gary Guo, Benno Lossin, Daniel Almeida, Joel Fernandes,
linux-kernel, dri-devel, rust-for-linux
On 2/17/26 15:28, Philipp Stanner wrote:
> On Tue, 2026-02-17 at 15:22 +0100, Christian König wrote:
>> On 2/17/26 15:09, Alice Ryhl wrote:
>>> On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <phasta@mailbox.org> wrote:
>>>>>>>
>>>>>>>
>
> […]
>
>>>>>>> Thinking more about it you should probably enforce that there is only
>>>>>>> one signaling path for each fence signaling.
>>>>>>
>>>>>> I'm not really convinced by this.
>>>>>>
>>>>>> First, the timeout path must be a fence signalling path because the
>>>>>> reason you have a timeout in the first place is because the hw might
>>>>>> never signal the fence. So if the timeout path deadlocks on a
>>>>>> kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
>>>>>
>>>>> Mhm, good point. On the other hand the timeout handling should probably be considered part of the normal signaling path.
>>>>
>>>>
>>>> Why would anyone want to allocate in a timeout path in the first place – especially for jobqueue?
>>>>
>>>> Timeout -> close the associated ring. Done.
>>>> JobQueue will signal the done_fences with -ECANCELED.
>>>>
>>>> What would the driver want to allocate in its timeout path, i.e. in its timeout callback?
>>>
>>> Maybe you need an allocation to hold the struct delayed_work_struct
>>> field that you use to enqueue the timeout?
>>
>> And the workqueue you schedule the delayed_work on must have the reclaim bit (WQ_MEM_RECLAIM) set.
>>
>> Otherwise it can happen that the workqueue finds all kthreads busy and tries to start a new one, e.g. by allocating a task structure...
>
> OK, maybe I'm lost, but what delayed_work?
>
> The jobqueue's delayed work item gets either created on JQ::new() or in
> jq.submit_job(). Why would anyone – that is: any driver – implement a
> delayed work in its timeout callback?
>
> That doesn't make sense.
>
> JQ notifies the driver from its delayed_work through
> timeout_callback(), and in that callback the driver closes the
> associated firmware ring.
>
> And it drops the JQ. So it is gone. A new JQ will get a new timeout
> work item.
>
> That's basically all the driver must ever do. Maybe some logging and
> stuff.
>
> With firmware scheduling it should really be that simple.
>
> And signalling / notifying userspace gets done by jobqueue.
>
> Right?
Correct, I just wanted to point out that jobqueue needs to keep the workqueue rules in mind as well.
But you really need to double-check what drivers are doing. We had more than one kmalloc() added because we had a warning about too many variables on the stack in DAL/DC...
This turns into a debugging nightmare if you need to re-init the display during timeout handling.
>>
>> You also potentially want device core dumps. Those usually use GFP_NOWAIT so that they can't cycle back and wait for some fence. The down side is that they can trivially fail under even light memory pressure.
>
> Simply logging into dmesg should do the trick, shouldn't it?
Nope, not even remotely. A device core dump can easily be hundreds of megabytes in size.
In other words it's the HW state you usually attach to a crash report to figure out what's going on.
Christian.
>
>
> P.
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-17 14:28 ` Philipp Stanner
2026-02-17 14:44 ` Danilo Krummrich
2026-02-17 15:01 ` Christian König
@ 2026-02-18 9:50 ` Alice Ryhl
2026-02-18 10:48 ` Boris Brezillon
2 siblings, 1 reply; 103+ messages in thread
From: Alice Ryhl @ 2026-02-18 9:50 UTC (permalink / raw)
To: phasta
Cc: Christian König, Boris Brezillon, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 17, 2026 at 03:28:06PM +0100, Philipp Stanner wrote:
> On Tue, 2026-02-17 at 15:22 +0100, Christian König wrote:
> > On 2/17/26 15:09, Alice Ryhl wrote:
> > > On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <phasta@mailbox.org> wrote:
> > > > > > >
> > > > > > >
>
> […]
>
> > > > > > > Thinking more about it you should probably enforce that there is only
> > > > > > > one signaling path for each fence signaling.
> > > > > >
> > > > > > I'm not really convinced by this.
> > > > > >
> > > > > > First, the timeout path must be a fence signalling path because the
> > > > > > reason you have a timeout in the first place is because the hw might
> > > > > > never signal the fence. So if the timeout path deadlocks on a
> > > > > > kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
> > > > >
> > > > > Mhm, good point. On the other hand the timeout handling should probably be considered part of the normal signaling path.
> > > >
> > > >
> > > > Why would anyone want to allocate in a timeout path in the first place – especially for jobqueue?
> > > >
> > > > Timeout -> close the associated ring. Done.
> > > > JobQueue will signal the done_fences with -ECANCELED.
> > > >
> > > > What would the driver want to allocate in its timeout path, i.e. in its timeout callback?
> > >
> > > Maybe you need an allocation to hold the struct delayed_work_struct
> > > field that you use to enqueue the timeout?
> >
> > And the workqueue you schedule the delayed_work on must have the reclaim bit (WQ_MEM_RECLAIM) set.
> >
> > Otherwise it can happen that the workqueue finds all kthreads busy and tries to start a new one, e.g. by allocating a task structure...
>
> OK, maybe I'm lost, but what delayed_work?
>
> The jobqueue's delayed work item gets either created on JQ::new() or in
> jq.submit_job(). Why would anyone – that is: any driver – implement a
> delayed work in its timeout callback?
>
> That doesn't make sense.
>
> JQ notifies the driver from its delayed_work through
> timeout_callback(), and in that callback the driver closes the
> associated firmware ring.
>
> And it drops the JQ. So it is gone. A new JQ will get a new timeout
> work item.
>
> That's basically all the driver must ever do. Maybe some logging and
> stuff.
>
> With firmware scheduling it should really be that simple.
>
> And signalling / notifying userspace gets done by jobqueue.
>
> Right?
What I'm getting at is that a driver author might attempt to implement
their own timeout logic instead of using the job queue, and if they do,
they might get it wrong in the way I described.
You're correct that they shouldn't do this. But you asked how a driver
author might get the timeout wrong, and rolling their own timeout logic
is one such way to get it wrong.
Alice
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-18 9:50 ` Alice Ryhl
@ 2026-02-18 10:48 ` Boris Brezillon
0 siblings, 0 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-18 10:48 UTC (permalink / raw)
To: Alice Ryhl
Cc: phasta, Christian König, Danilo Krummrich, David Airlie,
Simona Vetter, Gary Guo, Benno Lossin, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 18 Feb 2026 09:50:56 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Feb 17, 2026 at 03:28:06PM +0100, Philipp Stanner wrote:
> > On Tue, 2026-02-17 at 15:22 +0100, Christian König wrote:
> > > On 2/17/26 15:09, Alice Ryhl wrote:
> > > > On Tue, Feb 17, 2026 at 3:04 PM Philipp Stanner <phasta@mailbox.org> wrote:
> > > > > > > >
> > > > > > > >
> >
> > […]
> >
> > > > > > > > Thinking more about it you should probably enforce that there is only
> > > > > > > > one signaling path for each fence signaling.
> > > > > > >
> > > > > > > I'm not really convinced by this.
> > > > > > >
> > > > > > > First, the timeout path must be a fence signalling path because the
> > > > > > > reason you have a timeout in the first place is because the hw might
> > > > > > > never signal the fence. So if the timeout path deadlocks on a
> > > > > > > kmalloc(GFP_KERNEL) and the hw never comes around to wake you up, boom.
> > > > > >
> > > > > > Mhm, good point. On the other hand the timeout handling should probably be considered part of the normal signaling path.
> > > > >
> > > > >
> > > > > Why would anyone want to allocate in a timeout path in the first place – especially for jobqueue?
> > > > >
> > > > > Timeout -> close the associated ring. Done.
> > > > > JobQueue will signal the done_fences with -ECANCELED.
> > > > >
> > > > > What would the driver want to allocate in its timeout path, i.e. in its timeout callback?
> > > >
> > > > Maybe you need an allocation to hold the struct delayed_work_struct
> > > > field that you use to enqueue the timeout?
> > >
> > > And the workqueue you schedule the delayed_work on must have the reclaim bit (WQ_MEM_RECLAIM) set.
> > >
> > > Otherwise it can happen that the workqueue finds all kthreads busy and tries to start a new one, e.g. by allocating a task structure...
> >
> > OK, maybe I'm lost, but what delayed_work?
> >
> > The jobqueue's delayed work item gets either created on JQ::new() or in
> > jq.submit_job(). Why would anyone – that is: any driver – implement a
> > delayed work in its timeout callback?
> >
> > That doesn't make sense.
> >
> > JQ notifies the driver from its delayed_work through
> > timeout_callback(), and in that callback the driver closes the
> > associated firmware ring.
> >
> > And it drops the JQ. So it is gone. A new JQ will get a new timeout
> > work item.
> >
> > That's basically all the driver must ever do. Maybe some logging and
> > stuff.
> >
> > With firmware scheduling it should really be that simple.
> >
> > And signalling / notifying userspace gets done by jobqueue.
> >
> > Right?
>
> What I'm getting at is that a driver author might attempt to implement
> their own timeout logic instead of using the job queue, and if they do,
> they might get it wrong in the way I described.
>
> You're correct that they shouldn't do this. But you asked how a driver
> author might get the timeout wrong, and doing it the wrong way is one
> such way they might do it in the wrong way.
Are we back to discussing "how to ensure nothing prohibited happens in
the DMA signalling path?" or is this something else? I mean, I'm
convinced timeout handling should be part of the DMA-signalling path,
no matter if it's in common/well-audited code like JobQueue, or some
custom driver timeout handling (which I'm not advocating for, just to
be clear). As such, I believe we should ensure XxxDmaFence::signal()
(I'm using Xxx because the name of this Signal-able object is still
undecided AFAIK :-)) is called inside a DMA-signalling section. Note
that dma_fence_signal() declares a signalling section before signaling,
so this check would have to be done before calling dma_fence_signal() in
the XxxDmaFence::signal() implementation.
If we go this way, with
- DmaFenceWorkqueue+DmaFenceWork: generic abstractions for DMA-fence
constrained works
- DmaFenceThreadedHandler: generic abstraction for a DMA-fence
constrained threaded IRQ handler (raw IRQ handlers are already more
constrained than the DMA-fence signalling path, so we don't care)
- and potentially other helpers for other kinds of deferred signalling
we should be covered.
I believe that covers the case Alice was describing where the driver
allocates a DelayedWork with GFP_KERNEL in a DMA-signalling path, which
is prohibited. So yeah, if a driver decides to go for its own watchdog
implementation signalling all fences manually, it will just be
constrained by the same rules. XxxDmaFence::signal() will yell at you
if you're not in an annotated DMA-fence signalling section, and if you
are and you're doing something prohibited it will also yell at you. The
only way to abuse this is if rust code decides to manually annotate a
section, which we can flag as unsafe to make it clear this is not
something you should play with unless you're well aware of the risks.
Am I missing something?
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 10:15 ` Alice Ryhl
` (3 preceding siblings ...)
2026-02-10 12:36 ` Boris Brezillon
@ 2026-02-10 12:49 ` Boris Brezillon
2026-02-10 12:56 ` Boris Brezillon
2026-02-10 13:26 ` Alice Ryhl
4 siblings, 2 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 12:49 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, 10 Feb 2026 10:15:04 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> /// The owner of this value must ensure that this fence is signalled.
> struct MustBeSignalled<'fence> { ... }
> /// Proof value indicating that the fence has either already been
> /// signalled, or it will be. The lifetime ensures that you cannot mix
> /// up the proof value.
> struct WillBeSignalled<'fence> { ... }
Sorry, I have more questions, unfortunately. Seems that
{Must,Will}BeSignalled are targeting specific fences (at least that's
what the doc and the 'fence lifetime say), but in practice, the WorkItem
backing the scheduler can queue 0-N jobs (0 if no jobs have their deps
met, and N > 1 if more than one job is ready). Similarly, an IRQ
handler can signal 0-N fences (it can be that the IRQ has nothing to do with
job completion, or that multiple jobs have completed). How
is this MustBeSignalled object going to be instantiated in practice if
it's done before the DmaFenceWorkItem::run() function is called?
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 12:49 ` Boris Brezillon
@ 2026-02-10 12:56 ` Boris Brezillon
2026-02-10 13:26 ` Alice Ryhl
1 sibling, 0 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 12:56 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, 10 Feb 2026 13:49:13 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:
> On Tue, 10 Feb 2026 10:15:04 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > /// The owner of this value must ensure that this fence is signalled.
> > struct MustBeSignalled<'fence> { ... }
> > /// Proof value indicating that the fence has either already been
> > /// signalled, or it will be. The lifetime ensures that you cannot mix
> > /// up the proof value.
> > struct WillBeSignalled<'fence> { ... }
>
> Sorry, I have more questions, unfortunately. Seems that
> {Must,Will}BeSignalled are targeting specific fences (at least that's
> what the doc and 'fence lifetime says), but in practice, the WorkItem
> backing the scheduler can queue 0-N jobs (0 if no jobs have their deps
> met, and N > 1 if more than one job is ready). Similarly, an IRQ
> handler can signal 0-N fences (it can be that the IRQ has nothing to do with
> job completion, or that multiple jobs have completed). How
> is this MustBeSignalled object going to be instantiated in practice if
> it's done before the DmaFenceWorkItem::run() function is called?
For the scheduler WorkItem (assuming a JobQueue model), it's kinda
doable, because this is a FIFO, and we can get the first job in the
queue (and thus the fence attached to this job) quite easily, but as
soon as it's a post-execution WorkItem or IRQHandler, we never know
when entering WorkItem::run()/ThreadedHandler::handle_threaded()
which job will be completed (if any).
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 12:49 ` Boris Brezillon
2026-02-10 12:56 ` Boris Brezillon
@ 2026-02-10 13:26 ` Alice Ryhl
2026-02-10 13:51 ` Boris Brezillon
1 sibling, 1 reply; 103+ messages in thread
From: Alice Ryhl @ 2026-02-10 13:26 UTC (permalink / raw)
To: Boris Brezillon
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 10, 2026 at 01:49:13PM +0100, Boris Brezillon wrote:
> On Tue, 10 Feb 2026 10:15:04 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > /// The owner of this value must ensure that this fence is signalled.
> > struct MustBeSignalled<'fence> { ... }
> > /// Proof value indicating that the fence has either already been
> > /// signalled, or it will be. The lifetime ensures that you cannot mix
> > /// up the proof value.
> > struct WillBeSignalled<'fence> { ... }
>
> Sorry, I have more questions, unfortunately. Seems that
> {Must,Will}BeSignalled are targeting specific fences (at least that's
> what the doc and 'fence lifetime says), but in practice, the WorkItem
> backing the scheduler can queue 0-N jobs (0 if no jobs have their deps
> met, and N > 1 if more than one job is ready). Similarly, an IRQ
> handler can signal 0-N fences (can be that the IRQ has nothing to do we
> job completion, or, it can be that multiple jobs have completed). How
> is this MustBeSignalled object going to be instantiated in practice if
> it's done before the DmaFenceWorkItem::run() function is called?
The {Must,Will}BeSignalled closure pair needs to wrap the piece of code
that ensures a specific fence is signalled. If you have code that
manages a collection of fences and invokes code for specific fences
depending on outside conditions, then that's a different matter.
After all, transfer_to_wq() has two components:
1. Logic to ensure any spawned workqueue job eventually gets to run.
2. Once the individual job runs, logic specific to the one fence ensures
that this one fence gets signalled.
And {Must,Will}BeSignalled exists to help model part (2.). But what you
described with the IRQ callback falls into (1.) instead, which is
outside the scope of {Must,Will}BeSignalled (or at least requires more
complex APIs).
Alice
^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 13:26 ` Alice Ryhl
@ 2026-02-10 13:51 ` Boris Brezillon
2026-02-10 14:11 ` Alice Ryhl
0 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 13:51 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, 10 Feb 2026 13:26:48 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Feb 10, 2026 at 01:49:13PM +0100, Boris Brezillon wrote:
> > On Tue, 10 Feb 2026 10:15:04 +0000
> > Alice Ryhl <aliceryhl@google.com> wrote:
> >
> > > /// The owner of this value must ensure that this fence is signalled.
> > > struct MustBeSignalled<'fence> { ... }
> > > /// Proof value indicating that the fence has either already been
> > > /// signalled, or it will be. The lifetime ensures that you cannot mix
> > > /// up the proof value.
> > > struct WillBeSignalled<'fence> { ... }
> >
> > Sorry, I have more questions, unfortunately. Seems that
> > {Must,Will}BeSignalled are targeting specific fences (at least that's
> > what the doc and 'fence lifetime says), but in practice, the WorkItem
> > backing the scheduler can queue 0-N jobs (0 if no jobs have their deps
> > met, and N > 1 if more than one job is ready). Similarly, an IRQ
> > handler can signal 0-N fences (can be that the IRQ has nothing to do we
> > job completion, or, it can be that multiple jobs have completed). How
> > is this MustBeSignalled object going to be instantiated in practice if
> > it's done before the DmaFenceWorkItem::run() function is called?
>
> The {Must,Will}BeSignalled closure pair needs to wrap the piece of code
> that ensures a specific fence is signalled. If you have code that
> manages a collection of fences and invokes code for specific fences
> depending on outside conditions, then that's a different matter.
>
> After all, transfer_to_wq() has two components:
> 1. Logic to ensure any spawned workqueue job eventually gets to run.
> 2. Once the individual job runs, logic specific to the one fence ensures
> that this one fence gets signalled.
Okay, that's a change compared to how things are modeled in C (and in
JobQueue) at the moment: the WorkItem is not embedded in a specific
job, it's something that's attached to the JobQueue. The idea being
that the WorkItem represents a task to be done on the queue itself
(check if the first element in the queue is ready for execution), not on
a particular job. Now, we could change that and have a per-job WorkItem,
but ultimately, we'll have to make sure jobs are dequeued in order
(deps on JobN can be met before deps on Job0, but we still want JobN to
be submitted after Job0), and we'd pay the WorkItem overhead once per
Job instead of once per JobQueue. Probably not the end of the world,
but it's still worth considering.
> And {Must,Will}BeSignalled exists to help model part (2.). But what you
> described with the IRQ callback falls into (1.) instead, which is
> outside the scope of {Must,Will}BeSignalled (or at least requires more
> complex APIs).
For IRQ callbacks, it's not just about making sure they run, but also
making sure nothing in there can lead to deadlocks, which is basically
#2, except it's not scoped to a particular fence. It's just a "fences
can be signaled from there" marker. We could restrict it to "fences of
this particular implementation can be signaled from there" but not
"this particular fence instance will be signaled next, if any", because
that we don't know until we've walked some HW state to figure out which
job is complete and thus which fence we need to signal (the interrupt
we get is most likely multiplexing completion on multiple GPU contexts,
so before we can even get to our per-context in-flight-jobs FIFO, we
need to demux this thing).
^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 13:51 ` Boris Brezillon
@ 2026-02-10 14:11 ` Alice Ryhl
2026-02-10 14:50 ` Boris Brezillon
0 siblings, 1 reply; 103+ messages in thread
From: Alice Ryhl @ 2026-02-10 14:11 UTC (permalink / raw)
To: Boris Brezillon
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 10, 2026 at 02:51:56PM +0100, Boris Brezillon wrote:
> On Tue, 10 Feb 2026 13:26:48 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > On Tue, Feb 10, 2026 at 01:49:13PM +0100, Boris Brezillon wrote:
> > > On Tue, 10 Feb 2026 10:15:04 +0000
> > > Alice Ryhl <aliceryhl@google.com> wrote:
> > >
> > > > /// The owner of this value must ensure that this fence is signalled.
> > > > struct MustBeSignalled<'fence> { ... }
> > > > /// Proof value indicating that the fence has either already been
> > > > /// signalled, or it will be. The lifetime ensures that you cannot mix
> > > > /// up the proof value.
> > > > struct WillBeSignalled<'fence> { ... }
> > >
> > > Sorry, I have more questions, unfortunately. Seems that
> > > {Must,Will}BeSignalled are targeting specific fences (at least that's
> > > what the doc and 'fence lifetime says), but in practice, the WorkItem
> > > backing the scheduler can queue 0-N jobs (0 if no jobs have their deps
> > > met, and N > 1 if more than one job is ready). Similarly, an IRQ
> > > handler can signal 0-N fences (can be that the IRQ has nothing to do we
> > > job completion, or, it can be that multiple jobs have completed). How
> > > is this MustBeSignalled object going to be instantiated in practice if
> > > it's done before the DmaFenceWorkItem::run() function is called?
> >
> > The {Must,Will}BeSignalled closure pair needs to wrap the piece of code
> > that ensures a specific fence is signalled. If you have code that
> > manages a collection of fences and invokes code for specific fences
> > depending on outside conditions, then that's a different matter.
> >
> > After all, transfer_to_wq() has two components:
> > 1. Logic to ensure any spawned workqueue job eventually gets to run.
> > 2. Once the individual job runs, logic specific to the one fence ensures
> > that this one fence gets signalled.
>
> Okay, that's a change compared to how things are modeled in C (and in
> JobQueue) at the moment: the WorkItem is not embedded in a specific
> job, it's something that's attached to the JobQueue. The idea being
> that the WorkItem represents a task to be done on the queue itself
> (check if the first element in the queue is ready for execution), not on
> a particular job. Now, we could change that and have a per-job WorkItem,
> but ultimately, we'll have to make sure jobs are dequeued in order
> (deps on JobN can be met before deps on Job0, but we still want JobN to
> be submitted after Job0), and we'd pay the WorkItem overhead once per
> Job instead of once per JobQueue. Probably not the end of the world,
> but it's worth considering, still.
It sounds like the fix here is to have transfer_to_job_queue() instead
of trying to do it at the workqueue level.
> > And {Must,Will}BeSignalled exists to help model part (2.). But what you
> > described with the IRQ callback falls into (1.) instead, which is
> > outside the scope of {Must,Will}BeSignalled (or at least requires more
> > complex APIs).
>
> For IRQ callbacks, it's not just about making sure they run, but also
> making sure nothing in there can lead to deadlocks, which is basically
> #2, except it's not scoped to a particular fence. It's just a "fences
> can be signaled from there" marker. We could restrict it to "fences of
> this particular implementation can be signaled from there" but not
> "this particular fence instance will be signaled next, if any", because
> that we don't know until we've walked some HW state to figure out which
> job is complete and thus which fence we need to signal (the interrupt
> we get is most likely multiplexing completion on multiple GPU contexts,
> so before we can even get to our per-context in-flight-jobs FIFO, we
> need to demux this thing).
All I can say is that this is a different use-case for the C api
dma_fence_begin_signalling(). This different usage also seems useful,
but it would be one that does not involve {Must,Will}BeSignalled
arguments at all.
After all, dma_fence_begin_signalling() only requires those arguments if
you want to convert a PrivateFence into a PublishedFence. (I guess a
better name is PublishableFence.) If you're not trying to prove that a
specific fence will be signalled, then you don't need the
{Must,Will}BeSignalled arguments.
Alice
^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 14:11 ` Alice Ryhl
@ 2026-02-10 14:50 ` Boris Brezillon
2026-02-11 8:16 ` Alice Ryhl
0 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 14:50 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, 10 Feb 2026 14:11:12 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Feb 10, 2026 at 02:51:56PM +0100, Boris Brezillon wrote:
> > On Tue, 10 Feb 2026 13:26:48 +0000
> > Alice Ryhl <aliceryhl@google.com> wrote:
> >
> > > On Tue, Feb 10, 2026 at 01:49:13PM +0100, Boris Brezillon wrote:
> > > > On Tue, 10 Feb 2026 10:15:04 +0000
> > > > Alice Ryhl <aliceryhl@google.com> wrote:
> > > >
> > > > > /// The owner of this value must ensure that this fence is signalled.
> > > > > struct MustBeSignalled<'fence> { ... }
> > > > > /// Proof value indicating that the fence has either already been
> > > > > /// signalled, or it will be. The lifetime ensures that you cannot mix
> > > > > /// up the proof value.
> > > > > struct WillBeSignalled<'fence> { ... }
> > > >
> > > > Sorry, I have more questions, unfortunately. Seems that
> > > > {Must,Will}BeSignalled are targeting specific fences (at least that's
> > > > what the doc and 'fence lifetime says), but in practice, the WorkItem
> > > > backing the scheduler can queue 0-N jobs (0 if no jobs have their deps
> > > > met, and N > 1 if more than one job is ready). Similarly, an IRQ
> > > > handler can signal 0-N fences (can be that the IRQ has nothing to do we
> > > > job completion, or, it can be that multiple jobs have completed). How
> > > > is this MustBeSignalled object going to be instantiated in practice if
> > > > it's done before the DmaFenceWorkItem::run() function is called?
> > >
> > > The {Must,Will}BeSignalled closure pair needs to wrap the piece of code
> > > that ensures a specific fence is signalled. If you have code that
> > > manages a collection of fences and invokes code for specific fences
> > > depending on outside conditions, then that's a different matter.
> > >
> > > After all, transfer_to_wq() has two components:
> > > 1. Logic to ensure any spawned workqueue job eventually gets to run.
> > > 2. Once the individual job runs, logic specific to the one fence ensures
> > > that this one fence gets signalled.
> >
> > Okay, that's a change compared to how things are modeled in C (and in
> > JobQueue) at the moment: the WorkItem is not embedded in a specific
> > job, it's something that's attached to the JobQueue. The idea being
> > that the WorkItem represents a task to be done on the queue itself
> > (check if the first element in the queue is ready for execution), not on
> > a particular job. Now, we could change that and have a per-job WorkItem,
> > but ultimately, we'll have to make sure jobs are dequeued in order
> > (deps on JobN can be met before deps on Job0, but we still want JobN to
> > be submitted after Job0), and we'd pay the WorkItem overhead once per
> > Job instead of once per JobQueue. Probably not the end of the world,
> > but it's worth considering, still.
>
> It sounds like the fix here is to have transfer_to_job_queue() instead
> of trying to do it at the workqueue level.
Hm, so Job would be something like this (naming/trait-def are just
suggestions to get the discussion going):

trait JobConsumer {
    type FenceType;
    type JobData;

    fn run(self: MustBeSignalled<Self::FenceType>) -> Result<WillBeSignalled<Self::FenceType>>;
}

struct Job<T: JobConsumer> {
    fence: MustBeSignalled<T::FenceType>,
    data: T::JobData,
}
I guess that would do.
And then we need to flag the WorkItem that's exposed by the
JobQueue as a DmaFenceWorkItem so that
bindings::dma_fence_begin_signalling() is called before entry and
lockdep can do its job and check that nothing forbidden happens in
this WorkItem.
>
> > > And {Must,Will}BeSignalled exists to help model part (2.). But what you
> > > described with the IRQ callback falls into (1.) instead, which is
> > > outside the scope of {Must,Will}BeSignalled (or at least requires more
> > > complex APIs).
> >
> > For IRQ callbacks, it's not just about making sure they run, but also
> > making sure nothing in there can lead to deadlocks, which is basically
> > #2, except it's not scoped to a particular fence. It's just a "fences
> > can be signaled from there" marker. We could restrict it to "fences of
> > this particular implementation can be signaled from there" but not
> > "this particular fence instance will be signaled next, if any", because
> > that we don't know until we've walked some HW state to figure out which
> > job is complete and thus which fence we need to signal (the interrupt
> > we get is most likely multiplexing completion on multiple GPU contexts,
> > so before we can even get to our per-context in-flight-jobs FIFO, we
> > need to demux this thing).
>
> All I can say is that this is a different use-case for the C api
> dma_fence_begin_signalling(). This different usage also seems useful,
> but it would be one that does not involve {Must,Will}BeSignalled
> arguments at all.
>
> After all, dma_fence_begin_signalling() only requires those arguments if
> you want to convert a PrivateFence into a PublishedFence. (I guess a
> better name is PublishableFence.) If you're not trying to prove that a
> specific fence will be signalled, then you don't need the
> {Must,Will}BeSignalled arguments.
Okay, so that would be another function returning some sort of guard
then? What I find confusing is the fact that
dma_fence::dma_fence_begin_signalling() matches the C function name
which is not per-fence, but just this lock-guard model flagging a
section from which any fence can be signalled, so maybe we should
name your dma_fence_begin_signalling() proposal differently, dunno.
^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 14:50 ` Boris Brezillon
@ 2026-02-11 8:16 ` Alice Ryhl
2026-02-11 9:20 ` Boris Brezillon
0 siblings, 1 reply; 103+ messages in thread
From: Alice Ryhl @ 2026-02-11 8:16 UTC (permalink / raw)
To: Boris Brezillon
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 10, 2026 at 03:50:25PM +0100, Boris Brezillon wrote:
> On Tue, 10 Feb 2026 14:11:12 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > On Tue, Feb 10, 2026 at 02:51:56PM +0100, Boris Brezillon wrote:
> > > On Tue, 10 Feb 2026 13:26:48 +0000
> > > Alice Ryhl <aliceryhl@google.com> wrote:
> > >
> > > > On Tue, Feb 10, 2026 at 01:49:13PM +0100, Boris Brezillon wrote:
> > > > > On Tue, 10 Feb 2026 10:15:04 +0000
> > > > > Alice Ryhl <aliceryhl@google.com> wrote:
> > > > >
> > > > > > /// The owner of this value must ensure that this fence is signalled.
> > > > > > struct MustBeSignalled<'fence> { ... }
> > > > > > /// Proof value indicating that the fence has either already been
> > > > > > /// signalled, or it will be. The lifetime ensures that you cannot mix
> > > > > > /// up the proof value.
> > > > > > struct WillBeSignalled<'fence> { ... }
> > > > >
> > > > > Sorry, I have more questions, unfortunately. Seems that
> > > > > {Must,Will}BeSignalled are targeting specific fences (at least that's
> > > > > what the doc and 'fence lifetime says), but in practice, the WorkItem
> > > > > backing the scheduler can queue 0-N jobs (0 if no jobs have their deps
> > > > > met, and N > 1 if more than one job is ready). Similarly, an IRQ
> > > > > handler can signal 0-N fences (can be that the IRQ has nothing to do we
> > > > > job completion, or, it can be that multiple jobs have completed). How
> > > > > is this MustBeSignalled object going to be instantiated in practice if
> > > > > it's done before the DmaFenceWorkItem::run() function is called?
> > > >
> > > > The {Must,Will}BeSignalled closure pair needs to wrap the piece of code
> > > > that ensures a specific fence is signalled. If you have code that
> > > > manages a collection of fences and invokes code for specific fences
> > > > depending on outside conditions, then that's a different matter.
> > > >
> > > > After all, transfer_to_wq() has two components:
> > > > 1. Logic to ensure any spawned workqueue job eventually gets to run.
> > > > 2. Once the individual job runs, logic specific to the one fence ensures
> > > > that this one fence gets signalled.
> > >
> > > Okay, that's a change compared to how things are modeled in C (and in
> > > JobQueue) at the moment: the WorkItem is not embedded in a specific
> > > job, it's something that's attached to the JobQueue. The idea being
> > > that the WorkItem represents a task to be done on the queue itself
> > > (check if the first element in the queue is ready for execution), not on
> > > a particular job. Now, we could change that and have a per-job WorkItem,
> > > but ultimately, we'll have to make sure jobs are dequeued in order
> > > (deps on JobN can be met before deps on Job0, but we still want JobN to
> > > be submitted after Job0), and we'd pay the WorkItem overhead once per
> > > Job instead of once per JobQueue. Probably not the end of the world,
> > > but it's worth considering, still.
> >
> > It sounds like the fix here is to have transfer_to_job_queue() instead
> > of trying to do it at the workqueue level.
>
> Hm, so Job would be something like that (naming/trait-def are just
> suggestions to get the discussion going):
>
> trait JobConsumer {
> type FenceType;
> type JobData;
>
> fn run(self: MustBeSignalled<T::FenceType>) -> Result<WillBeSignaled<Self::FenceType>>;
> }
>
> struct Job<T: JobConsumer> {
> fence: MustBeSignalled<T::FenceType>,
> data: T::JobData,
> }
The fence field of Job would be PublishedFence or PrivateFence (or just
DriverDmaFence). The MustBeSignalled/WillBeSignalled types should only
exist temporarily in a function scope.
Any time you transfer from one function scope to another (like our
transfer_to_job_queue() or transfer_to_wq() examples), that results in
finishing the MustBeSignalled/WillBeSignalled scope on one thread and
creating a new MustBeSignalled/WillBeSignalled scope on another thread.
One could imagine a model where there is no lifetime and you can carry
it around as you wish. That model works okay in most regards, but it
gives up the ability to ensure that dma_fence_lockdep_map is properly
configured to catch mistakes.
The lifetime prohibits you from using the normal ownership semantics to
e.g. transfer the MustBeSignalled into a random workqueue, enforcing
that you can only transfer it into a workqueue by using the provided
methods, which sets up the lockdep dependencies correctly and ensures
that dma_fence_lockdep_map is taken in the workqueue job too.
> I guess that would do.
>
> And then we need to flag the WorkItem that's exposed by the
> JobQueue as a DmaFenceWorkItem so that
> bindings::dma_fence_begin_signalling() is called before entry and
> lockdep can do its job and check that nothing forbidden happens in
> this WorkItem.
In the case of JobQueue, it may make sense to just have the job queue
implementation do that manually. I do not think the workqueue-level API
can fully enforce that the job queue can't make mistakes here.
> > > > And {Must,Will}BeSignalled exists to help model part (2.). But what you
> > > > described with the IRQ callback falls into (1.) instead, which is
> > > > outside the scope of {Must,Will}BeSignalled (or at least requires more
> > > > complex APIs).
> > >
> > > For IRQ callbacks, it's not just about making sure they run, but also
> > > making sure nothing in there can lead to deadlocks, which is basically
> > > #2, except it's not scoped to a particular fence. It's just a "fences
> > > can be signaled from there" marker. We could restrict it to "fences of
> > > this particular implementation can be signaled from there" but not
> > > "this particular fence instance will be signaled next, if any", because
> > > that we don't know until we've walked some HW state to figure out which
> > > job is complete and thus which fence we need to signal (the interrupt
> > > we get is most likely multiplexing completion on multiple GPU contexts,
> > > so before we can even get to our per-context in-flight-jobs FIFO, we
> > > need to demux this thing).
> >
> > All I can say is that this is a different use-case for the C api
> > dma_fence_begin_signalling(). This different usage also seems useful,
> > but it would be one that does not involve {Must,Will}BeSignalled
> > arguments at all.
> >
> > After all, dma_fence_begin_signalling() only requires those arguments if
> > you want to convert a PrivateFence into a PublishedFence. (I guess a
> > better name is PublishableFence.) If you're not trying to prove that a
> > specific fence will be signalled, then you don't need the
> > {Must,Will}BeSignalled arguments.
>
> Okay, so that would be another function returning some sort of guard
> then? What I find confusing is the fact
> dma_fence::dma_fence_begin_signalling() matches the C function name
> which is not per-fence, but just this lock-guard model flagging a
> section from which any fence can be signalled, so maybe we should
> name your dma_fence_begin_signalling() proposal differently, dunno.
Yes we would need multiple methods that call dma_fence_begin_signalling()
depending on why you are calling it.
Alice
^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-11 8:16 ` Alice Ryhl
@ 2026-02-11 9:20 ` Boris Brezillon
0 siblings, 0 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 9:20 UTC (permalink / raw)
To: Alice Ryhl
Cc: Christian König, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Wed, 11 Feb 2026 08:16:38 +0000
Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Feb 10, 2026 at 03:50:25PM +0100, Boris Brezillon wrote:
> > On Tue, 10 Feb 2026 14:11:12 +0000
> > Alice Ryhl <aliceryhl@google.com> wrote:
> >
> > > On Tue, Feb 10, 2026 at 02:51:56PM +0100, Boris Brezillon wrote:
> > > > On Tue, 10 Feb 2026 13:26:48 +0000
> > > > Alice Ryhl <aliceryhl@google.com> wrote:
> > > >
> > > > > On Tue, Feb 10, 2026 at 01:49:13PM +0100, Boris Brezillon wrote:
> > > > > > On Tue, 10 Feb 2026 10:15:04 +0000
> > > > > > Alice Ryhl <aliceryhl@google.com> wrote:
> > > > > >
> > > > > > > /// The owner of this value must ensure that this fence is signalled.
> > > > > > > struct MustBeSignalled<'fence> { ... }
> > > > > > > /// Proof value indicating that the fence has either already been
> > > > > > > /// signalled, or it will be. The lifetime ensures that you cannot mix
> > > > > > > /// up the proof value.
> > > > > > > struct WillBeSignalled<'fence> { ... }
> > > > > >
> > > > > > Sorry, I have more questions, unfortunately. Seems that
> > > > > > {Must,Will}BeSignalled are targeting specific fences (at least that's
> > > > > > what the doc and 'fence lifetime says), but in practice, the WorkItem
> > > > > > backing the scheduler can queue 0-N jobs (0 if no jobs have their deps
> > > > > > met, and N > 1 if more than one job is ready). Similarly, an IRQ
> > > > > > handler can signal 0-N fences (can be that the IRQ has nothing to do we
> > > > > > job completion, or, it can be that multiple jobs have completed). How
> > > > > > is this MustBeSignalled object going to be instantiated in practice if
> > > > > > it's done before the DmaFenceWorkItem::run() function is called?
> > > > >
> > > > > The {Must,Will}BeSignalled closure pair needs to wrap the piece of code
> > > > > that ensures a specific fence is signalled. If you have code that
> > > > > manages a collection of fences and invokes code for specific fences
> > > > > depending on outside conditions, then that's a different matter.
> > > > >
> > > > > After all, transfer_to_wq() has two components:
> > > > > 1. Logic to ensure any spawned workqueue job eventually gets to run.
> > > > > 2. Once the individual job runs, logic specific to the one fence ensures
> > > > > that this one fence gets signalled.
> > > >
> > > > Okay, that's a change compared to how things are modeled in C (and in
> > > > JobQueue) at the moment: the WorkItem is not embedded in a specific
> > > > job, it's something that's attached to the JobQueue. The idea being
> > > > that the WorkItem represents a task to be done on the queue itself
> > > > (check if the first element in the queue is ready for execution), not on
> > > > a particular job. Now, we could change that and have a per-job WorkItem,
> > > > but ultimately, we'll have to make sure jobs are dequeued in order
> > > > (deps on JobN can be met before deps on Job0, but we still want JobN to
> > > > be submitted after Job0), and we'd pay the WorkItem overhead once per
> > > > Job instead of once per JobQueue. Probably not the end of the world,
> > > > but it's worth considering, still.
> > >
> > > It sounds like the fix here is to have transfer_to_job_queue() instead
> > > of trying to do it at the workqueue level.
> >
> > Hm, so Job would be something like that (naming/trait-def are just
> > suggestions to get the discussion going):
> >
> > trait JobConsumer {
> > type FenceType;
> > type JobData;
> >
> > fn run(self: MustBeSignalled<T::FenceType>) -> Result<WillBeSignaled<Self::FenceType>>;
> > }
> >
> > struct Job<T: JobConsumer> {
> > fence: MustBeSignalled<T::FenceType>,
> > data: T::JobData,
> > }
>
> The fence field of Job would be PublishedFence or PrivateFence (or just
> DriverDmaFence). The MustBeSignalled/WillBeSignaled types should only
> exist temporarily in a function scope.
>
> Any time you transfer from one function scope to another (like our
> transfer_to_job_queue() or transfer_to_wq() examples), that results in
> finishing the MustBeSignalled/WillBeSignaled scope on one thread and
> creating a new MustBeSignalled/WillBeSignaled scope on another thread.
Makes sense.
>
> One could imagine a model where there is no lifetime and you can carry
> it around as you wish. That model works okay in most regards, but it
> gives up the ability to ensure that dma_fence_lockdep_map is properly
> configured to catch mistakes.
>
> The lifetime prohibits you from using the normal ownership semantics to
> e.g. transfer the MustBeSignalled into a random workqueue, enforcing
> that you can only transfer it into a workqueue by using the provided
> methods, which sets up the lockdep dependencies correctly and ensures
> that dma_fence_lockdep_map is taken in the workqueue job too.
>
> > I guess that would do.
> >
> > And then we need to flag the WorkItem that's exposed by the
> > JobQueue as a DmaFenceWorkItem so that
> > bindings::dma_fence_begin_signalling() is called before entry and
> > lockdep can do its job and check that nothing forbidden happens in
> > this WorkItem.
>
> In the case of JobQueue, it may make sense to just have the job queue
> implementation do that manually. I do not think the workqueue-level API
> can fully enforce that the job queue can't make mistakes here.
Sure, it can't ensure the fence will be signaled in finite time
(like, you can't prevent an infinite loop, or jobs being dropped on
the floor without having their fences signaled), but it could at
least check that no prohibited ops (blocking allocs, prohibited
locks taken, etc.) are done in the common JobQueue implementation if we
introduce some sort of MaySignalDmaFencesWorkItem
abstraction (call it what you want, I just made the name super explicit
for clarity) that does the annotation around the ::run(), and then have
JobQueue use this instead of a regular WorkItem.
Note that we'll need this MaySignalDmaFencesWorkItem (which is basically
the thing I've been describing in my previous replies) in Tyr if we
want to mimic Panthor, because the IRQHandler doesn't directly signal
the job fences. We have an extra work item that's scheduled when we
receive FW events related to scheduling, and we do the fence
signalling from a workqueue context after having demuxed all the events
and extracted which GPU context made progress.
Of course, devs can very much call some
{Driver,Published}DmaFence::may_signal_fences() (which would return this
Guard we were discussing) manually in their WorkItem::run()
implementation. But then it becomes an explicit operation, with the
risk of forgetting (intentionally or not) to flag those sections as
being part of the signalling path. If we make it an explicit object,
with a dedicated DmaFenceWorkqueue abstraction preventing anything but
MaySignalDmaFencesWorkItem from being scheduled on it, all of a sudden,
this explicit model becomes implicit, and it strengthens the
requirements, which is a good thing, I think.
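The shape being proposed here can be sketched in plain Rust (all names are invented for illustration; the atomic flag stands in for the lockdep annotation that dma_fence_begin_signalling()/dma_fence_end_signalling() would actually provide). The point is that the queue, not the driver, brackets run() with the guard, so the annotation cannot be forgotten:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in for the lockdep-backed signalling annotation.
static IN_SIGNALLING_SECTION: AtomicBool = AtomicBool::new(false);

struct SignallingGuard;

impl SignallingGuard {
    fn begin() -> Self {
        // In the kernel this would call dma_fence_begin_signalling().
        IN_SIGNALLING_SECTION.store(true, Ordering::SeqCst);
        SignallingGuard
    }
}

impl Drop for SignallingGuard {
    fn drop(&mut self) {
        // ... and this would call dma_fence_end_signalling().
        IN_SIGNALLING_SECTION.store(false, Ordering::SeqCst);
    }
}

// Work items allowed on the hypothetical DmaFenceWorkqueue must
// implement this trait instead of a plain WorkItem.
trait MaySignalDmaFencesWorkItem {
    fn run(&self);
}

// The queue takes the guard around every run() invocation, so the
// annotation is implicitly active for the whole work item body.
fn execute(item: &dyn MaySignalDmaFencesWorkItem) {
    let _guard = SignallingGuard::begin();
    item.run();
} // guard dropped here: end of the signalling section
```

This is only a model of the control flow, not of the lockdep machinery itself.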
>
> > > > > And {Must,Will}BeSignalled exists to help model part (2.). But what you
> > > > > described with the IRQ callback falls into (1.) instead, which is
> > > > > outside the scope of {Must,Will}BeSignalled (or at least requires more
> > > > > complex APIs).
> > > >
> > > > For IRQ callbacks, it's not just about making sure they run, but also
> > > > making sure nothing in there can lead to deadlocks, which is basically
> > > > #2, except it's not scoped to a particular fence. It's just a "fences
> > > > can be signaled from there" marker. We could restrict it to "fences of
> > > > this particular implementation can be signaled from there" but not
> > > > "this particular fence instance will be signaled next, if any", because
> > > > that we don't know until we've walked some HW state to figure out which
> > > > job is complete and thus which fence we need to signal (the interrupt
> > > > we get is most likely multiplexing completion on multiple GPU contexts,
> > > > so before we can even get to our per-context in-flight-jobs FIFO, we
> > > > need to demux this thing).
> > >
> > > All I can say is that this is a different use-case for the C api
> > > dma_fence_begin_signalling(). This different usage also seems useful,
> > > but it would be one that does not involve {Must,Will}BeSignalled
> > > arguments at all.
> > >
> > > After all, dma_fence_begin_signalling() only requires those arguments if
> > > you want to convert a PrivateFence into a PublishedFence. (I guess a
> > > better name is PublishableFence.) If you're not trying to prove that a
> > > specific fence will be signalled, then you don't need the
> > > {Must,Will}BeSignalled arguments.
> >
> > Okay, so that would be another function returning some sort of guard
> > then? What I find confusing is the fact
> > dma_fence::dma_fence_begin_signalling() matches the C function name
> > which is not per-fence, but just this lock-guard model flagging a
> > section from which any fence can be signalled, so maybe we should
> > name your dma_fence_begin_signalling() proposal differently, dunno.
>
> Yes we would need multiple methods that call dma_fence_begin_signalling()
> depending on why you are calling it.
Ack.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-10 8:38 ` Alice Ryhl
2026-02-10 9:06 ` Philipp Stanner
2026-02-10 9:15 ` Boris Brezillon
@ 2026-02-10 9:26 ` Christian König
2 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2026-02-10 9:26 UTC (permalink / raw)
To: Alice Ryhl
Cc: Boris Brezillon, Philipp Stanner, phasta, Danilo Krummrich,
David Airlie, Simona Vetter, Gary Guo, Benno Lossin,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On 2/10/26 09:38, Alice Ryhl wrote:
> On Tue, Feb 10, 2026 at 09:16:34AM +0100, Christian König wrote:
>> On 2/9/26 15:58, Boris Brezillon wrote:
>>> On Mon, 09 Feb 2026 09:19:46 +0100
>>> Philipp Stanner <phasta@mailbox.org> wrote:
>>>
>>>> On Fri, 2026-02-06 at 11:23 +0100, Danilo Krummrich wrote:
>>>>> On Thu Feb 5, 2026 at 9:57 AM CET, Boris Brezillon wrote:
>>>>>> On Tue, 3 Feb 2026 09:14:01 +0100
>>>>>> Philipp Stanner <phasta@kernel.org> wrote:
>>>>>> Unfortunately, I don't know how to translate that in rust, but we
>>>>>> need a way to check if any code path does a DmaFence.signal(),
>>>>>> go back to the entry point (for a WorkItem, that would be
>>>>>> WorkItem::run() for instance), and make it a DmaFenceSignallingPath.
>>>>>> Not only that, but we need to know all the deps that make it so
>>>>>> this path can be called (if I take the WorkItem example, that would
>>>>>> be the path that leads to the WorkItem being scheduled).
>>>>>
>>>>> I think we need a guard object for this that is not Send, just like for any
>>>>> other lock.
>>>>>
>>>>> Internally, those markers rely on lockdep, i.e. they just acquire and release a
>>>>> "fake" lock.
>>>>
>>>> The guard object would be created through fence.begin_signalling(), wouldn't it?
>>>
>>> It shouldn't be a (&self)-method, because at the start of a DMA
>>> signaling path, you don't necessarily know which fence you're going to
>>> signal (you might actually signal several of them).
>>>
>>>> And when it drops you call dma_fence_end_signalling()?
>>>
>>> Yep, dma_fence_end_signalling() should be called when the guard is
>>> dropped.
>>>
>>>>
>>>> How would that ensure that the driver actually marks the signalling region correctly?
>>>
>>> Nothing, and that's a problem we have in C: you have no way of telling
>>> which code section is going to be a DMA-signaling path. I can't think
>>> of any way to make that safer in rust, unfortunately. The best I can
>>> think of would be to
>>>
>>> - Have a special DmaFenceSignalWorkItem (wrapping a WorkItem with extra
>>> constraints) that's designed for DMA-fence signaling, and that takes
>>> the DmaSignaling guard around the ::run() call.
>>> - We would then need to ensure that any code path scheduling this work
>>> item is also in a DMA-signaling path by taking a ref to the
>>> DmaSignalingGuard. This of course doesn't guarantee that the section
>>> is wide enough to prevent any non-authorized operations in any path
>>> leading to this WorkItem scheduling, but it would at least force the
>>> caller to consider the problem.
>>
>> On the C side I have a patch set which does something very similar.
>>
>> It's basically a WARN_ON_ONCE() which triggers as soon as you try to
>> signal a DMA fence from an IOCTL, or more specific process context.
>>
>> Signaling a DMA fence from interrupt context, a work item or kernel
>> thread is still allowed, there is just the hole that you can schedule
>> a work item from process context as well.
>>
>> The major problem with that patch set is that we have tons of very
>> hacky signaling paths in drivers already because we initially didn't
>> know how much trouble getting this wrong causes.
>>
>> I'm strongly in favor of getting this right for the rust side from the
>> beginning and enforcing strict rules for every code trying to
>> implement a DMA fence.
>
> Hmm. Could you say a bit more about what the rules are? I just re-read
> the comments in dma-fence.c, but I have some questions.
Oh that is tricky, we have tried to explain and document that numerous times.
For a good start see that here: https://kernel.org/doc/html/v5.9/driver-api/dma-buf.html#indefinite-dma-fences
There was also a really good talk at LPC 2021 from Faith on that topic.
Unfortunately, the understanding usually only comes when things start to fail at the customer, people realize that their design approach doesn't work, and by then it's already UAPI...
> First, how does the signalling annotation work when the signalling path
> crosses thread boundaries?
It doesn't. The annotation done by Sima uses lockdep in a rather unusual way; we would need to extend lockdep a bit for that.
> For example, let's say I call an ioctl to
> perform an async VM_BIND, then the dma fence signalling critical path
> starts in the ioctl, but then it moves into a workqueue and finishes
> there, right?
Perfectly correct, yes.
And in all those code paths you can't allocate memory, take locks under which memory is allocated, or, more generally, do anything that waits for the fence to signal.
> Second, it looks like we have the same challenge as with irq locks where
> you must properly nest dma_fence_begin_signalling() regions, and can't
> e.g. do this:
>
> c1 = dma_fence_begin_signalling()
> c2 = dma_fence_begin_signalling()
> dma_fence_end_signalling(c1)
> dma_fence_end_signalling(c2)
Oh, good point as well! Lockdep will indeed start to scream here while that is perfectly valid.
Regards,
Christian.
>
> Alice
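The nesting hazard Alice raises is one place where Rust's scope-based drops help: guards declared in a scope are released in reverse declaration order, which is exactly the properly nested begin/end pairing lockdep expects. A rough sketch (invented names; a depth counter stands in for the C cookie):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Current nesting depth of signalling sections.
static DEPTH: AtomicU32 = AtomicU32::new(0);

struct SignallingSection {
    level: u32, // depth at which this section was opened
}

impl SignallingSection {
    fn begin() -> Self {
        let level = DEPTH.fetch_add(1, Ordering::SeqCst);
        SignallingSection { level }
    }
}

impl Drop for SignallingSection {
    fn drop(&mut self) {
        let now = DEPTH.fetch_sub(1, Ordering::SeqCst) - 1;
        // With scope-based drops this always holds; ending sections
        // out of order (c1 before c2) would trip this check, just as
        // it would confuse lockdep on the C side.
        assert_eq!(now, self.level, "non-nested end_signalling");
    }
}
```

Drivers that need the non-nested order would still have to call an explicit release method, but the default scope-based usage is safe.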
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-03 8:14 ` [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions Philipp Stanner
2026-02-05 8:57 ` Boris Brezillon
@ 2026-02-05 10:16 ` Boris Brezillon
2026-02-05 13:16 ` Gary Guo
2026-02-09 11:30 ` Alice Ryhl
2 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-05 10:16 UTC (permalink / raw)
To: Philipp Stanner
Cc: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Tue, 3 Feb 2026 09:14:01 +0100
Philipp Stanner <phasta@kernel.org> wrote:
> +/// A synchronization primitive mainly for GPU drivers.
> +///
> +/// DmaFences are always reference counted. The typical use case is that one side registers
> +/// callbacks on the fence which will perform a certain action (such as queueing work) once the
> +/// other side signals the fence.
> +///
> +/// # Examples
> +///
> +/// ```
> +/// use kernel::sync::{Arc, ArcBorrow, DmaFence, DmaFenceCtx, DmaFenceCb, DmaFenceCbFunc};
> +/// use core::sync::atomic::{self, AtomicBool};
> +///
> +/// static mut CHECKER: AtomicBool = AtomicBool::new(false);
> +///
> +/// struct CallbackData {
> +/// i: u32,
> +/// }
> +///
> +/// impl CallbackData {
> +/// fn new() -> Self {
> +/// Self { i: 9 }
> +/// }
> +/// }
> +///
> +/// impl DmaFenceCbFunc for CallbackData {
> +/// fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>) where Self: Sized {
> +/// assert_eq!(cb.data.i, 9);
> +/// // SAFETY: Just to have an easy way for testing. This cannot race with the checker
> +/// // because the fence signalling callbacks are executed synchronously.
> +/// unsafe { CHECKER.store(true, atomic::Ordering::Relaxed); }
> +/// }
> +/// }
> +///
> +/// struct DriverData {
> +/// i: u32,
> +/// }
> +///
> +/// impl DriverData {
> +/// fn new() -> Self {
> +/// Self { i: 5 }
> +/// }
> +/// }
> +///
> +/// let data = DriverData::new();
> +/// let fctx = DmaFenceCtx::new()?;
> +///
> +/// let mut fence = fctx.as_arc_borrow().new_fence(data)?;
> +///
> +/// let cb_data = CallbackData::new();
> +/// fence.register_callback(cb_data);
> +/// // fence.begin_signalling();
> +/// fence.signal()?;
> +/// // Now check whether the callback was actually executed.
> +/// // SAFETY: `fence.signal()` above works sequentially. We just check here whether the signalling
> +/// // actually did set the boolean correctly.
> +/// unsafe { assert_eq!(CHECKER.load(atomic::Ordering::Relaxed), true); }
> +///
> +/// Ok::<(), Error>(())
> +/// ```
> +#[pin_data]
> +pub struct DmaFence<T> {
> + /// The actual dma_fence passed to C.
> + #[pin]
> + inner: Opaque<bindings::dma_fence>,
> + /// User data.
> + #[pin]
> + data: T,
A DmaFence is a cross-device synchronization mechanism that can (and
will) cross the driver boundary (one driver can wait on a fence emitted
by a different driver). As such, I don't think embedding a generic T in
the DmaFence and considering it's the object being passed around is
going to work, because, how can one driver know the T chosen by the
driver that created the fence? If you want to have some fence emitter
data attached to the DmaFence allocation, you'll need two kinds of
objects:
- one that's type agnostic and on which you can do the callback
registration/unregistration, signalling checks, and generally all
type-agnostic operations. That's basically just a wrapper around a
bindings::dma_fence implementing AlwaysRefCounted.
- one that has the extra data and fctx, with a way to transmute from a
generic fence to an implementer-specific one in case the driver wants
to do something special when waiting on its own fences (check done
with the fence ops in C, I don't know how that translates in rust)
> + /// Marks whether the fence is currently in the signalling critical section.
> + signalling: bool,
> + /// A boolean needed for the C backend's lockdep guard.
> + signalling_cookie: bool,
> + /// A reference to the associated [`DmaFenceCtx`] so that it cannot be dropped while there are
> + /// still fences around.
> + fctx: Arc<DmaFenceCtx>,
> +}
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-05 10:16 ` Boris Brezillon
@ 2026-02-05 13:16 ` Gary Guo
2026-02-06 9:32 ` Philipp Stanner
0 siblings, 1 reply; 103+ messages in thread
From: Gary Guo @ 2026-02-05 13:16 UTC (permalink / raw)
To: Boris Brezillon, Philipp Stanner
Cc: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Thu Feb 5, 2026 at 10:16 AM GMT, Boris Brezillon wrote:
> On Tue, 3 Feb 2026 09:14:01 +0100
> Philipp Stanner <phasta@kernel.org> wrote:
>
>> +/// A synchronization primitive mainly for GPU drivers.
>> +///
>> +/// DmaFences are always reference counted. The typical use case is that one side registers
>> +/// callbacks on the fence which will perform a certain action (such as queueing work) once the
>> +/// other side signals the fence.
>> +///
>> +/// # Examples
>> +///
>> +/// ```
>> +/// use kernel::sync::{Arc, ArcBorrow, DmaFence, DmaFenceCtx, DmaFenceCb, DmaFenceCbFunc};
>> +/// use core::sync::atomic::{self, AtomicBool};
>> +///
>> +/// static mut CHECKER: AtomicBool = AtomicBool::new(false);
>> +///
>> +/// struct CallbackData {
>> +/// i: u32,
>> +/// }
>> +///
>> +/// impl CallbackData {
>> +/// fn new() -> Self {
>> +/// Self { i: 9 }
>> +/// }
>> +/// }
>> +///
>> +/// impl DmaFenceCbFunc for CallbackData {
>> +/// fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>) where Self: Sized {
>> +/// assert_eq!(cb.data.i, 9);
>> +/// // SAFETY: Just to have an easy way for testing. This cannot race with the checker
>> +/// // because the fence signalling callbacks are executed synchronously.
>> +/// unsafe { CHECKER.store(true, atomic::Ordering::Relaxed); }
>> +/// }
>> +/// }
>> +///
>> +/// struct DriverData {
>> +/// i: u32,
>> +/// }
>> +///
>> +/// impl DriverData {
>> +/// fn new() -> Self {
>> +/// Self { i: 5 }
>> +/// }
>> +/// }
>> +///
>> +/// let data = DriverData::new();
>> +/// let fctx = DmaFenceCtx::new()?;
>> +///
>> +/// let mut fence = fctx.as_arc_borrow().new_fence(data)?;
>> +///
>> +/// let cb_data = CallbackData::new();
>> +/// fence.register_callback(cb_data);
>> +/// // fence.begin_signalling();
>> +/// fence.signal()?;
>> +/// // Now check whether the callback was actually executed.
>> +/// // SAFETY: `fence.signal()` above works sequentially. We just check here whether the signalling
>> +/// // actually did set the boolean correctly.
>> +/// unsafe { assert_eq!(CHECKER.load(atomic::Ordering::Relaxed), true); }
>> +///
>> +/// Ok::<(), Error>(())
>> +/// ```
>> +#[pin_data]
>> +pub struct DmaFence<T> {
>> + /// The actual dma_fence passed to C.
>> + #[pin]
>> + inner: Opaque<bindings::dma_fence>,
>> + /// User data.
>> + #[pin]
>> + data: T,
>
> A DmaFence is a cross-device synchronization mechanism that can (and
> will) cross the driver boundary (one driver can wait on a fence emitted
> by a different driver). As such, I don't think embedding a generic T in
> the DmaFence and considering it's the object being passed around is
> going to work, because, how can one driver know the T chosen by the
> driver that created the fence? If you want to have some fence emitter
> data attached to the DmaFence allocation, you'll need two kinds of
> objects:
>
> - one that's type agnostic and on which you can do the callback
> registration/unregistration, signalling checks, and generally all
> type-agnostic operations. That's basically just a wrapper around a
> bindings::dma_fence implementing AlwaysRefCounted.
> - one that has the extra data and fctx, with a way to transmute from a
> generic fence to an implementer-specific one in case the driver wants
> to do something special when waiting on its own fences (check done
> with the fence ops in C, I don't know how that translates in rust)
If `data` is moved to the end of struct and `DmaFence<T>` changed to
`DmaFence<T: ?Sized>`, you would also gain the ability to coerce `DmaFence<T>`
to `DmaFence<dyn Trait>`, e.g. `DmaFence<dyn Any>`.
Best,
Gary
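Gary's suggestion works because a struct whose last field is `?Sized` supports unsized coercion. A minimal standalone sketch (plain Rust, no kernel types; `Box` stands in for the kernel's `Arc` here):

```rust
use std::any::Any;

// Sketch of the layout Gary describes: user data last, so that
// Fence<T> can unsize to Fence<dyn Trait>.
struct Fence<T: ?Sized> {
    seqno: u64,
    data: T, // must be the last field for unsizing to work
}

// Coercing a typed fence to a type-erased one needs no unsafe code;
// the compiler inserts the vtable pointer automatically.
fn erase(f: Box<Fence<u32>>) -> Box<Fence<dyn Any>> {
    f
}
```

A waiter holding a `Fence<dyn Any>` can still use all the type-agnostic fields, and the owning driver can `downcast_ref` to recover its concrete data.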
>
>> + /// Marks whether the fence is currently in the signalling critical section.
>> + signalling: bool,
>> + /// A boolean needed for the C backend's lockdep guard.
>> + signalling_cookie: bool,
>> + /// A reference to the associated [`DmaFenceCtx`] so that it cannot be dropped while there are
>> + /// still fences around.
>> + fctx: Arc<DmaFenceCtx>,
>> +}
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-05 13:16 ` Gary Guo
@ 2026-02-06 9:32 ` Philipp Stanner
2026-02-06 10:16 ` Danilo Krummrich
` (2 more replies)
0 siblings, 3 replies; 103+ messages in thread
From: Philipp Stanner @ 2026-02-06 9:32 UTC (permalink / raw)
To: Gary Guo, Boris Brezillon, Philipp Stanner
Cc: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Thu, 2026-02-05 at 13:16 +0000, Gary Guo wrote:
> On Thu Feb 5, 2026 at 10:16 AM GMT, Boris Brezillon wrote:
> > On Tue, 3 Feb 2026 09:14:01 +0100
> > Philipp Stanner <phasta@kernel.org> wrote:
> >
> > >
[…]
> > > +#[pin_data]
> > > +pub struct DmaFence<T> {
> > > + /// The actual dma_fence passed to C.
> > > + #[pin]
> > > + inner: Opaque<bindings::dma_fence>,
> > > + /// User data.
> > > + #[pin]
> > > + data: T,
> >
> > A DmaFence is a cross-device synchronization mechanism that can (and
> > will)
> >
I'm not questioning the truth behind this statement. They are designed
to do that. But is that actually being done, currently? I recently
found that the get_driver_name() callback, intended to inform the
consumer of a fence about who actually issued it, is only ever
used by i915.
Who actually uses that feature? Who needs fences from another driver?
Just out of curiosity.
> > cross the driver boundary (one driver can wait on a fence emitted
> > by a different driver). As such, I don't think embedding a generic T in
> > the DmaFence and considering it's the object being passed around is
> > going to work, because, how can one driver know the T chosen by the
> > driver that created the fence? If you want to have some fence emitter
> > data attached to the DmaFence allocation, you'll need two kind of
> > objects:
> >
> > - one that's type agnostic and on which you can do the callback
> > registration/unregistration, signalling checks, and generally all
> > type-agnostic operations. That's basically just a wrapper around a
> > bindings::dma_fence implementing AlwaysRefCounted.
> > - one that has the extra data and fctx, with a way to transmute from a
> > generic fence to a implementer specific one in case the driver wants
> > to do something special when waiting on its own fences (check done
> > with the fence ops in C, I don't know how that translates in rust)
>
> If `data` is moved to the end of struct and `DmaFence<T>` changed to
> `DmaFence<T: ?Sized>`, you would also gain the ability to coerce `DmaFence<T>`
> to `DmaFence<dyn Trait>`, e.g. `DmaFence<dyn Any>`.
I think we should go one step back here and question the general
design.
I only included data: T because it was among the early feedback that
this is how you do it in Rust.
I was never convinced that it's a good idea. Jobqueue doesn't need the
'data' field. Can anyone think of anyone who would need it?
What kind of data would be in there? It seems a driver would store its
equivalent of C's
struct my_fence {
struct dma_fence f;
/* other driver data */
}
which is then accessed in C with container_of.
But that data is only ever needed by that very driver.
My main point here is:
dma_fences are a synchronization primitive very similar to
completions: informing waiters that something is done and executing
every registrant's callbacks.
They are *not* a data transfer mechanism. It seems very wrong design-
wise to transfer generic data T from one driver to another. That's not
a fence's purpose. Another primitive should be used for that.
If another driver could touch / consume / see / use the emitter's data:
T, that would grossly depart from the original dma_fence design.
It would be akin to doing a container_of to consume foreign driver
data.
Like Xe doing a
struct nouveau_fence *f = container_of(generic_fence, …);
Why would that ever be done? Seems totally broken.
So I strongly think that we'd either want to drop data: T, or we should
think about possibilities to hide it from other drivers.
I've got currently no idea how that could be addressed in Rust, though
:)
:(
P.
>
> Best,
> Gary
>
> >
> > > + /// Marks whether the fence is currently in the signalling critical section.
> > > + signalling: bool,
> > > + /// A boolean needed for the C backend's lockdep guard.
> > > + signalling_cookie: bool,
> > > + /// A reference to the associated [`DmaFenceCtx`] so that it cannot be dropped while there are
> > > + /// still fences around.
> > > + fctx: Arc<DmaFenceCtx>,
> > > +}
>
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-06 9:32 ` Philipp Stanner
@ 2026-02-06 10:16 ` Danilo Krummrich
2026-02-06 13:24 ` Philipp Stanner
2026-02-06 11:04 ` Boris Brezillon
2026-02-06 11:23 ` Boris Brezillon
2 siblings, 1 reply; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-06 10:16 UTC (permalink / raw)
To: Philipp Stanner
Cc: phasta, Gary Guo, Boris Brezillon, David Airlie, Simona Vetter,
Alice Ryhl, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Fri Feb 6, 2026 at 10:32 AM CET, Philipp Stanner wrote:
> Who needs fences from another driver?
When you get VM_BIND and EXEC IOCTLs a driver takes a list of syncobjs the
submitted job should wait for before execution.
The fences of those syncobjs can be from anywhere, including other DRM drivers.
> I think we should go one step back here and question the general
> design.
>
> I only included data: T because it was among the early feedback that
> this is how you do it in Rust.
>
> I was never convinced that it's a good idea. Jobqueue doesn't need the
> 'data' field. Can anyone think of anyone who would need it?
>
> What kind of data would be in there? It seems a driver would store its
> equivalent of C's
>
> struct my_fence {
> struct dma_fence f;
> /* other driver data */
> }
>
> which is then accessed in C with container_of.
Your current struct is exactly this pattern:
struct DmaFence<T> {
inner: Opaque<bindings::dma_fence>,
...
data: T,
}
So, in Rust you can just write DmaFence<MyData> rather than,
struct my_dma_fence {
struct dma_fence inner;
struct my_data data;
}
> But that data is only ever needed by that very driver.
Exactly, this is the "owned" type that is only ever used by this driver.
> They are *not* a data transfer mechanism. It seems very wrong design-
> wise to transfer generic data T from one driver to another. That's not
> a fence's purpose. Another primitive should be used for that.
>
> If another driver could touch / consume / see / use the emitter's data:
> T, that would grossly decouple us from the original dma_fence design.
> It would be akin to doing a container_of to consume foreign driver
> data.
Correct, that's why the suggestion here was to have a second type that is only
struct ForeignDmaFence {
inner: Opaque<bindings::dma_fence>,
...,
/* No data. */
}
i.e. it does not provide access to the rest of the allocation, since it is
private to the owning driver.
This type should also not have methods like signal(), since only the owner of
the fence should be allowed to signal the fence.
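The split Danilo describes can be modelled in plain Rust roughly as follows (all names invented, `Rc`/`Cell` standing in for the refcounted `bindings::dma_fence`): the owning driver holds the typed fence and is the only party with signal(); everyone else only ever sees a type-erased handle.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Stand-in for the shared bindings::dma_fence state.
struct RawFence {
    signalled: Cell<bool>,
}

// Owner-side fence: carries private driver data and may signal.
struct DmaFence<T> {
    raw: Rc<RawFence>,
    data: T, // never exposed to other drivers
}

// Foreign fence: no data field, and deliberately no signal().
struct ForeignDmaFence {
    raw: Rc<RawFence>,
}

impl<T> DmaFence<T> {
    fn new(data: T) -> Self {
        DmaFence { raw: Rc::new(RawFence { signalled: Cell::new(false) }), data }
    }

    fn signal(&self) {
        self.raw.signalled.set(true);
    }

    // Hand out a handle another driver can wait on.
    fn as_foreign(&self) -> ForeignDmaFence {
        ForeignDmaFence { raw: self.raw.clone() }
    }
}

impl ForeignDmaFence {
    // Checking / waiting / callback registration belong here.
    fn is_signalled(&self) -> bool {
        self.raw.signalled.get()
    }
}
```

The type system then enforces the ownership rule: a ForeignDmaFence simply has no method that could signal or touch the emitter's data.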
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-06 10:16 ` Danilo Krummrich
@ 2026-02-06 13:24 ` Philipp Stanner
0 siblings, 0 replies; 103+ messages in thread
From: Philipp Stanner @ 2026-02-06 13:24 UTC (permalink / raw)
To: Danilo Krummrich
Cc: phasta, Gary Guo, Boris Brezillon, David Airlie, Simona Vetter,
Alice Ryhl, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Fri, 2026-02-06 at 11:16 +0100, Danilo Krummrich wrote:
> On Fri Feb 6, 2026 at 10:32 AM CET, Philipp Stanner wrote:
> > Who needs fences from another driver?
>
> When you get VM_BIND and EXEC IOCTLs a driver takes a list of syncobjs the
> submitted job should wait for before execution.
>
> The fences of those syncobjs can be from anywhere, including other DRM drivers.
>
> > I think we should go one step back here and question the general
> > design.
> >
> > I only included data: T because it was among the early feedback that
> > this is how you do it in Rust.
> >
> > I was never convinced that it's a good idea. Jobqueue doesn't need the
> > 'data' field. Can anyone think of anyone who would need it?
> >
> > What kind of data would be in there? It seems a driver would store its
> > equivalent of C's
> >
> > struct my_fence {
> > struct dma_fence f;
> > /* other driver data */
> > }
> >
> > which is then accessed in C with container_of.
>
> Your current struct is exactly this pattern:
>
> struct DmaFence<T> {
> inner: Opaque<bindings::dma_fence>,
> ...
> data: T,
> }
>
> So, in Rust you can just write DmaFence<MyData> rather than,
>
> struct my_dma_fence {
> struct dma_fence inner;
> struct my_data data;
> }
>
> > But that data is only ever needed by that very driver.
>
> Exactly, this is the "owned" type that is only ever used by this driver.
>
> > They are *not* a data transfer mechanism. It seems very wrong design-
> > wise to transfer generic data T from one driver to another. That's not
> > a fence's purpose. Another primitive should be used for that.
> >
> > If another driver could touch / consume / see / use the emitter's data:
> > T, that would grossly decouple us from the original dma_fence design.
> > It would be akin to doing a container_of to consume foreign driver
> > data.
>
> Correct, that's why the suggestion here was to have a second type that is only
>
> struct ForeignDmaFence {
> inner: Opaque<bindings::dma_fence>,
> ...,
> /* No data. */
> }
>
> i.e. it does not provide access to the rest of the allocation, since it is
> private to the owning driver.
>
> This type should also not have methods like signal(), since only the owner of
> the fence should be allowed to signal the fence.
So to be sure, you envision it like that:
let foreign_fence = ForeignDmaFence::new(normal_dma_fence)?;
foreign_fence.register_callback(my_consequences)?;
?
With a foreign_fence taking another reference to bindings::dma_fence I
suppose.
Which would mean that we would need to accept those foreign fences for
jobqueue methods, too.
And what kind of fence do we imagine should
let done_fence = jq.submit_job(job)?;
be?
P.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-06 9:32 ` Philipp Stanner
2026-02-06 10:16 ` Danilo Krummrich
@ 2026-02-06 11:04 ` Boris Brezillon
2026-02-09 8:21 ` Philipp Stanner
2026-02-06 11:23 ` Boris Brezillon
2 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-06 11:04 UTC (permalink / raw)
To: Philipp Stanner
Cc: phasta, Gary Guo, David Airlie, Simona Vetter, Danilo Krummrich,
Alice Ryhl, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Fri, 06 Feb 2026 10:32:38 +0100
Philipp Stanner <phasta@mailbox.org> wrote:
> On Thu, 2026-02-05 at 13:16 +0000, Gary Guo wrote:
> > On Thu Feb 5, 2026 at 10:16 AM GMT, Boris Brezillon wrote:
> > > On Tue, 3 Feb 2026 09:14:01 +0100
> > > Philipp Stanner <phasta@kernel.org> wrote:
> > >
> > > >
>
> […]
>
> > > > +#[pin_data]
> > > > +pub struct DmaFence<T> {
> > > > + /// The actual dma_fence passed to C.
> > > > + #[pin]
> > > > + inner: Opaque<bindings::dma_fence>,
> > > > + /// User data.
> > > > + #[pin]
> > > > + data: T,
> > >
> > > A DmaFence is a cross-device synchronization mechanism that can (and
> > > will)
> > >
>
> I'm not questioning the truth behind this statement. They are designed
> to do that. But is that actually being done, currently? I recently
> found that the get_driver_name() callback intended to inform the
> consumer of a fence about who actually issued the fence is only ever
> used by i915.
>
> Who actually uses that feature? Who needs fences from another driver?
Display controller (AKA KMS) drivers waiting on fences emitted by a GPU
driver, for instance.
>
> Just out of curiosity
>
>
> > > cross the driver boundary (one driver can wait on a fence emitted
> > > by a different driver). As such, I don't think embedding a generic T in
> > > the DmaFence and considering it's the object being passed around is
> > > going to work, because, how can one driver know the T chosen by the
> > > driver that created the fence? If you want to have some fence emitter
> > > data attached to the DmaFence allocation, you'll need two kind of
> > > objects:
> > >
> > > - one that's type agnostic and on which you can do the callback
> > > registration/unregistration, signalling checks, and generally all
> > > type-agnostic operations. That's basically just a wrapper around a
> > > bindings::dma_fence implementing AlwaysRefCounted.
> > > - one that has the extra data and fctx, with a way to transmute from a
> > > generic fence to a implementer specific one in case the driver wants
> > > to do something special when waiting on its own fences (check done
> > > with the fence ops in C, I don't know how that translates in rust)
> >
> > If `data` is moved to the end of struct and `DmaFence<T>` changed to
> > `DmaFence<T: ?Sized>`, you would also gain the ability to coerce `DmaFence<T>`
> > to `DmaFence<dyn Trait>`, e.g. `DmaFence<dyn Any>`.
>
>
> I think we should go one step back here and question the general
> design.
>
> I only included data: T because it was among the early feedback that
> this is how you do it in Rust.
>
> I was never convinced that it's a good idea. Jobqueue doesn't need the
> 'data' field. Can anyone think of anyone who would need it?
>
> What kind of data would be in there? It seems a driver would store its
> equivalent of C's
>
> struct my_fence {
> struct dma_fence f;
> /* other driver data */
> }
>
> which is then accessed in C with container_of.
>
> But that data is only ever needed by that very driver.
>
>
> My main point here is:
> dma_fence's are a synchronization primitive very similar to
> completions: informing about that something is done, executing every
> registrants callbacks.
>
> They are *not* a data transfer mechanism. It seems very wrong design-
> wise to transfer generic data T from one driver to another. That's not
> a fence's purpose. Another primitive should be used for that.
>
> If another driver could touch / consume / see / use the emitter's data:
> T, that would grossly decouple us from the original dma_fence design.
> It would be akin to doing a container_of to consume foreign driver
> data.
>
> Like Xe doing a
>
> struct nouveau_fence *f = container_of(generic_fence, …);
>
> Why would that ever be done? Seems totally broken.
>
> So I strongly think that we'd either want to drop data: T, or we should
> think about possibilities to hide it from other drivers.
>
> I've got currently no idea how that could be addressed in Rust, though
So, as Danilo explained in his reply, there are two kinds of users:
1. those that want to wait on fences (that'd be the JobQueue, for
instance)
2. those that are emitting fences (AKA those implementing the fence_ops
in C)
And each of them should be given different access to the underlying
dma_fence, hence the proposal to have different objects to back
those concepts.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-06 11:04 ` Boris Brezillon
@ 2026-02-09 8:21 ` Philipp Stanner
0 siblings, 0 replies; 103+ messages in thread
From: Philipp Stanner @ 2026-02-09 8:21 UTC (permalink / raw)
To: Boris Brezillon
Cc: phasta, Gary Guo, David Airlie, Simona Vetter, Danilo Krummrich,
Alice Ryhl, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Fri, 2026-02-06 at 12:04 +0100, Boris Brezillon wrote:
> On Fri, 06 Feb 2026 10:32:38 +0100
> Philipp Stanner <phasta@mailbox.org> wrote:
>
[…]
> >
> > So I strongly think that we'd either want to drop data: T, or we should
> > think about possibilities to hide it from other drivers.
> >
> > I've got currently no idea how that could be addressed in Rust, though
>
> So, as Danilo explained in his reply, there are two kinds of users:
>
> 1. those that want to wait on fences (that'd be the JobQueue, for
> instance)
> 2. those that are emitting fences (AKA those implementing the fence_ops
> in C)
>
> And each of them should be given different access to the underlying
> dma_fence, hence the proposal to have different objects to back
> those concepts.
That makes sense and can be implemented. I can pick it up.
P.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-06 9:32 ` Philipp Stanner
2026-02-06 10:16 ` Danilo Krummrich
2026-02-06 11:04 ` Boris Brezillon
@ 2026-02-06 11:23 ` Boris Brezillon
2 siblings, 0 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-06 11:23 UTC (permalink / raw)
To: Philipp Stanner
Cc: phasta, Gary Guo, David Airlie, Simona Vetter, Danilo Krummrich,
Alice Ryhl, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Fri, 06 Feb 2026 10:32:38 +0100
Philipp Stanner <phasta@mailbox.org> wrote:
> On Thu, 2026-02-05 at 13:16 +0000, Gary Guo wrote:
> > On Thu Feb 5, 2026 at 10:16 AM GMT, Boris Brezillon wrote:
> > > On Tue, 3 Feb 2026 09:14:01 +0100
> > > Philipp Stanner <phasta@kernel.org> wrote:
> > >
> > > >
>
> […]
>
> > > > +#[pin_data]
> > > > +pub struct DmaFence<T> {
> > > > + /// The actual dma_fence passed to C.
> > > > + #[pin]
> > > > + inner: Opaque<bindings::dma_fence>,
> > > > + /// User data.
> > > > + #[pin]
> > > > + data: T,
> > >
> > > A DmaFence is a cross-device synchronization mechanism that can (and
> > > will)
> > >
>
> I'm not questioning the truth behind this statement. They are designed
> to do that. But is that actually being done, currently? I recently
> found that the get_driver_name() callback, intended to inform the
> consumer of a fence about who actually issued the fence, is only ever
> used by i915.
It's also used by the dma-buf layer to expose info about dma-bufs
through debugfs (see dma_{fence,resv}_describe() and
dma_buf_debug_show()), meaning all GPU drivers adding their fences to
the dma_resv of an imported/exported buffer object should expect to
have their ::get_{driver,timeline}_name() implementation called.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions
2026-02-03 8:14 ` [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions Philipp Stanner
2026-02-05 8:57 ` Boris Brezillon
2026-02-05 10:16 ` Boris Brezillon
@ 2026-02-09 11:30 ` Alice Ryhl
2 siblings, 0 replies; 103+ messages in thread
From: Alice Ryhl @ 2026-02-09 11:30 UTC (permalink / raw)
To: Philipp Stanner
Cc: David Airlie, Simona Vetter, Danilo Krummrich, Gary Guo,
Benno Lossin, Christian König, Boris Brezillon,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Tue, Feb 03, 2026 at 09:14:01AM +0100, Philipp Stanner wrote:
> +void rust_helper_dma_fence_get(struct dma_fence *f)
> +{
> + dma_fence_get(f);
> +}
> +
> +void rust_helper_dma_fence_put(struct dma_fence *f)
> +{
> + dma_fence_put(f);
> +}
> +
> +bool rust_helper_dma_fence_begin_signalling(void)
> +{
> + return dma_fence_begin_signalling();
> +}
> +
> +void rust_helper_dma_fence_end_signalling(bool cookie)
> +{
> + dma_fence_end_signalling(cookie);
> +}
> +
> +bool rust_helper_dma_fence_is_signaled(struct dma_fence *f)
> +{
> + return dma_fence_is_signaled(f);
> +}
These should use the __rust_helper #define. See:
https://lore.kernel.org/r/20260105-define-rust-helper-v2-0-51da5f454a67@google.com
> +void rust_helper_spin_lock_init(spinlock_t *lock)
> +{
> + spin_lock_init(lock);
> +}
> [..]
> +#[pin_data]
> +pub struct DmaFenceCtx {
> + /// An opaque spinlock. Only ever passed to the C backend, never used by Rust.
> + #[pin]
> + lock: Opaque<bindings::spinlock_t>,
> [...]
> +}
> +
> +impl DmaFenceCtx {
> + /// Create a new `DmaFenceCtx`.
> + pub fn new() -> Result<Arc<Self>> {
> + let ctx = pin_init!(Self {
> + // Feed in a non-Rust spinlock for now, since the Rust side never needs the lock.
> + lock <- Opaque::ffi_init(|slot: *mut bindings::spinlock| {
> + // SAFETY: `slot` is a valid pointer to an uninitialized `struct spinlock_t`.
> + unsafe { bindings::spin_lock_init(slot) };
> + }),
We already have a __spin_lock_init() helper used by our SpinLock type. Can we
just use that one instead of adding a new one?
But actually I think it's simpler to just use SpinLock<()> as the type here. We
have (or should add) a method to get the `state` field from a SpinLock<()>,
which gets you a raw spinlock_t you can pass to C code.
> +use core::{
> [...]
> + sync::atomic::{AtomicU64, Ordering},
> +};
This should use kernel::sync::atomic instead.
> +use kernel::c_str;
> [...]
> + extern "C" fn get_driver_name(_ptr: *mut bindings::dma_fence) -> *const c_char {
> + c_str!("DRIVER_NAME_UNUSED").as_char_ptr()
> + }
> +
> + extern "C" fn get_timeline_name(_ptr: *mut bindings::dma_fence) -> *const c_char {
> + c_str!("TIMELINE_NAME_UNUSED").as_char_ptr()
> + }
We have c-strings literals now:
extern "C" fn get_driver_name(_ptr: *mut bindings::dma_fence) -> *const c_char {
c"DRIVER_NAME_UNUSED".as_char_ptr()
}
extern "C" fn get_timeline_name(_ptr: *mut bindings::dma_fence) -> *const c_char {
c"TIMELINE_NAME_UNUSED".as_char_ptr()
}
> +pub trait DmaFenceCbFunc {
> + /// The callback function. `cb` is a container of the data which the driver passed in
> + /// [`DmaFence::register_callback`].
> + fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>)
> + where
> + Self: Sized;
> +}
Just make Sized into a super-trait.
pub trait DmaFenceCbFunc: Sized {
/// The callback function. `cb` is a container of the data which the driver passed in
/// [`DmaFence::register_callback`].
fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>);
}
Probably also include 'static next to Sized instead of specifying it on
register_callback().
> +impl<T: DmaFenceCbFunc + 'static> DmaFenceCb<T> {
> + unsafe extern "C" fn callback(
> + _fence_ptr: *mut bindings::dma_fence,
> + cb_ptr: *mut bindings::dma_fence_cb,
> + ) {
> [...]
> + // SAFETY: `cp_ptr` is the heap memory of a Pin<Kbox<Self>> because it was created by
> + // invoking ForeignOwnable::into_foreign() on such an instance.
> + let cb = unsafe { <Pin<KBox<Self>> as ForeignOwnable>::from_foreign(cb_ptr) };
> + }
> +}
> [...]
> + pub fn register_callback<U: DmaFenceCbFunc + 'static>(&self, data: impl PinInit<U>) -> Result {
> + let cb = DmaFenceCb::new(data)?;
> + let ptr = cb.into_foreign() as *mut DmaFenceCb<U>;
The ForeignOwnable trait provides no guarantees about where the void pointer
points. The only legal usage of such a void pointer is to pass it to
from_foreign() and borrow() and other similar methods defined on the
ForeignOwnable trait. Casting it to DmaFenceCb<U> and dereferencing it is
illegal because it might point elsewhere in the box than at the DmaFenceCb<U>
value. (Yes for Box it happens to point there, but for e.g. Arc it points at the
refcount_t value instead.)
Please replace this usage of ForeignOwnable with Box::into_raw() /
Box::from_raw() calls, or use ForeignOwnable::borrow[_mut]() to access the
value.
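The distinction can be shown with plain std types (nothing kernel-specific here): unlike the opaque void pointer from ForeignOwnable, `Box::into_raw()` is documented to return a pointer to the boxed value itself, so dereferencing it before handing it back to `Box::from_raw()` is legal:

```rust
fn main() {
    // Round-trip through a raw pointer. Box::into_raw guarantees the
    // pointer refers to the T inside the allocation, so it may be used
    // directly between into_raw() and from_raw().
    let b: Box<u32> = Box::new(42);
    let ptr: *mut u32 = Box::into_raw(b);

    // SAFETY: `ptr` came from Box::into_raw and nothing else accesses it.
    unsafe { *ptr += 1 };

    // SAFETY: `ptr` came from Box::into_raw and is reclaimed exactly once.
    let b = unsafe { Box::from_raw(ptr) };
    assert_eq!(*b, 43);
}
```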
> + pub fn register_callback<U: DmaFenceCbFunc + 'static>(&self, data: impl PinInit<U>) -> Result {
> + let cb = DmaFenceCb::new(data)?;
> + let ptr = cb.into_foreign() as *mut DmaFenceCb<U>;
> + // SAFETY: `ptr` was created validly directly above.
> + let inner_cb = unsafe { (*ptr).inner.get() };
> +
> + // SAFETY: `self.as_raw()` is valid because `self` is refcounted, `inner_cb` was created
> + // validly above and was turned into a ForeignOwnable, so it won't be dropped. `callback`
> + // has static life time.
> + let ret = unsafe {
> + bindings::dma_fence_add_callback(
> + self.as_raw(),
> + inner_cb,
> + Some(DmaFenceCb::<U>::callback),
> + )
> + };
> + if ret != 0 {
> + return Err(Error::from_errno(ret));
> + }
> + Ok(())
On error, this function leaks the DmaFenceCb allocation. It should be converted
back to a Box so that the destructor may run.
drop(unsafe { Box::from_raw(ptr) });
// or perhaps:
drop(unsafe { DmaFenceCb::from_raw(...) });
Also this should use to_result().
> +impl<T: DmaFenceCbFunc + 'static> DmaFenceCb<T> {
> + fn new(data: impl PinInit<T>) -> Result<Pin<KBox<Self>>> {
> [...]
> + KBox::pin_init(cb, GFP_KERNEL)
> [...]
> +impl DmaFenceCtx {
> + /// Create a new `DmaFenceCtx`.
> + pub fn new() -> Result<Arc<Self>> {
> [...]
> + Arc::pin_init(ctx, GFP_KERNEL)
Shouldn't the gfp flags be provided by the caller instead of hard-coding
GFP_KERNEL here?
> +unsafe impl<T> AlwaysRefCounted for DmaFence<T> {
> + /// # Safety
> + ///
> + /// `ptr`must be a valid pointer to a [`DmaFence`].
> + unsafe fn dec_ref(ptr: NonNull<Self>) {
> + // SAFETY: `ptr` is never a NULL pointer; and when `dec_ref()` is called
> + // the fence is by definition still valid.
> + let fence = unsafe { (*ptr.as_ptr()).inner.get() };
> +
> + // SAFETY: Valid because `fence` was created validly above.
> + unsafe { bindings::dma_fence_put(fence) }
The safety requirements of `dec_ref()` as described here are incomplete. The
caller must also give up ownership of one refcount to the value before they may
call this method. But you may simply delete that section because this is a trait
implementation, and the safety requirements are inherited from the declaration
of AlwaysRefCounted.
The safety comment on `let fence` is also not quite right. I don't think it's
useful to talk about NULL because you require something stronger than NULL for
this operation - for example `0xDEADBEEF as *mut DmaFence<T>` is not NULL but
would also be illegal here.
A better wording would be to say that by the safety requirements, the caller
passes a valid pointer to a `DmaFence<T>`.
And the safety comment on dma_fence_put() is also incomplete. The caller must
pass ownership of one refcount, so the safety comment should mention why we can
pass ownership of a refcount here (it is because caller must pass ownership of a
refcount to us by the safety requirements).
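To make the "caller must pass ownership of one refcount" requirement concrete, here is a self-contained toy in plain std Rust (the type and function names are made up; this is not the kernel's `AlwaysRefCounted` trait or `dma_fence_put()`); it shows why `dec_ref` may free the object and thus why the contract must include the ownership transfer:

```rust
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

// Toy stand-in for a refcounted object shared with C.
struct Toy {
    refcount: AtomicUsize,
}

impl Toy {
    // Returns a pointer owning one refcount, like dma_fence_init() + get.
    fn new() -> NonNull<Toy> {
        let b = Box::new(Toy { refcount: AtomicUsize::new(1) });
        NonNull::from(Box::leak(b))
    }

    // Illustrative safety contract: `ptr` must point to a valid `Toy`,
    // AND the caller transfers ownership of one refcount to this call.
    unsafe fn dec_ref(ptr: NonNull<Toy>) {
        let prev = unsafe { ptr.as_ref() }.refcount.fetch_sub(1, Ordering::Release);
        if prev == 1 {
            // Last reference is gone: we are entitled to free the object,
            // which is only sound because the caller gave up its refcount.
            drop(unsafe { Box::from_raw(ptr.as_ptr()) });
        }
    }
}

fn main() {
    let p = Toy::new();
    // We own exactly one refcount and hand it over here; using `p` after
    // this call would be a use-after-free.
    unsafe { Toy::dec_ref(p) };
}
```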
> + /// Mark the beginning of a DmaFence signalling critical section. Should be called once a fence
> + /// gets published.
> + ///
> + /// The signalling critical section is marked as finished automatically once the fence signals.
> + pub fn begin_signalling(&mut self) {
I doubt it's legal to have a `&mut DmaFence<T>` because I could call
`mem::swap()` with two of them, which likely invalidates stuff inside `inner`.
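The concern is demonstrable with std alone: any two `&mut T` can be swapped wholesale, which is fatal for a type whose C side (here, the intrusive `struct dma_fence`) relies on a stable address. The example uses plain `String`s as stand-ins:

```rust
use std::mem;

fn main() {
    // Given two exclusive references, safe code may exchange the whole
    // values behind them. If these were DmaFence<T> with intrusive C
    // state inside `inner`, the C-side pointers would now be scrambled.
    let mut a = String::from("fence A");
    let mut b = String::from("fence B");
    mem::swap(&mut a, &mut b);
    assert_eq!(a, "fence B");
    assert_eq!(b, "fence A");
    // This is exactly what Pin exists to forbid: pinned data must never
    // be exposed through a bare `&mut`.
}
```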
> + const fn ops_create() -> bindings::dma_fence_ops {
> + // SAFETY: Zeroing out memory on the stack is always safe.
> + let mut ops: bindings::dma_fence_ops = unsafe { core::mem::zeroed() };
No it's not always safe. If I have a local variable of type reference,
then it's not safe to zero that value because NULL is not a legal value
for references. The reason this is ok is because all fields of
dma_fence_ops are values that are nullable.
This should probably just use the safe pin_init::zeroed() function.
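The point can be shown with std alone: the all-zero bit pattern is valid for nullable types, such as the `Option<extern "C" fn(...)>` fields a dma_fence_ops-like struct is made of, but it is never valid for a reference:

```rust
fn main() {
    // Option<fn()> has a null niche, so the all-zero pattern is a valid
    // value: None. This is why zeroing an ops struct of nullable function
    // pointers happens to be sound.
    let f: Option<fn()> = unsafe { std::mem::zeroed() };
    assert!(f.is_none());

    // By contrast, `let r: &u8 = unsafe { std::mem::zeroed() };` would be
    // immediate undefined behaviour: references must never be null.
    // A safe zeroing initializer sidesteps this class of mistake entirely.
}
```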
> +impl<T> DmaFence<T> {
> + /// Create an initializer for a new [`DmaFence`].
> + fn new(
> + fctx: Arc<DmaFenceCtx>,
> + data: impl PinInit<T>, // TODO: The driver data should implement PinInit<T, Error>
> + lock: &Opaque<bindings::spinlock_t>,
> + context: u64,
> + seqno: u64,
> + ) -> Result<ARef<Self>> {
This function should be unsafe. There are clearly some safety requirements here.
For example, I suspect it's required that `lock` does not get freed before the
returned value?
Making this function unsafe reveals that DmaFenceCtx::new_fence is missing a
safety requirement explaining why self.lock is alive for long enough.
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-03 8:13 [RFC PATCH 0/4] Add dma_fence abstractions and DRM Jobqueue Philipp Stanner
2026-02-03 8:14 ` [RFC PATCH 1/4] rust: list: Add unsafe for container_of Philipp Stanner
2026-02-03 8:14 ` [RFC PATCH 2/4] rust: sync: Add dma_fence abstractions Philipp Stanner
@ 2026-02-03 8:14 ` Philipp Stanner
2026-02-10 14:57 ` Boris Brezillon
2026-02-03 8:14 ` [RFC PATCH 4/4] samples: rust: Add jobqueue tester Philipp Stanner
2026-02-03 16:46 ` [RFC PATCH 0/4] Add dma_fence abstractions and DRM Jobqueue Daniel Almeida
4 siblings, 1 reply; 103+ messages in thread
From: Philipp Stanner @ 2026-02-03 8:14 UTC (permalink / raw)
To: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Boris Brezillon,
Daniel Almeida, Joel Fernandes
Cc: linux-kernel, dri-devel, rust-for-linux, Philipp Stanner
DRM jobqueue is a load balancer, dependency manager and timeout handler
for GPU drivers with firmware scheduling, i.e. drivers which spawn one
firmware ring for each userspace instance for running jobs on the hardware.
This patch provides:
- Jobs which the user can create and load with custom data.
- Functionality to register dependencies (DmaFence's) on jobs.
- The actual Jobqueue, into which you can push jobs.
Jobqueue submits jobs to your driver through a provided driver callback.
It always submits jobs in order. It only submits jobs whose dependencies
have all been signalled.
Additionally, Jobqueue implements a credit count system so it can take
your hardware's queue depth into account. When creating a Jobqueue, you
provide the number of credits that are available for that queue. Each
job you submit has a specified credit cost which will be subtracted from
the Jobqueue's capacity.
If the Jobqueue runs out of capacity, it will still accept more jobs and
run those once more capacity becomes available through finishing jobs.
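As an illustration only (this is not the Jobqueue code, and the `Credits` type below is made up), the credit accounting described above boils down to subtracting a job's cost on submission and returning it when the job finishes:

```rust
// Minimal sketch of the credit / capacity bookkeeping described above.
struct Credits {
    capacity: u32,
}

impl Credits {
    // Try to reserve `cost` credits for a job; fails if capacity is exhausted.
    fn try_take(&mut self, cost: u32) -> bool {
        if self.capacity >= cost {
            self.capacity -= cost;
            true
        } else {
            false
        }
    }

    // A finished job returns its credits, allowing waiting jobs to run.
    fn give_back(&mut self, cost: u32) {
        self.capacity += cost;
    }
}

fn main() {
    let mut c = Credits { capacity: 3 };
    assert!(c.try_take(2));   // first job fits
    assert!(!c.try_take(2));  // second job must wait, but is still accepted
    c.give_back(2);           // first job finished
    assert!(c.try_take(2));   // now the waiting job can run
}
```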
This code compiles, was tested, and is judged to be ready for beta
testers. However, the code is still plastered with TODOs.
Features still missing are:
- Timeout handling
- Complete decoupling from DmaFences. Jobqueue shall in the future
completely detach itself from all related DmaFences. This is
currently incomplete. While data-UAF should be impossible, code-UAF
through DmaFences could occur if the Jobqueue code were unloaded
while unsignaled fences are still alive.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
rust/kernel/drm/jq.rs | 680 +++++++++++++++++++++++++++++++++++++++++
rust/kernel/drm/mod.rs | 2 +
2 files changed, 682 insertions(+)
create mode 100644 rust/kernel/drm/jq.rs
diff --git a/rust/kernel/drm/jq.rs b/rust/kernel/drm/jq.rs
new file mode 100644
index 000000000000..fd5641f40a61
--- /dev/null
+++ b/rust/kernel/drm/jq.rs
@@ -0,0 +1,680 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright (C) 2025, 2026 Red Hat Inc.:
+// - Philipp Stanner <pstanner@redhat.com>
+
+//! DrmJobqueue. A load balancer, dependency manager and timeout handler for
+//! GPU job submissions.
+
+use crate::{prelude::*, types::ARef};
+use core::sync::atomic::{AtomicU32, Ordering};
+use kernel::list::*;
+use kernel::revocable::Revocable;
+use kernel::sync::{
+ new_spinlock, Arc, DmaFence, DmaFenceCb, DmaFenceCbFunc, DmaFenceCtx, SpinLock,
+};
+use kernel::workqueue::{self, impl_has_work, new_work, Work, WorkItem};
+
+#[pin_data]
+struct Dependency {
+ #[pin]
+ links: ListLinks,
+ fence: ARef<DmaFence<i32>>,
+}
+
+impl Dependency {
+ fn new(fence: ARef<DmaFence<i32>>) -> Result<ListArc<Self>> {
+ ListArc::pin_init(
+ try_pin_init!(Self {
+ links <- ListLinks::new(),
+ fence,
+ }),
+ GFP_KERNEL,
+ )
+ }
+}
+
+impl_list_arc_safe! {
+ impl ListArcSafe<0> for Dependency { untracked; }
+}
+impl_list_item! {
+ impl ListItem<0> for Dependency { using ListLinks { self.links }; }
+}
+// Callback item for the dependency fences to wake / progress the jobqueue.
+struct DependencyWaker<T: 'static + Send> {
+ jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+ // Scary raw pointer! See justification at the unsafe block below.
+ //
+ // What would be the alternatives to the rawpointer? I can see two:
+ // 1. Refcount the jobs and have the dependency callbacks take a reference.
+ // That would require then, however, to guard the jobs with a SpinLock.
+ // That SpinLock would just exist, however, to satisfy the Rust compiler.
+ // From a kernel-engineering perspective, that would be undesirable,
+ // because the only thing within a job that might be accessed by multiple
+ // CPUs in parallel is `Job::nr_of_deps`. It's certainly conceivable
+ // that some userspace applications with a great many dependencies would
+ // then suffer from lock contention, just to modify an integer.
+ // 2. Clever hackiness just to avoid an unsafe that's provably correct:
+ // We could replace this raw pointer with an Arc<AtomicU32>, the Job<T>
+ // holding another reference. Would work. But is that worth it?
+ // Share your opinion on-list :)
+ job: *const Job<T>,
+}
+
+impl<T: 'static + Send> DependencyWaker<T> {
+ fn new(jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>, job: *const Job<T>) -> Self {
+ Self { jobq, job }
+ }
+}
+
+impl<T: 'static + Send> DmaFenceCbFunc for DependencyWaker<T> {
+ fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>)
+ where
+ Self: Sized,
+ {
+ let jq_guard = cb.data.jobq.try_access();
+ if jq_guard.is_none() {
+ return;
+ }
+ let outer_jq = jq_guard.unwrap();
+
+ // SAFETY:
+ // `job` is only needed to modify the dependency counter within the job.
+ // That counter is atomic, so concurrent modifications are safe.
+ //
+ // As for the lifetime: Jobs that have pending dependencies are held by
+ // `InnerJobqueue::waiting_jobs`. As long as any of these dependency
+ // callbacks here are active, a job can by definition not move to the
+ // `InnerJobqueue::running_jobs` list and can, thus, not be freed.
+ //
+ // In case `Jobqueue` drops, the revocable-check above will guard against
+ // UAF. Moreover, jobqueue will deregister all of those dma_fence
+ // callbacks and thereby cleanly decouple itself. The dma_fences that
+ // these callbacks are registered on can, after all, outlive the jobqueue.
+ let job: &Job<T> = unsafe { &*cb.data.job };
+
+ let old_nr_of_deps = job.nr_of_deps.fetch_sub(1, Ordering::Relaxed);
+ // If counter == 0, a new job somewhere in the queue just got ready.
+ // Run all ready jobs.
+ if old_nr_of_deps == 1 {
+ let mut jq = outer_jq.lock();
+ jq.check_start_submit_worker(cb.data.jobq.clone());
+ }
+
+ // TODO remove the Dependency from the job's dep list, so that when
+ // `Jobqueue` gets dropped it won't try to deregister callbacks for
+ // already-signalled fences.
+ }
+}
+
+/// A jobqueue Job.
+///
+/// You can stuff your data in it. The job will be borrowed back to your driver
+/// once the time has come to run it.
+///
+/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
+/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
+/// get run once all dependency fences have been signaled.
+///
+/// Jobs cost credits. Jobs will only be run if there is enough capacity in
+/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
+/// credits, effectively disabling that mechanism.
+#[pin_data]
+pub struct Job<T: 'static + Send> {
+ cost: u32,
+ #[pin]
+ pub data: T,
+ done_fence: Option<ARef<DmaFence<i32>>>,
+ hardware_fence: Option<ARef<DmaFence<i32>>>,
+ nr_of_deps: AtomicU32,
+ dependencies: List<Dependency>,
+}
+
+impl<T: 'static + Send> Job<T> {
+ /// Create a new job that can be submitted to [`Jobqueue`].
+ ///
+ /// Jobs contain driver data that will later be made available to the driver's
+ /// run_job() callback in which the job gets pushed to the GPU.
+ pub fn new(cost: u32, data: impl PinInit<T>) -> Result<Pin<KBox<Self>>> {
+ let job = pin_init!(Self {
+ cost,
+ data <- data,
+ done_fence: None,
+ hardware_fence: None,
+ nr_of_deps: AtomicU32::new(0),
+ dependencies <- List::<Dependency>::new(),
+ });
+
+ KBox::pin_init(job, GFP_KERNEL)
+ }
+
+ /// Add a callback to the job. When the job gets submitted, all added callbacks will be
+ /// registered on the [`DmaFence`] the jobqueue returns for that job.
+ // TODO is callback a good name? We could call it "consequences" for example.
+ pub fn add_callback() -> Result {
+ Ok(())
+ }
+
+ /// Add a [`DmaFence`] or a [`DoneFence`] as this job's dependency. The job
+ /// will only be executed after that dependency has been finished.
+ pub fn add_dependency(&mut self, fence: ARef<DmaFence<i32>>) -> Result {
+ let dependency = Dependency::new(fence)?;
+
+ self.dependencies.push_back(dependency);
+ self.nr_of_deps.fetch_add(1, Ordering::Relaxed);
+
+ Ok(())
+ }
+
+ /// Check if there are dependencies for this job. Register the jobqueue
+ /// waker if yes.
+ fn arm_deps(&mut self, jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>) {
+ let job_ptr = &raw const *self;
+ let mut cursor = self.dependencies.cursor_front();
+
+ while let Some(dep) = cursor.peek_next() {
+ let waker = DependencyWaker::new(jobq.clone(), job_ptr);
+ if dep.fence.register_callback(waker).is_err() {
+ // TODO precise error check
+ // The fence raced or was already signaled. But the hardware_fence
+ // waker is not yet registered. Thus, it's OK to just decrement
+ // the dependency count.
+ self.nr_of_deps.fetch_sub(1, Ordering::Relaxed);
+ // TODO this dependency must be removed from the list so that
+ // `Jobqueue::drop()` doesn't try to deregister the callback.
+ }
+
+ cursor.move_next();
+ }
+ }
+}
+
+#[pin_data]
+struct JobWrap<T: 'static + Send> {
+ #[pin]
+ links: ListLinks,
+ inner: Pin<KBox<Job<T>>>,
+}
+
+impl<T: 'static + Send> JobWrap<T> {
+ fn new(job: Pin<KBox<Job<T>>>) -> Result<ListArc<Self>> {
+ ListArc::pin_init(
+ try_pin_init!(Self {
+ links <- ListLinks::new(),
+ inner: job,
+ }),
+ GFP_KERNEL,
+ )
+ }
+}
+
+impl_list_arc_safe! {
+ impl{T: Send} ListArcSafe<0> for JobWrap<T> { untracked; }
+}
+impl_list_item! {
+ impl{T: Send} ListItem<0> for JobWrap<T> { using ListLinks { self.links }; }
+}
+
+struct InnerJobqueue<T: 'static + Send> {
+ capacity: u32,
+ waiting_jobs: List<JobWrap<T>>,
+ running_jobs: List<JobWrap<T>>,
+ submit_worker_active: bool,
+ run_job: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>,
+}
+
+// SAFETY: We use `List` with effectively a `UniqueArc`, so it can be `Send` when elements are `Send`.
+unsafe impl<T: 'static + Send> Send for InnerJobqueue<T> {}
+
+impl<T: 'static + Send> InnerJobqueue<T> {
+ fn new(capacity: u32, run_job: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>) -> Self {
+ let waiting_jobs = List::<JobWrap<T>>::new();
+ let running_jobs = List::<JobWrap<T>>::new();
+
+ Self {
+ capacity,
+ waiting_jobs,
+ running_jobs,
+ submit_worker_active: false,
+ run_job,
+ }
+ }
+
+ fn has_waiting_jobs(&self) -> bool {
+ !self.waiting_jobs.is_empty()
+ }
+
+ fn has_capacity_left(&self, cost: u32) -> bool {
+ let cost = cost as i64;
+ let capacity = self.capacity as i64;
+
+ if capacity - cost >= 0 {
+ return true;
+ }
+
+ false
+ }
+
+ fn check_start_submit_worker(&mut self, outer: Arc<Revocable<SpinLock<Self>>>) {
+ if self.submit_worker_active {
+ return;
+ }
+ self.submit_worker_active = true;
+
+ // TODO the work item should likely be moved into the JQ struct, since
+ // only ever 1 worker needs to run at a time. But if we do it that way,
+ // how can we store a reference to the JQ? We obviously can't store it
+ // in the JQ itself because circular dependency -> memory leak.
+ let submit_work = SubmitWorker::new(outer).unwrap(); // TODO error
+ let _ = workqueue::system().enqueue(submit_work); // TODO error
+ }
+}
+
+// Callback item for the hardware fences to wake / progress the jobqueue.
+struct HwFenceWaker<T: 'static + Send> {
+ jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+ // Another scary raw pointer!
+ // This one is necessary so that a) a job can be removed from `InnerJobqueue::running_jobs`,
+ // and b) its done_fence be accessed and signaled.
+ //
+ // What would be the alternatives to this rawpointer? Two come to mind:
+ // 1. Refcount the job. Then the job would have to be locked to satisfy Rust.
+ // Locking it is not necessary, however. See the below safety comment
+ // for details.
+ // 2. Clever hacky tricks: We could assign a unique ID per job and store it
+ // in this callback. Then, we could find the associated job via iterating
+ // over `jobq.running_jobs`. So to access a job and signal its done_fence,
+ // we'd have to do a list iteration, which is undesirable performance-wise.
+ // Moreover, the unique ID parent would have to be stored in `Jobqueue`,
+ // requiring us to generate jobs on the jobqueue object.
+ job: *const JobWrap<T>,
+}
+
+impl<T: 'static + Send> HwFenceWaker<T> {
+ fn new(jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>, job: *const JobWrap<T>) -> Self {
+ Self { jobq, job }
+ }
+}
+
+impl<T: 'static + Send> DmaFenceCbFunc for HwFenceWaker<T> {
+ fn callback(cb: Pin<KBox<DmaFenceCb<Self>>>)
+ where
+ Self: Sized,
+ {
+ // This protects against deadlock. See Jobqueue's drop() for details.
+ let jq_guard = cb.data.jobq.try_access();
+ if jq_guard.is_none() {
+ // The JQ itself will signal all done_fences with an error when it drops.
+ return;
+ }
+ let jq_guard = jq_guard.unwrap();
+
+ let mut jobq = jq_guard.lock();
+
+ // SAFETY:
+ // We need the job to remove it from `InnerJobqueue::running_jobs` and to
+ // access its done_fence. There is always only one hardware_fence waker
+ // callback per job. It's the only party which will remove the job from
+ // the running_jobs list. This callback only exists once all Dependency
+ // callbacks have already run. As for the done_fence, the DmaFence
+ // implementation guarantees synchronization and correctness. Thus,
+ // unlocked access is safe.
+ //
+ // As for the lifetime: Only when this callback here has run will a job
+ // be removed from the running_jobs list and, thus, be dropped.
+ // `InnerJobqueue`, which owns running_jobs, can only drop once
+ // `Jobqueue` got dropped. The latter will deregister all hardware fence
+ // callbacks while dropping, thereby preventing UAF through dma_fence
+ // callbacks on jobs.
+ let job: &JobWrap<T> = unsafe { &*cb.data.job };
+
+ jobq.capacity += job.inner.cost;
+ let _ = job.inner.done_fence.as_ref().expect("done_fence not present").signal(); // TODO err
+
+ // SAFETY: This callback function gets registered only once per job,
+ // and the registering party (`run_all_ready_jobs()`) adds the job to
+ // the list.
+ //
+ // This is the only reference (incl. refcount) to this job. Thus, it
+ // may be removed only after all accesses above have been performed.
+ unsafe { jobq.running_jobs.remove(job) };
+
+ // Run more ready jobs if there's capacity.
+ jobq.check_start_submit_worker(cb.data.jobq.clone());
+ }
+}
+
+/// Push a job immediately.
+///
+/// Returns true if the hardware_fence raced, false otherwise.
+fn run_job<T: 'static + Send>(
+ driver_cb: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>,
+ waker: HwFenceWaker<T>,
+ job: Pin<&mut Job<T>>,
+) -> bool {
+ let hardware_fence = driver_cb(&job);
+
+ // If a GPU is very fast (or is processing jobs synchronously or sth.) it
+ // could be that the hw_fence is already signaled. In case that happens, we
+ // signal the done_fence for userspace & Co. immediately.
+
+ // TODO check for the exact error (currently the backend only ever
+ // errors if it raced; but still, robustness matters).
+ if hardware_fence.register_callback(waker).is_err() {
+ // TODO: Print into log in case of error.
+ let _ = job.done_fence.as_ref().expect("done_fence not present").signal();
+ return true;
+ }
+
+ *job.project().hardware_fence = Some(hardware_fence);
+
+ false
+}
+
+// Submits all ready jobs as long as there's capacity.
+fn run_all_ready_jobs<T: 'static + Send>(
+ jobq: &mut InnerJobqueue<T>,
+ outer_jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+ driver_cb: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>,
+) {
+ let mut cursor = jobq.waiting_jobs.cursor_front();
+
+ while let Some(job) = cursor.peek_next() {
+ if job.inner.nr_of_deps.load(Ordering::Relaxed) > 0 {
+ return;
+ }
+
+ let cost = job.inner.cost as i64;
+ if jobq.capacity as i64 - cost < 0 {
+ return;
+ }
+
+ let runnable_job = job.remove();
+ // To obtain a mutable reference to the list element, we need to cast
+ // into a UniqueArc. unwrap() cannot fire because by the jobqueue design
+ // a job is only ever in the waiting_jobs OR running_jobs list.
+ let mut unique_job = Arc::<JobWrap<T>>::into_unique_or_drop(runnable_job.into_arc()).unwrap();
+ let job_ptr: *const JobWrap<T> = &raw const *unique_job;
+
+ let runnable_inner_job /* &mut Pin<KBox<Job<T>>> */ = unique_job.as_mut().project().inner;
+
+ let hw_fence_waker = HwFenceWaker::new(outer_jobq.clone(), job_ptr);
+ if !run_job(driver_cb, hw_fence_waker, runnable_inner_job.as_mut()) {
+ // run_job() didn't run the job immediately (because the
+ // hw_fence did not race). Subtract the credits.
+ jobq.capacity -= cost as u32;
+ }
+
+ // We gave up our ownership above. And we couldn't clone the Arc, because
+ // we needed a UniqueArc for the mutable access. So turn it back now.
+ let running_job = ListArc::from(unique_job);
+ jobq.running_jobs.push_back(running_job);
+ }
+}
+
+#[pin_data]
+struct SubmitWorker<T: 'static + Send> {
+ jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+ #[pin]
+ work: Work<SubmitWorker<T>>,
+}
+
+impl<T: Send> SubmitWorker<T> {
+ fn new(
+ jobq: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+ ) -> Result<Arc<Self>> {
+ Arc::pin_init(
+ pin_init!(Self {
+ jobq,
+ work <- new_work!("Jobqueue::SubmitWorker")}),
+ GFP_KERNEL,
+ )
+ }
+}
+
+impl_has_work! {
+ impl{T: Send} HasWork<Self> for SubmitWorker<T> { self.work }
+}
+
+impl<T: Send> WorkItem for SubmitWorker<T> {
+ type Pointer = Arc<SubmitWorker<T>>;
+
+ fn run(this: Arc<SubmitWorker<T>>) {
+ let outer_jobq_copy = this.jobq.clone();
+
+ let guard = this.jobq.try_access();
+ if guard.is_none() {
+ // Can never happen. JQ gets only revoked when it drops, and we hold
+ // a reference.
+ return;
+ }
+ let jobq = guard.unwrap();
+
+ let mut jobq = jobq.lock();
+ let run_job = jobq.run_job;
+
+ run_all_ready_jobs(&mut jobq, outer_jobq_copy, run_job);
+ jobq.submit_worker_active = false;
+ }
+}
+
+/// A job load balancer, dependency manager and timeout handler for GPUs.
+///
+/// The JQ allows you to submit [`Job`]s. It will run all jobs whose dependency
+/// fences have been signalled, as long as there's capacity. Running jobs happens
+/// by borrowing them back to your driver's run_job callback.
+///
+/// # Examples
+///
+/// ```
+/// use kernel::sync::{DmaFenceCtx, DmaFence, Arc};
+/// use kernel::drm::jq::{Job, Jobqueue};
+/// use kernel::types::{ARef};
+/// use kernel::time::{delay::fsleep, Delta};
+///
+/// let fctx = DmaFenceCtx::new()?;
+///
+/// fn run_job(job: &Pin<&mut Job<Arc<DmaFenceCtx>>>) -> ARef<DmaFence<i32>> {
+/// let fence = job.data.as_arc_borrow().new_fence(42 as i32).unwrap();
+///
+/// // Our GPU is so damn fast that it executes each job immediately!
+/// fence.signal();
+/// fence
+/// }
+///
+/// let jq1 = Jobqueue::new(1_000_000, run_job)?;
+/// let jq2 = Jobqueue::new(1_000_000, run_job)?;
+///
+/// let job1 = Job::new(1, fctx.clone())?;
+/// let job2 = Job::new(1, fctx.clone())?;
+///
+///
+/// // Test normal submission of jobs without dependencies.
+/// let fence1 = jq1.submit_job(job1)?;
+/// let fence2 = jq1.submit_job(job2)?;
+///
+/// fsleep(Delta::from_secs(1));
+/// assert!(fence1.is_signaled());
+/// assert!(fence2.is_signaled());
+///
+///
+/// // Test whether a job with a fulfilled dependency gets executed.
+/// let mut job3 = Job::new(1, fctx.clone())?;
+/// job3.add_dependency(fence1)?;
+///
+/// let fence3 = jq2.submit_job(job3)?;
+/// fsleep(Delta::from_secs(1));
+/// assert!(fence3.is_signaled());
+///
+///
+/// // Test whether a job with an unfulfilled dependency does not get executed.
+/// let unsignaled_fence = fctx.as_arc_borrow().new_fence(9001 as i32)?;
+///
+/// let mut job4 = Job::new(1, fctx.clone())?;
+/// job4.add_dependency(unsignaled_fence.clone())?;
+///
+/// let blocked_job_fence = jq2.submit_job(job4)?;
+/// fsleep(Delta::from_secs(1));
+/// assert!(!blocked_job_fence.is_signaled());
+///
+///
+/// // Test whether job4 from above actually gets executed once its dep is met.
+/// unsignaled_fence.signal()?;
+/// fsleep(Delta::from_secs(1));
+/// assert!(blocked_job_fence.is_signaled());
+///
+/// Ok::<(), Error>(())
+/// ```
+pub struct Jobqueue<T: 'static + Send> {
+ inner: Arc<Revocable<SpinLock<InnerJobqueue<T>>>>,
+ fctx: Arc<DmaFenceCtx>, // TODO currently has a separate lock shared with fences
+ run_job: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>,
+}
+
+impl<T: 'static + Send> Jobqueue<T> {
+ /// Create a new [`Jobqueue`] with `capacity` space for jobs. `run_job` is
+ /// your driver's callback which the jobqueue will call to push a submitted
+ /// job to the hardware.
+ ///
+ /// If you don't want to use the capacity mechanism, set the capacity to any
+ /// non-zero value and instead set [`Job`]'s cost to 0.
+ pub fn new(
+ capacity: u32,
+ run_job: fn(&Pin<&mut Job<T>>) -> ARef<DmaFence<i32>>,
+ ) -> Result<Self> {
+ if capacity == 0 {
+ return Err(EINVAL);
+ }
+
+ let inner = Arc::pin_init(
+ Revocable::new(new_spinlock!(InnerJobqueue::<T>::new(capacity, run_job))),
+ GFP_KERNEL,
+ )?;
+ let fctx = DmaFenceCtx::new()?;
+
+ Ok(Self {
+ inner,
+ fctx,
+ run_job,
+ })
+ }
+
+ /// Submit a job to the jobqueue.
+ ///
+ /// The jobqueue takes ownership of the job and later passes it back to the
+ /// driver by reference through the driver's run_job callback. Jobs are
+ /// passed back by reference instead of by value partially to allow a job
+ /// resubmission mechanism to be added to [`Jobqueue`] later.
+ ///
+ /// Jobs get run and their done_fences get signalled in submission order.
+ ///
+ /// Returns the "done_fence" on success, which gets signalled once the
+ /// hardware has completed the job and the jobqueue is done with it.
+ // TODO: Return a DmaFence-wrapper that users cannot signal.
+ pub fn submit_job(&self, mut job: Pin<KBox<Job<T>>>) -> Result<ARef<DmaFence<i32>>> {
+ let job_cost = job.cost;
+ // TODO: It would be nice if the done_fence's seqno actually matches the
+ // submission order. To do that, however, we'd need to protect job
+ // creation with InnerJobqueue's spinlock. Is that worth it?
+ let done_fence = self.fctx.as_arc_borrow().new_fence(42 as i32)?;
+ *job.as_mut().project().done_fence = Some(done_fence.clone());
+
+ // TODO register job's callbacks on done_fence.
+
+ // Can never happen. The JQ only gets revoked when it drops.
+ let Some(jobq) = self.inner.try_access() else {
+ return Err(ENODEV);
+ };
+
+ let mut jobq = jobq.lock();
+
+ let had_waiting_jobs_already = jobq.has_waiting_jobs();
+
+ // Check if there are dependencies and, if yes, register rewake
+ // callbacks on their fences. Must be done under the JQ lock's protection
+ // since the callbacks will access JQ data.
+ // SAFETY: `job` was validly constructed and submitted above. We don't
+ // move the contents; arm_deps() merely iterates over the dependency list.
+ // TODO: Supposedly this unsafe is unnecessary if you do some magic.
+ let pure_job = unsafe { Pin::into_inner_unchecked(job.as_mut()) };
+ pure_job.arm_deps(self.inner.clone());
+
+ let wrapped_job = JobWrap::new(job)?;
+ jobq.waiting_jobs.push_back(wrapped_job);
+
+ if had_waiting_jobs_already {
+ // Jobs waiting means that there is either currently no capacity
+ // for more jobs, or the jobqueue is blocked by a job with
+ // unfulfilled dependencies. Either the hardware fences' callbacks
+ // or those of the dependency fences will pull in more jobs once
+ // the conditions are met.
+ return Ok(done_fence);
+ } else if jobq.has_capacity_left(job_cost) {
+ // This is the first waiting job. Wake the submit_worker if necessary.
+ jobq.check_start_submit_worker(self.inner.clone());
+ }
+
+ // If the conditions for running now were not met, the callbacks registered
+ // on the already running jobs' hardware fences will check if there's space
+ // for the next job, guaranteeing progress.
+ //
+ // If no jobs were running, there was by definition still space and the
+ // job will get pushed by the worker.
+ //
+ // If a job couldn't be pushed because there were unfinished dependencies,
+ // then the hardware fences' callbacks mentioned above will detect that
+ // and not yet push the job.
+ //
+ // Each dependency's fence has its own callback which checks:
+ // a) whether all other dependencies have been fulfilled, and if yes:
+ // b) whether there are now enough credits available.
+ //
+ // If a) and b) are fulfilled, the job gets pushed.
+ //
+ // If there are no jobs currently running, credits must be available by
+ // definition.
+
+ Ok(done_fence)
+ }
+}
+
+impl<T: 'static + Send> Drop for Jobqueue<T> {
+ fn drop(&mut self) {
+ // The hardware and dependency fences might outlive the jobqueue.
+ // So fence callbacks could very well still call into job queue code,
+ // resulting in data UAF or, should the jobqueue code be unloaded,
+ // even code UAF.
+ //
+ // Thus, the jobqueue needs to be cleanly decoupled from those fences
+ // when it drops; in other words, it needs to deregister all its
+ // fence callbacks.
+ //
+ // This, however, could easily deadlock when a hw_fence signals:
+ //
+ // Step | Jobqueue step | hw_fence step
+ // ------------------------------------------------------------------
+ // 1 | JQ starts drop | fence signals
+ // 2 | JQ lock taken | fence lock taken
+ // 3 | Tries to take fence lock | Tries to take JQ lock
+ // 4 | ***DEADLOCK*** | ***DEADLOCK***
+ //
+ // In order to prevent deadlock, we first have to revoke access to the
+ // JQ so that all fence callbacks can't try to take the lock anymore,
+ // and then deregister all JQ callbacks on the fences.
+ self.inner.revoke();
+
+ /*
+ let guard = self.inner.lock();
+ for job in self.inner.waiting_jobs {
+ job.deregister_dep_fences();
+ }
+ for job in self.inner.running_jobs {
+ job.deregister_hw_fence();
+ }
+
+ TODO: signal all remaining done_fences with an error.
+ */
+ }
+}
diff --git a/rust/kernel/drm/mod.rs b/rust/kernel/drm/mod.rs
index 1b82b6945edf..803bed36231b 100644
--- a/rust/kernel/drm/mod.rs
+++ b/rust/kernel/drm/mod.rs
@@ -7,12 +7,14 @@
pub mod file;
pub mod gem;
pub mod ioctl;
+pub mod jq;
pub use self::device::Device;
pub use self::driver::Driver;
pub use self::driver::DriverInfo;
pub use self::driver::Registration;
pub use self::file::File;
+pub use self::jq::Jobqueue;
pub(crate) mod private {
pub trait Sealed {}
--
2.49.0
^ permalink raw reply related [flat|nested] 103+ messages in thread

* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-03 8:14 ` [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue Philipp Stanner
@ 2026-02-10 14:57 ` Boris Brezillon
2026-02-11 10:47 ` Philipp Stanner
0 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-10 14:57 UTC (permalink / raw)
To: Philipp Stanner
Cc: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Tue, 3 Feb 2026 09:14:02 +0100
Philipp Stanner <phasta@kernel.org> wrote:
> +/// A jobqueue Job.
> +///
> +/// You can stuff your data in it. The job will be borrowed back to your driver
> +/// once the time has come to run it.
> +///
> +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> +/// get run once all dependency fences have been signaled.
> +///
> +/// Jobs cost credits. Jobs will only be run if there is enough capacity in
> +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> +/// credits, effectively disabling that mechanism.
> +#[pin_data]
> +pub struct Job<T: 'static + Send> {
> + cost: u32,
> + #[pin]
> + pub data: T,
> + done_fence: Option<ARef<DmaFence<i32>>>,
> + hardware_fence: Option<ARef<DmaFence<i32>>>,
> + nr_of_deps: AtomicU32,
> + dependencies: List<Dependency>,
Given how tricky Lists are in rust, I'd recommend going for an XArray,
like we have on the C side. There's a bit of overhead when the job only
has a few deps, but I think simplicity beats memory-usage-optimizations
in that case (especially since the overhead exists and is accepted in
C).
> +}
> +
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-10 14:57 ` Boris Brezillon
@ 2026-02-11 10:47 ` Philipp Stanner
2026-02-11 11:07 ` Boris Brezillon
0 siblings, 1 reply; 103+ messages in thread
From: Philipp Stanner @ 2026-02-11 10:47 UTC (permalink / raw)
To: Boris Brezillon, Philipp Stanner
Cc: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
> On Tue, 3 Feb 2026 09:14:02 +0100
> Philipp Stanner <phasta@kernel.org> wrote:
>
> > +/// A jobqueue Job.
> > +///
> > +/// You can stuff your data in it. The job will be borrowed back to your driver
> > +/// once the time has come to run it.
> > +///
> > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> > +/// get run once all dependency fences have been signaled.
> > +///
> > +/// Jobs cost credits. Jobs will only be run if there is enough capacity in
> > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> > +/// credits, effectively disabling that mechanism.
> > +#[pin_data]
> > +pub struct Job<T: 'static + Send> {
> > + cost: u32,
> > + #[pin]
> > + pub data: T,
> > + done_fence: Option<ARef<DmaFence<i32>>>,
> > + hardware_fence: Option<ARef<DmaFence<i32>>>,
> > + nr_of_deps: AtomicU32,
> > + dependencies: List<Dependency>,
>
> Given how tricky Lists are in rust, I'd recommend going for an XArray,
> like we have on the C side. There's a bit of overhead when the job only
> has a few deps, but I think simplicity beats memory-usage-optimizations
> in that case (especially since the overhead exists and is accepted in
> C).
I mean, the list is now already implemented and works. Considering the
XArray would have made sense during the development difficulties.
If it were to make sense we could certainly replace the list with an
xarray, but I don't see an advantage. The JQ just needs to iterate over
the dependencies to register its events on them, and on drop to
deregister them perhaps.
We have many jobs, but likely only few dependencies per job, so the
lower memory footprint seems desirable and the XArray's advantages
don't come to play – except maybe if we'd want to consider to avoid the
current unsafe-rawpointer solution to obtain the job, since obtaining a
job from an Xarray is far faster than by list iteration.
P.
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 10:47 ` Philipp Stanner
@ 2026-02-11 11:07 ` Boris Brezillon
2026-02-11 11:19 ` Danilo Krummrich
2026-02-11 11:19 ` Philipp Stanner
0 siblings, 2 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 11:07 UTC (permalink / raw)
To: Philipp Stanner
Cc: phasta, David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 11 Feb 2026 11:47:27 +0100
Philipp Stanner <phasta@mailbox.org> wrote:
> On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
> > On Tue, 3 Feb 2026 09:14:02 +0100
> > Philipp Stanner <phasta@kernel.org> wrote:
> >
> > > +/// A jobqueue Job.
> > > +///
> > > +/// You can stuff your data in it. The job will be borrowed back to your driver
> > > +/// once the time has come to run it.
> > > +///
> > > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> > > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> > > +/// get run once all dependency fences have been signaled.
> > > +///
> > > +/// Jobs cost credits. Jobs will only be run if there is enough capacity in
> > > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> > > +/// credits, effectively disabling that mechanism.
> > > +#[pin_data]
> > > +pub struct Job<T: 'static + Send> {
> > > + cost: u32,
> > > + #[pin]
> > > + pub data: T,
> > > + done_fence: Option<ARef<DmaFence<i32>>>,
> > > + hardware_fence: Option<ARef<DmaFence<i32>>>,
> > > + nr_of_deps: AtomicU32,
> > > + dependencies: List<Dependency>,
> >
> > Given how tricky Lists are in rust, I'd recommend going for an XArray,
> > like we have on the C side. There's a bit of overhead when the job only
> > has a few deps, but I think simplicity beats memory-usage-optimizations
> > in that case (especially since the overhead exists and is accepted in
> > C).
>
> I mean, the list is now already implemented and works. Considering the
> XArray would have made sense during the development difficulties.
I'm sure it does, but that's still more code/tricks to maintain than
what you'd have with the XArray abstraction.
>
> If it were to make sense we could certainly replace the list with an
> xarray, but I don't see an advantage. The JQ just needs to iterate over
> the dependencies to register its events on them, and on drop to
> deregister them perhaps.
>
> We have many jobs, but likely only few dependencies per job, so the
> lower memory footprint seems desirable and the XArray's advantages
> don't come to play – except maybe if we'd want to consider to avoid the
> current unsafe-rawpointer solution to obtain the job, since obtaining a
> job from an Xarray is far faster than by list iteration.
I don't think we need O(1) for picking random deps in a job, because
that's not something we need at all: the dep list here is used as a
FIFO. There's the per-dep overhead of the ListLinks object maybe, but
it's certainly acceptable. And I don't think cache locality matters
either, because the XArray stores pointers too, so we'll still be one
deref away from the DmaFence. No, my main concern was maintainability,
because managing lists in rust is far from trivial, and as a developer,
I try to avoid using concepts the language I rely on is not friendly
with.
^ permalink raw reply [flat|nested] 103+ messages in thread* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 11:07 ` Boris Brezillon
@ 2026-02-11 11:19 ` Danilo Krummrich
2026-02-11 12:10 ` Boris Brezillon
2026-02-11 11:19 ` Philipp Stanner
1 sibling, 1 reply; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-11 11:19 UTC (permalink / raw)
To: Boris Brezillon
Cc: Philipp Stanner, phasta, David Airlie, Simona Vetter, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed Feb 11, 2026 at 12:07 PM CET, Boris Brezillon wrote:
> I try to avoid using concepts the language I rely on is not friendly
> with.
It's not really a language limitation. For instance, you can implement lists the
exact same way as they can be implemented in C. It's more that a memory safe
list implementation is quite tricky in general.
Lists clearly do have their place. In this specific case, it probably doesn't
matter too much, but in general I'd abstain from not using a list (where it is
the best fit) just because they are tricky in getting them implemented in a
memory safe way.
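To make that point concrete, here is a minimal userspace sketch (plain Rust, not kernel code; all names are hypothetical) of a C-style doubly linked list. It works, but every SAFETY comment below is an invariant the compiler cannot verify for you, which is exactly what a memory safe list abstraction has to encode somehow:

```rust
use std::ptr::NonNull;

// A node in a minimal C-style doubly linked list. As in C, correctness
// rests on invariants the compiler cannot check: every stored pointer
// must point to a live node owned by the list, and no two mutable
// references to the same node may coexist.
struct Node {
    value: i32,
    next: Option<NonNull<Node>>,
}

struct ListHead {
    head: Option<NonNull<Node>>,
    tail: Option<NonNull<Node>>,
}

impl ListHead {
    fn new() -> Self {
        ListHead { head: None, tail: None }
    }

    // Push a heap-allocated node to the back. `Box::into_raw` transfers
    // ownership of the allocation to the list.
    fn push_back(&mut self, value: i32) {
        let node = Box::into_raw(Box::new(Node { value, next: None }));
        let node = NonNull::new(node).unwrap();
        match self.tail {
            // SAFETY: `tail` points to a live node owned by this list,
            // and no other reference to it exists right now.
            Some(mut tail) => unsafe { tail.as_mut().next = Some(node) },
            None => self.head = Some(node),
        }
        self.tail = Some(node);
    }

    // Pop from the front, returning the value and freeing the node.
    fn pop_front(&mut self) -> Option<i32> {
        let head = self.head?;
        // SAFETY: `head` came from `Box::into_raw` in `push_back` and has
        // not been freed; rebuilding the Box frees it when it drops.
        let node = unsafe { Box::from_raw(head.as_ptr()) };
        self.head = node.next;
        if self.head.is_none() {
            self.tail = None;
        }
        Some(node.value)
    }
}
// Note: a real list would also need a Drop impl to free remaining nodes.
```

A usage example: `push_back(1); push_back(2);` followed by three `pop_front()` calls yields `Some(1)`, `Some(2)`, `None`. The kernel's `kernel::list` hides these invariants behind `ListArc` and the `impl_list_item!` machinery, which is where the trickiness discussed above comes from.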
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 11:19 ` Danilo Krummrich
@ 2026-02-11 12:10 ` Boris Brezillon
2026-02-11 12:32 ` Danilo Krummrich
0 siblings, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 12:10 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Philipp Stanner, phasta, David Airlie, Simona Vetter, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 11 Feb 2026 12:19:04 +0100
"Danilo Krummrich" <dakr@kernel.org> wrote:
> On Wed Feb 11, 2026 at 12:07 PM CET, Boris Brezillon wrote:
> > I try to avoid using concepts the language I rely on is not friendly
> > with.
>
> It's not really a language limitation. For instance, you can implement lists the
> exact same way as they can be implemented in C. It's more that a memory safe
> list implementation is quite tricky in general.
That's what I mean by trickier to use: they are, because of Rust's safety.
And again, that's not a case for saying "nah, rust is not a good fit, it
can't do easy-to-use-lists", but rather a good opportunity to think
twice about the containers we want to use.
>
> Lists clearly do have their place. In this specific case, it probably doesn't
> matter too much, but in general I'd abstain from not using a list (where it is
> the best fit) just because they are tricky in getting them implemented in a
> memory safe way.
Read my other reply. I'm not saying "don't ever use a list in rust!",
I'm saying, "if there are valid alternatives, and they make things
slightly simpler, why not use these alternatives". And that's the case
here I think, especially since the C side does exactly that (using an
xarray over a list). Note how I didn't mention the list under the
JobQueue, because for that one, I'm not too sure XArray is a good fit
(jobs might be moved around between lists depending on their progress at
some point).
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 12:10 ` Boris Brezillon
@ 2026-02-11 12:32 ` Danilo Krummrich
2026-02-11 12:51 ` Boris Brezillon
0 siblings, 1 reply; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-11 12:32 UTC (permalink / raw)
To: Boris Brezillon
Cc: Philipp Stanner, phasta, David Airlie, Simona Vetter, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed Feb 11, 2026 at 1:10 PM CET, Boris Brezillon wrote:
> On Wed, 11 Feb 2026 12:19:04 +0100
> "Danilo Krummrich" <dakr@kernel.org> wrote:
>
>> On Wed Feb 11, 2026 at 12:07 PM CET, Boris Brezillon wrote:
>> > I try to avoid using concepts the language I rely on is not friendly
>> > with.
>>
>> It's not really a language limitation. For instance, you can implement lists the
>> exact same way as they can be implemented in C. It's more that a memory safe
>> list implementation is quite tricky in general.
>
> That's what I mean by trickier to use, they are because of rust safety.
Yeah, we agree on this. What I don't agree with is the "avoid using concepts"
part, because it came across in an unconditional way.
> And again, that's not a case for saying "nah, rust is not a good fit, it
> can't do easy-to-use-lists", but rather a good opportunity to think
> twice about the containers we want to use.
I think I never implied that you were saying anything along the lines of "rust
is not a good fit" in any way. No idea where this comes from. :)
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 12:32 ` Danilo Krummrich
@ 2026-02-11 12:51 ` Boris Brezillon
0 siblings, 0 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 12:51 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Philipp Stanner, phasta, David Airlie, Simona Vetter, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 11 Feb 2026 13:32:26 +0100
"Danilo Krummrich" <dakr@kernel.org> wrote:
> On Wed Feb 11, 2026 at 1:10 PM CET, Boris Brezillon wrote:
> > On Wed, 11 Feb 2026 12:19:04 +0100
> > "Danilo Krummrich" <dakr@kernel.org> wrote:
> >
> >> On Wed Feb 11, 2026 at 12:07 PM CET, Boris Brezillon wrote:
> >> > I try to avoid using concepts the language I rely on is not friendly
> >> > with.
> >>
> >> It's not really a language limitation. For instance, you can implement lists the
> >> exact same way as they can be implemented in C. It's more that a memory safe
> >> list implementation is quite tricky in general.
> >
> > That's what I mean by trickier to use, they are because of rust safety.
>
> Yeah, we agree on this. What I don't agree with is the "avoid using concepts"
> part, because it came across in an unconditional way.
Well, I guess that's me approaching problems differently then. I
usually consider that, if a language makes my life harder to do
something, there are good reasons, and there's probably alternatives
(with different paradigms) to do the same thing. At least that's my
first reaction. It might be that after further investigation, that's
just how it is, and I have to live with the extra complexity. But yeah,
I stand by my original statement: if something is complex, I'll always
investigate other options before going for the hard way.
>
> > And again, that's not a case for saying "nah, rust is not a good fit, it
> > can't do easy-to-use-lists", but rather a good opportunity to think
> > twice about the containers we want to use.
>
> I think I never implied that you were saying anything along the lines of "rust
> is not a good fit" in any way. No idea where this comes from. :)
That one was more referring to Philipp's reply, where he was saying
some people dismiss rust because of lists, and I wanted to make it
clear that's not what this about here.
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 11:07 ` Boris Brezillon
2026-02-11 11:19 ` Danilo Krummrich
@ 2026-02-11 11:19 ` Philipp Stanner
2026-02-11 11:59 ` Boris Brezillon
2026-02-11 12:22 ` Alice Ryhl
1 sibling, 2 replies; 103+ messages in thread
From: Philipp Stanner @ 2026-02-11 11:19 UTC (permalink / raw)
To: Boris Brezillon
Cc: phasta, David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 2026-02-11 at 12:07 +0100, Boris Brezillon wrote:
> On Wed, 11 Feb 2026 11:47:27 +0100
> Philipp Stanner <phasta@mailbox.org> wrote:
>
> > On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
> > > On Tue, 3 Feb 2026 09:14:02 +0100
> > > Philipp Stanner <phasta@kernel.org> wrote:
> > >
> > > > +/// A jobqueue Job.
> > > > +///
> > > > +/// You can stuff your data in it. The job will be borrowed back to your driver
> > > > +/// once the time has come to run it.
> > > > +///
> > > > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> > > > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> > > > +/// get run once all dependency fences have been signaled.
> > > > +///
> > > > +/// Jobs cost credits. Jobs will only be run if there is enough capacity in
> > > > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> > > > +/// credits, effectively disabling that mechanism.
> > > > +#[pin_data]
> > > > +pub struct Job<T: 'static + Send> {
> > > > + cost: u32,
> > > > + #[pin]
> > > > + pub data: T,
> > > > + done_fence: Option<ARef<DmaFence<i32>>>,
> > > > + hardware_fence: Option<ARef<DmaFence<i32>>>,
> > > > + nr_of_deps: AtomicU32,
> > > > + dependencies: List<Dependency>,
> > >
> > > Given how tricky Lists are in rust, I'd recommend going for an XArray,
> > > like we have on the C side. There's a bit of overhead when the job only
> > > has a few deps, but I think simplicity beats memory-usage-optimizations
> > > in that case (especially since the overhead exists and is accepted in
> > > C).
> >
> > I mean, the list is now already implemented and works. Considering the
> > XArray would have made sense during the development difficulties.
>
> I'm sure it does, but that's still more code/tricks to maintain than
> what you'd have with the XArray abstraction.
The solution then will rather be to make the linked list implementation
better.
A list is the correct data structure in a huge number of use cases in
the kernel. We should not begin here to defer to other structures
because of convenience.
Btw. lists in Rust being so horrible has repeatedly been a reason why
some other hackers argued that Rust as a language is not suitable for
kernel development.
So getting that right seems more desirable than capitulating.
>
> >
> > If it were to make sense we could certainly replace the list with an
> > xarray, but I don't see an advantage. The JQ just needs to iterate over
> > the dependencies to register its events on them, and on drop to
> > deregister them perhaps.
> >
> > We have many jobs, but likely only few dependencies per job, so the
> > lower memory footprint seems desirable and the XArray's advantages
> > don't come to play – except maybe if we'd want to consider to avoid the
> > current unsafe-rawpointer solution to obtain the job, since obtaining a
> > job from an Xarray is far faster than by list iteration.
>
> I don't think we need O(1) for picking random deps in a job, because
> that's not something we need at all: the dep list here is used as a
> FIFO.
>
Wrong. The dep list here has no ordering requirements at all. JQ does
not care in which order it registers its events, it just cares about
dealing with dep-fences racing.
You could (de-)register your callbacks in random order, it does not
matter.
List and Xarray might be useful for the unsafe related to the
DependencyWaker. There you could avoid a raw pointer by getting the job
through a list iteration or through the hypothetical XArray.
Please take a look at my detailed code comments for DependencyWaker.
> There's the per-dep overhead of the ListLinks object maybe, but
> it's certainly acceptable. And I don't think cache locality matters
> either, because the XArray stores pointers too, so we'll still be one
> deref away from the DmaFence. No, my main concern was maintainability,
> because managing lists in rust is far from trivial, and as a developer,
> I try to avoid using concepts the language I rely on is not friendly
> with.
This would be a decision with wide implications, as detailed above.
If we were to admit that lists just don't work in Rust, wouldn't the
consequent decision be to remove them altogether?
"Lists in kernel-Rust are not supported. Too difficult to maintain.
We're sorry. Use XArray et al. instead :("
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 11:19 ` Philipp Stanner
@ 2026-02-11 11:59 ` Boris Brezillon
2026-02-11 12:14 ` Philipp Stanner
2026-02-11 12:22 ` Alice Ryhl
1 sibling, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 11:59 UTC (permalink / raw)
To: Philipp Stanner
Cc: phasta, David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 11 Feb 2026 12:19:56 +0100
Philipp Stanner <phasta@mailbox.org> wrote:
> On Wed, 2026-02-11 at 12:07 +0100, Boris Brezillon wrote:
> > On Wed, 11 Feb 2026 11:47:27 +0100
> > Philipp Stanner <phasta@mailbox.org> wrote:
> >
> > > On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
> > > > On Tue, 3 Feb 2026 09:14:02 +0100
> > > > Philipp Stanner <phasta@kernel.org> wrote:
> > > >
> > > > > +/// A jobqueue Job.
> > > > > +///
> > > > > +/// You can stuff your data in it. The job will be borrowed back to your driver
> > > > > +/// once the time has come to run it.
> > > > > +///
> > > > > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> > > > > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> > > > > +/// get run once all dependency fences have been signaled.
> > > > > +///
> > > > > +/// Jobs cost credits. Jobs will only be run if there are is enough capacity in
> > > > > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> > > > > +/// credits, effectively disabling that mechanism.
> > > > > +#[pin_data]
> > > > > +pub struct Job<T: 'static + Send> {
> > > > > + cost: u32,
> > > > > + #[pin]
> > > > > + pub data: T,
> > > > > + done_fence: Option<ARef<DmaFence<i32>>>,
> > > > > + hardware_fence: Option<ARef<DmaFence<i32>>>,
> > > > > + nr_of_deps: AtomicU32,
> > > > > + dependencies: List<Dependency>,
> > > >
> > > > Given how tricky Lists are in rust, I'd recommend going for an XArray,
> > > > like we have on the C side. There's a bit of overhead when the job only
> > > > has a few deps, but I think simplicity beats memory-usage-optimizations
> > > > in that case (especially since the overhead exists and is accepted in
> > > > C).
> > >
> > > I mean, the list is now already implemented and works. Considering the
> > > XArray would have made sense during the development difficulties.
> >
> > I'm sure it does, but that's still more code/tricks to maintain than
> > what you'd have with the XArray abstraction.
>
> The solution than will rather be to make the linked list implementation
> better.
>
> A list is the correct data structure in a huge number of use cases in
> the kernel. We should not begin here to defer to other structures
> because of convenience.
>
> Btw. lists in Rust being so horrible has been repeatedly a reason why
> some other hackers argued that Rust as a language is not suitable for
> kernel development.
>
> So getting that right seems more desirable than capitulating.
I'm not capitulating, and I'm not saying "No list, never!" either. I'm
saying, if there's something that fits the bill and is easier to use,
maybe we should consider it...
>
> >
> > >
> > > If it were to make sense we could certainly replace the list with an
> > > xarray, but I don't see an advantage. The JQ just needs to iterate over
> > > the dependencies to register its events on them, and on drop to
> > > deregister them perhaps.
> > >
> > > We have many jobs, but likely only few dependencies per job, so the
> > > lower memory footprint seems desirable and the XArray's advantages
> > > don't come to play – except maybe if we'd want to consider to avoid the
> > > current unsafe-rawpointer solution to obtain the job, since obtaining a
> > > job from an Xarray is far faster than by list iteration.
> >
> > I don't think we need O(1) for picking random deps in a job, because
> > that's not something we need at all: the dep list here is used as a
> > FIFO.
> >
>
> Wrong. The dep list here has no ordering requirements at all. JQ does
> not care in which order it registers its events, it just cares about
> dealing with dep-fences racing.
What I mean is that it's used as a FIFO right now, not that deps have to
be processed in order.
>
> You could (de-)register your callbacks in random order, it does not
> matter.
Again, that's not my point, and I think we're just saying the same
thing here: the list seems to be a good match for this dependency
array/list, because right now deps are processed in order. Now, being
the right construct in one language doesn't mean it's the right
construct in another language.
>
> List and Xarray might be useful for the unsafe related to the
> DependencyWaker. There you could avoid a raw pointer by getting the job
> through a list iteration or through the hypothetical XArray.
>
> Please take a look at my detailed code comments for DependencyWaker.
Sure, I'll have a closer look.
>
> > There's the per-dep overhead of the ListLinks object maybe, but
> > it's certainly acceptable. And I don't think cache locality matters
> > either, because the XArray stores pointers too, so we'll still be one
> > deref away from the DmaFence. No, my main concern was maintainability,
> > because managing lists in rust is far from trivial, and as a developer,
> > I try to avoid using concepts the language I rely on is not friendly
> > with.
>
> This would be a decision with wide implications, as detailed above.
>
> If we were to admit that lists just don't work in Rust, then wouldn't
> the consequent decision be to remove them altogether?
I'm not going as far as saying they don't work, I'm just saying they
are trickier to use, and that's a fact.
>
> "Lists in kernel-Rust are not supported. Too difficult to maintain.
> We're sorry. Use XArray et al. instead :("
No, there are patterns where an XArray wouldn't be a good fit. For
instance, LRU lists where objects get moved between lists depending on
their usage pattern. If we were to use XArrays for that, that would
imply potential allocations in paths where we don't want them. In this
dep array case, the deps are added at submit time, and they get
progressively dropped, so the array can't grow, it can only ever
shrink, and XArray allows it to shrink from both ends (and even have
holes), so it sounds like a good match too. Not saying a perfect match,
not saying better than a list in general, but an XArray is easier to
use than a list **in rust**, and the fact it fits the bill (if it does
in C, it should in rust too) had me thinking that maybe it's what we
should use.
If everyone is happy with a list, then let's go for a list.
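That shrink-only dependency container can be sketched in plain userspace Rust (illustrative only; `DepSet` and the fence values are made up, and this is not the kernel XArray API):

```rust
// Userspace sketch of the shrink-only pattern described above: all
// dependencies are inserted once at "submit time"; signaled deps are
// then removed in arbitrary order, leaving holes, until the container
// is empty. No insertion ever happens after submit.
struct DepSet {
    slots: Vec<Option<u64>>, // Some(fence id) or a hole
    live: usize,
}

impl DepSet {
    // All dependencies are known at submit time; no later insertion.
    fn at_submit(deps: &[u64]) -> Self {
        DepSet {
            slots: deps.iter().copied().map(Some).collect(),
            live: deps.len(),
        }
    }

    // Drop a dependency once its fence signals; order does not matter,
    // and signaling an already-cleared slot is a no-op.
    fn signaled(&mut self, idx: usize) {
        if self.slots[idx].take().is_some() {
            self.live -= 1;
        }
    }

    fn all_signaled(&self) -> bool {
        self.live == 0
    }
}
```

The only operations needed are "remove by index" and "is it empty yet", which is why an index-based container with holes fits.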
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 11:59 ` Boris Brezillon
@ 2026-02-11 12:14 ` Philipp Stanner
2026-02-11 12:24 ` Boris Brezillon
0 siblings, 1 reply; 103+ messages in thread
From: Philipp Stanner @ 2026-02-11 12:14 UTC (permalink / raw)
To: Boris Brezillon
Cc: phasta, David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 2026-02-11 at 12:59 +0100, Boris Brezillon wrote:
> On Wed, 11 Feb 2026 12:19:56 +0100
> Philipp Stanner <phasta@mailbox.org> wrote:
>
> > On Wed, 2026-02-11 at 12:07 +0100, Boris Brezillon wrote:
> > > On Wed, 11 Feb 2026 11:47:27 +0100
> > > Philipp Stanner <phasta@mailbox.org> wrote:
> > >
> > > > On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
> > > > > On Tue, 3 Feb 2026 09:14:02 +0100
> > > > > Philipp Stanner <phasta@kernel.org> wrote:
> > > > >
> > > > > > +/// A jobqueue Job.
> > > > > > +///
> > > > > > +/// You can stuff your data in it. The job will be borrowed back to your driver
> > > > > > +/// once the time has come to run it.
> > > > > > +///
> > > > > > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> > > > > > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> > > > > > +/// get run once all dependency fences have been signaled.
> > > > > > +///
> > > > > > +/// Jobs cost credits. Jobs will only be run if there is enough capacity in
> > > > > > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> > > > > > +/// credits, effectively disabling that mechanism.
> > > > > > +#[pin_data]
> > > > > > +pub struct Job<T: 'static + Send> {
> > > > > > + cost: u32,
> > > > > > + #[pin]
> > > > > > + pub data: T,
> > > > > > + done_fence: Option<ARef<DmaFence<i32>>>,
> > > > > > + hardware_fence: Option<ARef<DmaFence<i32>>>,
> > > > > > + nr_of_deps: AtomicU32,
> > > > > > + dependencies: List<Dependency>,
> > > > >
> > > > > Given how tricky Lists are in rust, I'd recommend going for an XArray,
> > > > > like we have on the C side. There's a bit of overhead when the job only
> > > > > has a few deps, but I think simplicity beats memory-usage-optimizations
> > > > > in that case (especially since the overhead exists and is accepted in
> > > > > C).
> > > >
> > > > I mean, the list is now already implemented and works. Considering the
> > > > XArray would have made sense during the development difficulties.
> > >
> > > I'm sure it does, but that's still more code/tricks to maintain than
> > > what you'd have with the XArray abstraction.
> >
> > The solution then will rather be to make the linked list implementation
> > better.
> >
> > A list is the correct data structure in a huge number of use cases in
> > the kernel. We should not begin here to defer to other structures
> > because of convenience.
> >
> > Btw. lists in Rust being so horrible has been repeatedly a reason why
> > some other hackers argued that Rust as a language is not suitable for
> > kernel development.
> >
> > So getting that right seems more desirable than capitulating.
>
> I'm not capitulating, and I'm not saying "No list, never!" either. I'm
> saying, if there's something that fits the bill and is easier to use,
> maybe we should consider it...
>
> >
> > >
> > > >
> > > > If it were to make sense we could certainly replace the list with an
> > > > xarray, but I don't see an advantage. The JQ just needs to iterate over
> > > > the dependencies to register its events on them, and on drop to
> > > > deregister them perhaps.
> > > >
> > > > We have many jobs, but likely only few dependencies per job, so the
> > > > lower memory footprint seems desirable and the XArray's advantages
> > > > don't come to play – except maybe if we'd want to consider to avoid the
> > > > current unsafe-rawpointer solution to obtain the job, since obtaining a
> > > > job from an Xarray is far faster than by list iteration.
> > >
> > > I don't think we need O(1) for picking random deps in a job, because
> > > that's not something we need at all: the dep list here is used as a
> > > FIFO.
> > >
> >
> > Wrong. The dep list here has no ordering requirements at all. JQ does
> > not care in which order it registers its events, it just cares about
> > dealing with dep-fences racing.
>
> What I mean is that it's used as a FIFO right now, not that deps have to
> be processed in order.
Yeah, but it being a FIFO is irrelevant :)
>
> >
> > You could (de-)register your callbacks in random order, it does not
> > matter.
>
> Again, that's not my point, and I think we're just saying the same
> thing here: the list seems to be a good match for this dependency
> array/list, because right now deps are processed in order. Now, being
> the right construct in one language doesn't mean it's the right
> construct in another language.
>
> >
> > List and Xarray might be useful for the unsafe related to the
> > DependencyWaker. There you could avoid a raw pointer by getting the job
> > through a list iteration or through the hypothetical XArray.
> >
> > Please take a look at my detailed code comments for DependencyWaker.
>
> Sure, I'll have a closer look.
>
> >
> > > There's the per-dep overhead of the ListLinks object maybe, but
> > > it's certainly acceptable. And I don't think cache locality matters
> > > either, because the XArray stores pointers too, so we'll still be one
> > > deref away from the DmaFence. No, my main concern was maintainability,
> > > because managing lists in rust is far from trivial, and as a developer,
> > > I try to avoid using concepts the language I rely on is not friendly
> > > with.
> >
> > This would be a decision with wide implications, as detailed above.
> >
> > If we were to admit that lists just don't work in Rust, then wouldn't
> > the consequent decision be to remove them altogether?
>
> I'm not going as far as saying they don't work, I'm just saying they
> are trickier to use, and that's a fact.
>
> >
> > "Lists in kernel-Rust are not supported. Too difficult to maintain.
> > We're sorry. Use XArray et al. instead :("
>
> No, there are patterns where an XArray wouldn't be a good fit. For
> instance, LRU lists where objects get moved between lists depending on
> their usage pattern. If we were to use XArrays for that, that would
> imply potential allocations in paths where we don't want them. In this
> dep array case, the deps are added at submit time, and they get
> progressively dropped, so the array can't grow, it can only ever
> shrink, and XArray allows it to shrink from both ends (and even have
> holes), so it sounds like a good match too. Not saying a perfect match,
> not saying better than a list in general, but an XArray is easier to
> use than a list **in rust**, and the fact it fits the bill (if it does
> in C, it should in rust too) had me thinking that maybe it's what we
> should use.
Yoah, you have valid points.
Since XArray allows for dropping the unsafe {} without the performance
penalty of a list-iteration, I think your idea for this particular case
is good after all and can be put on the TODO list.
I'm not sure how soon I have the cycles for implementing that, though.
Looks as if a ton of work is coming at us for dma_fence.
P.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 12:14 ` Philipp Stanner
@ 2026-02-11 12:24 ` Boris Brezillon
0 siblings, 0 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 12:24 UTC (permalink / raw)
To: Philipp Stanner
Cc: phasta, David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 11 Feb 2026 13:14:11 +0100
Philipp Stanner <phasta@mailbox.org> wrote:
> On Wed, 2026-02-11 at 12:59 +0100, Boris Brezillon wrote:
> > On Wed, 11 Feb 2026 12:19:56 +0100
> > Philipp Stanner <phasta@mailbox.org> wrote:
> >
> > > On Wed, 2026-02-11 at 12:07 +0100, Boris Brezillon wrote:
> > > > On Wed, 11 Feb 2026 11:47:27 +0100
> > > > Philipp Stanner <phasta@mailbox.org> wrote:
> > > >
> > > > > On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
> > > > > > On Tue, 3 Feb 2026 09:14:02 +0100
> > > > > > Philipp Stanner <phasta@kernel.org> wrote:
> > > > > >
> > > > > > > +/// A jobqueue Job.
> > > > > > > +///
> > > > > > > +/// You can stuff your data in it. The job will be borrowed back to your driver
> > > > > > > +/// once the time has come to run it.
> > > > > > > +///
> > > > > > > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> > > > > > > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> > > > > > > +/// get run once all dependency fences have been signaled.
> > > > > > > +///
> > > > > > > +/// Jobs cost credits. Jobs will only be run if there is enough capacity in
> > > > > > > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> > > > > > > +/// credits, effectively disabling that mechanism.
> > > > > > > +#[pin_data]
> > > > > > > +pub struct Job<T: 'static + Send> {
> > > > > > > + cost: u32,
> > > > > > > + #[pin]
> > > > > > > + pub data: T,
> > > > > > > + done_fence: Option<ARef<DmaFence<i32>>>,
> > > > > > > + hardware_fence: Option<ARef<DmaFence<i32>>>,
> > > > > > > + nr_of_deps: AtomicU32,
> > > > > > > + dependencies: List<Dependency>,
> > > > > >
> > > > > > Given how tricky Lists are in rust, I'd recommend going for an XArray,
> > > > > > like we have on the C side. There's a bit of overhead when the job only
> > > > > > has a few deps, but I think simplicity beats memory-usage-optimizations
> > > > > > in that case (especially since the overhead exists and is accepted in
> > > > > > C).
> > > > >
> > > > > I mean, the list is now already implemented and works. Considering the
> > > > > XArray would have made sense during the development difficulties.
> > > >
> > > > I'm sure it does, but that's still more code/tricks to maintain than
> > > > what you'd have with the XArray abstraction.
> > >
> > > The solution then will rather be to make the linked list implementation
> > > better.
> > >
> > > A list is the correct data structure in a huge number of use cases in
> > > the kernel. We should not begin here to defer to other structures
> > > because of convenience.
> > >
> > > Btw. lists in Rust being so horrible has been repeatedly a reason why
> > > some other hackers argued that Rust as a language is not suitable for
> > > kernel development.
> > >
> > > So getting that right seems more desirable than capitulating.
> >
> > I'm not capitulating, and I'm not saying "No list, never!" either. I'm
> > saying, if there's something that fits the bill and is easier to use,
> > maybe we should consider it...
> >
> > >
> > > >
> > > > >
> > > > > If it were to make sense we could certainly replace the list with an
> > > > > xarray, but I don't see an advantage. The JQ just needs to iterate over
> > > > > the dependencies to register its events on them, and on drop to
> > > > > deregister them perhaps.
> > > > >
> > > > > We have many jobs, but likely only few dependencies per job, so the
> > > > > lower memory footprint seems desirable and the XArray's advantages
> > > > > don't come to play – except maybe if we'd want to consider to avoid the
> > > > > current unsafe-rawpointer solution to obtain the job, since obtaining a
> > > > > job from an Xarray is far faster than by list iteration.
> > > >
> > > > I don't think we need O(1) for picking random deps in a job, because
> > > > that's not something we need at all: the dep list here is used as a
> > > > FIFO.
> > > >
> > >
> > > Wrong. The dep list here has no ordering requirements at all. JQ does
> > > not care in which order it registers its events, it just cares about
> > > dealing with dep-fences racing.
> >
> > What I mean is that it's used as a FIFO right now, not that deps have to
> > be processed in order.
>
> Yeah, but it being a FIFO is irrelevant :)
I do think it's relevant actually. If the implementation does it as a
FIFO, then that means a container that's capable of providing a FIFO
abstraction is good enough :P. The fact that in theory it can be random
order dep checking is not important, because the implementation does
the check in dependency addition order, and it rightfully does so to
keep things simple => if we have to wait for all the deps anyway,
what's the point of trying to give them a fancy
will-likely-be-signaled-first-order.
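That "a FIFO is good enough" point can be sketched in plain userspace Rust (illustrative only; the names are made up and this is not the jobqueue code):

```rust
use std::collections::VecDeque;

// Userspace sketch of the argument above: the job has to wait for
// *all* of its dependencies anyway, so draining them strictly in
// addition order (a plain FIFO) is sufficient; no random access into
// the dependency container is ever needed.
fn drain_in_addition_order(deps: &[&str]) -> Vec<String> {
    let mut fifo: VecDeque<String> = deps.iter().map(|d| d.to_string()).collect();
    let mut processed = Vec::new();
    while let Some(dep) = fifo.pop_front() {
        // "Register the callback" / wait on this dep, front to back.
        processed.push(dep);
    }
    processed
}
```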
>
> >
> > >
> > > You could (de-)register your callbacks in random order, it does not
> > > matter.
> >
> > Again, that's not my point, and I think we're just saying the same
> > thing here: the list seems to be a good match for this dependency
> > array/list, because right now deps are processed in order. Now, being
> > the right construct in one language doesn't mean it's the right
> > construct in another language.
> >
> > >
> > > List and Xarray might be useful for the unsafe related to the
> > > DependencyWaker. There you could avoid a raw pointer by getting the job
> > > through a list iteration or through the hypothetical XArray.
> > >
> > > Please take a look at my detailed code comments for DependencyWaker.
> >
> > Sure, I'll have a closer look.
> >
> > >
> > > > There's the per-dep overhead of the ListLinks object maybe, but
> > > > it's certainly acceptable. And I don't think cache locality matters
> > > > either, because the XArray stores pointers too, so we'll still be one
> > > > deref away from the DmaFence. No, my main concern was maintainability,
> > > > because managing lists in rust is far from trivial, and as a developer,
> > > > I try to avoid using concepts the language I rely on is not friendly
> > > > with.
> > >
> > > This would be a decision with wide implications, as detailed above.
> > >
> > > If we were to admit that lists just don't work in Rust, then wouldn't
> > > the consequent decision be to remove them altogether?
> >
> > I'm not going as far as saying they don't work, I'm just saying they
> > are trickier to use, and that's a fact.
> >
> > >
> > > "Lists in kernel-Rust are not supported. Too difficult to maintain.
> > > We're sorry. Use XArray et al. instead :("
> >
> > No, there are patterns where an XArray wouldn't be a good fit. For
> > instance, LRU lists where objects get moved between lists depending on
> > their usage pattern. If we were to use XArrays for that, that would
> > imply potential allocations in paths where we don't want them. In this
> > dep array case, the deps are added at submit time, and they get
> > progressively dropped, so the array can't grow, it can only ever
> > shrink, and XArray allows it to shrink from both ends (and even have
> > holes), so it sounds like a good match too. Not saying a perfect match,
> > not saying better than a list in general, but an XArray is easier to
> > use than a list **in rust**, and the fact it fits the bill (if it does
> > in C, it should in rust too) had me thinking that maybe it's what we
> > should use.
>
> Yoah, you have valid points.
>
> Since XArray allows for dropping the unsafe {} without the performance
> penalty of a list-iteration, I think your idea for this particular case
> is good after all and can be put on the TODO list.
>
> I'm not sure how soon I have the cycles for implementing that, though.
> Looks as if a ton of work is coming at us for dma_fence.
Let me know how we (the Tyr devs) can help with that.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 11:19 ` Philipp Stanner
2026-02-11 11:59 ` Boris Brezillon
@ 2026-02-11 12:22 ` Alice Ryhl
2026-02-11 12:44 ` Philipp Stanner
` (2 more replies)
1 sibling, 3 replies; 103+ messages in thread
From: Alice Ryhl @ 2026-02-11 12:22 UTC (permalink / raw)
To: phasta
Cc: Boris Brezillon, David Airlie, Simona Vetter, Danilo Krummrich,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, Feb 11, 2026 at 12:19:56PM +0100, Philipp Stanner wrote:
> On Wed, 2026-02-11 at 12:07 +0100, Boris Brezillon wrote:
> > On Wed, 11 Feb 2026 11:47:27 +0100
> > Philipp Stanner <phasta@mailbox.org> wrote:
> >
> > > On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
> > > > On Tue, 3 Feb 2026 09:14:02 +0100
> > > > Philipp Stanner <phasta@kernel.org> wrote:
> > > >
> > > > > +/// A jobqueue Job.
> > > > > +///
> > > > > +/// You can stuff your data in it. The job will be borrowed back to your driver
> > > > > +/// once the time has come to run it.
> > > > > +///
> > > > > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> > > > > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> > > > > +/// get run once all dependency fences have been signaled.
> > > > > +///
> > > > > +/// Jobs cost credits. Jobs will only be run if there is enough capacity in
> > > > > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> > > > > +/// credits, effectively disabling that mechanism.
> > > > > +#[pin_data]
> > > > > +pub struct Job<T: 'static + Send> {
> > > > > + cost: u32,
> > > > > + #[pin]
> > > > > + pub data: T,
> > > > > + done_fence: Option<ARef<DmaFence<i32>>>,
> > > > > + hardware_fence: Option<ARef<DmaFence<i32>>>,
> > > > > + nr_of_deps: AtomicU32,
> > > > > + dependencies: List<Dependency>,
> > > >
> > > > Given how tricky Lists are in rust, I'd recommend going for an XArray,
> > > > like we have on the C side. There's a bit of overhead when the job only
> > > > has a few deps, but I think simplicity beats memory-usage-optimizations
> > > > in that case (especially since the overhead exists and is accepted in
> > > > C).
> > >
> > > I mean, the list is now already implemented and works. Considering the
> > > XArray would have made sense during the development difficulties.
> >
> > I'm sure it does, but that's still more code/tricks to maintain than
> > what you'd have with the XArray abstraction.
>
> The solution then will rather be to make the linked list implementation
> better.
>
> A list is the correct data structure in a huge number of use cases in
> the kernel. We should not begin here to defer to other structures
> because of convenience.
Rust vs C aside, linked lists are often used in the kernel despite not
being the best choice. They are extremely cache unfriendly and
inefficient; most of the time a vector or xarray is far faster if you
can accept an ENOMEM failure path when adding elements. I have heard
several times from C maintainers that overuse of list is making the
kernel slow in a death from a thousand cuts situation.
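The locality point can be sketched in plain userspace Rust (illustrative only; `std::collections::LinkedList` stands in for the kernel list here):

```rust
use std::collections::LinkedList;

// Userspace sketch of the cache argument above: both containers hold
// the same data and produce the same result, but the Vec stores its
// elements contiguously, while every LinkedList node is a separate
// heap allocation, so each `next` hop during iteration is a potential
// cache miss. Only the memory-access pattern differs.
fn sum_vec(v: &[u64]) -> u64 {
    v.iter().sum()
}

fn sum_list(l: &LinkedList<u64>) -> u64 {
    l.iter().sum()
}
```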
This applies to the red/black tree too, by the way.
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 12:22 ` Alice Ryhl
@ 2026-02-11 12:44 ` Philipp Stanner
2026-02-11 12:52 ` Alice Ryhl
2026-02-11 12:45 ` Danilo Krummrich
2026-02-11 13:45 ` Gary Guo
2 siblings, 1 reply; 103+ messages in thread
From: Philipp Stanner @ 2026-02-11 12:44 UTC (permalink / raw)
To: Alice Ryhl, phasta
Cc: Boris Brezillon, David Airlie, Simona Vetter, Danilo Krummrich,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 2026-02-11 at 12:22 +0000, Alice Ryhl wrote:
> On Wed, Feb 11, 2026 at 12:19:56PM +0100, Philipp Stanner wrote:
> > On Wed, 2026-02-11 at 12:07 +0100, Boris Brezillon wrote:
> > > On Wed, 11 Feb 2026 11:47:27 +0100
> > > Philipp Stanner <phasta@mailbox.org> wrote:
> > >
> > > > On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
> > > > > On Tue, 3 Feb 2026 09:14:02 +0100
> > > > > Philipp Stanner <phasta@kernel.org> wrote:
> > > > >
> > > > > > +/// A jobqueue Job.
> > > > > > +///
> > > > > > +/// You can stuff your data in it. The job will be borrowed back to your driver
> > > > > > +/// once the time has come to run it.
> > > > > > +///
> > > > > > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> > > > > > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> > > > > > +/// get run once all dependency fences have been signaled.
> > > > > > +///
> > > > > > +/// Jobs cost credits. Jobs will only be run if there is enough capacity in
> > > > > > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> > > > > > +/// credits, effectively disabling that mechanism.
> > > > > > +#[pin_data]
> > > > > > +pub struct Job<T: 'static + Send> {
> > > > > > + cost: u32,
> > > > > > + #[pin]
> > > > > > + pub data: T,
> > > > > > + done_fence: Option<ARef<DmaFence<i32>>>,
> > > > > > + hardware_fence: Option<ARef<DmaFence<i32>>>,
> > > > > > + nr_of_deps: AtomicU32,
> > > > > > + dependencies: List<Dependency>,
> > > > >
> > > > > Given how tricky Lists are in rust, I'd recommend going for an XArray,
> > > > > like we have on the C side. There's a bit of overhead when the job only
> > > > > has a few deps, but I think simplicity beats memory-usage-optimizations
> > > > > in that case (especially since the overhead exists and is accepted in
> > > > > C).
> > > >
> > > > I mean, the list is now already implemented and works. Considering the
> > > > XArray would have made sense during the development difficulties.
> > >
> > > I'm sure it does, but that's still more code/tricks to maintain than
> > > what you'd have with the XArray abstraction.
> >
> > The solution then will rather be to make the linked list implementation
> > better.
> >
> > A list is the correct data structure in a huge number of use cases in
> > the kernel. We should not begin here to defer to other structures
> > because of convenience.
>
> Rust vs C aside, linked lists are often used in the kernel despite not
> being the best choice. They are extremely cache unfriendly and
> inefficient; most of the time a vector or xarray is far faster if you
> can accept an ENOMEM failure path when adding elements. I have heard
> several times from C maintainers that overuse of list is making the
> kernel slow in a death from a thousand cuts situation.
Interesting. Valid points.
It might be a self-accelerating thing. More people have lists on their
mind because they are so common, with RB trees et al. being relatively
rare, so they instinctively use them, making them more common…
>
> This applies to the red/black tree too, by the way.
Can't fully follow, you mean that RB trees are supposedly overused,
too?
P.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 12:44 ` Philipp Stanner
@ 2026-02-11 12:52 ` Alice Ryhl
2026-02-11 13:53 ` Philipp Stanner
0 siblings, 1 reply; 103+ messages in thread
From: Alice Ryhl @ 2026-02-11 12:52 UTC (permalink / raw)
To: phasta
Cc: Boris Brezillon, David Airlie, Simona Vetter, Danilo Krummrich,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, Feb 11, 2026 at 01:44:56PM +0100, Philipp Stanner wrote:
> On Wed, 2026-02-11 at 12:22 +0000, Alice Ryhl wrote:
> > On Wed, Feb 11, 2026 at 12:19:56PM +0100, Philipp Stanner wrote:
> > > On Wed, 2026-02-11 at 12:07 +0100, Boris Brezillon wrote:
> > > > On Wed, 11 Feb 2026 11:47:27 +0100
> > > > Philipp Stanner <phasta@mailbox.org> wrote:
> > > >
> > > > > On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
> > > > > > On Tue, 3 Feb 2026 09:14:02 +0100
> > > > > > Philipp Stanner <phasta@kernel.org> wrote:
> > > > > >
> > > > > > > +/// A jobqueue Job.
> > > > > > > +///
> > > > > > > +/// You can stuff your data in it. The job will be borrowed back to your driver
> > > > > > > +/// once the time has come to run it.
> > > > > > > +///
> > > > > > > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> > > > > > > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> > > > > > > +/// get run once all dependency fences have been signaled.
> > > > > > > +///
> > > > > > > +/// Jobs cost credits. Jobs will only be run if there is enough capacity in
> > > > > > > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> > > > > > > +/// credits, effectively disabling that mechanism.
> > > > > > > +#[pin_data]
> > > > > > > +pub struct Job<T: 'static + Send> {
> > > > > > > + cost: u32,
> > > > > > > + #[pin]
> > > > > > > + pub data: T,
> > > > > > > + done_fence: Option<ARef<DmaFence<i32>>>,
> > > > > > > + hardware_fence: Option<ARef<DmaFence<i32>>>,
> > > > > > > + nr_of_deps: AtomicU32,
> > > > > > > + dependencies: List<Dependency>,
> > > > > >
> > > > > > Given how tricky Lists are in rust, I'd recommend going for an XArray,
> > > > > > like we have on the C side. There's a bit of overhead when the job only
> > > > > > has a few deps, but I think simplicity beats memory-usage-optimizations
> > > > > > in that case (especially since the overhead exists and is accepted in
> > > > > > C).
> > > > >
> > > > > I mean, the list is now already implemented and works. Considering the
> > > > > XArray would have made sense during the development difficulties.
> > > >
> > > > I'm sure it does, but that's still more code/tricks to maintain than
> > > > what you'd have with the XArray abstraction.
> > >
> > > The solution then will rather be to make the linked list implementation
> > > better.
> > >
> > > A list is the correct data structure in a huge number of use cases in
> > > the kernel. We should not begin here to defer to other structures
> > > because of convenience.
> >
> > Rust vs C aside, linked lists are often used in the kernel despite not
> > being the best choice. They are extremely cache unfriendly and
> > inefficient; most of the time a vector or xarray is far faster if you
> > can accept an ENOMEM failure path when adding elements. I have heard
> > several times from C maintainers that overuse of list is making the
> > kernel slow in a death from a thousand cuts situation.
>
> Interesting. Valid points.
>
> It might be a self-accelerating thing. More people have lists on their
> mind because they are so common, with RB trees et al. being relatively
> rare, so they instinctively use them, making them more common…
Yes, many people assume "list widely used in kernel" implies "list is a
good idea". Unfortunately, that is not the case.
> > This applies to the red/black tree too, by the way.
>
> Can't fully follow, you mean that RB trees are supposedly overused,
> too?
When I first suggested adding red/black tree abstractions in Rust
several years ago I was told by Greg that I couldn't do it because the
red/black tree was deprecated and no new users should be added.
Later I found that this was more of a not-written-down recommendation
than a full deprecation, and since Rust Binder has codepaths where an
ENOMEM failure path is unacceptable for the map, we did end up adding a
Rust rb tree abstraction after all. But this is where I first heard of
this issue with lists and rb trees.
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 12:52 ` Alice Ryhl
@ 2026-02-11 13:53 ` Philipp Stanner
2026-02-11 15:28 ` Alice Ryhl
0 siblings, 1 reply; 103+ messages in thread
From: Philipp Stanner @ 2026-02-11 13:53 UTC (permalink / raw)
To: Alice Ryhl, phasta
Cc: Boris Brezillon, David Airlie, Simona Vetter, Danilo Krummrich,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 2026-02-11 at 12:52 +0000, Alice Ryhl wrote:
>
> Yes, many people assume "list widely used in kernel" implies "list is a
> good idea". Unfortunately it is not the case.
>
> > > This applies to the red/black tree too, by the way.
> >
> > Can't fully follow, you mean that RB trees are supposedly overused,
> > too?
>
> When I first suggested adding red/black tree abstractions in Rust
> several years ago I was told by Greg that I couldn't do it because the
> red/black tree was deprecated and no new users should be added.
Do you have a link or sth?
First time in my life that I hear that RB trees shouldn't be used. If
something is deprecated for good one would hope that's obvious.
What's the justification? Should everyone use the B-Tree?
RB trees are super widely used in CS.
P.
>
> Later I found that this was more of a not-written-down recommendation
> than a full deprecation, and since Rust Binder has codepaths where an
> ENOMEM failure path is unacceptable for the map, we did end up adding a
> Rust rb tree abstraction after all. But this is where I first heard of
> this issue with lists and rb trees.
>
> Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 13:53 ` Philipp Stanner
@ 2026-02-11 15:28 ` Alice Ryhl
0 siblings, 0 replies; 103+ messages in thread
From: Alice Ryhl @ 2026-02-11 15:28 UTC (permalink / raw)
To: phasta
Cc: Boris Brezillon, David Airlie, Simona Vetter, Danilo Krummrich,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, Feb 11, 2026 at 02:53:49PM +0100, Philipp Stanner wrote:
> On Wed, 2026-02-11 at 12:52 +0000, Alice Ryhl wrote:
> >
> > Yes, many people assume "list widely used in kernel" implies "list is a
> > good idea". Unfortunately it is not the case.
> >
> > > > This applies to the red/black tree too, by the way.
> > >
> > > Can't fully follow, you mean that RB trees are supposedly overused,
> > > too?
> >
> > When I first suggested adding red/black tree abstractions in Rust
> > several years ago I was told by Greg that I couldn't do it because the
> > red/black tree was deprecated and no new users should be added.
>
> Do you have a link or sth?
I could not easily find it again, sorry. It's been several years now
since that discussion.
> First time in my life that I hear that RB trees shouldn't be used. If
> something is deprecated for good, one would hope that's obvious.
I'm not sure what the current status is ... it may have been somewhat
walked back and is now a "some people do not like rb trees" rather than
a deprecation.
> What's the justification? Should everyone use the B-Tree?
> RB trees are super widely used in CS.
The justification is that every time you follow a pointer, it costs a
cache miss, which is really expensive. Using an xarray or hashtable or
vector is much cheaper under most circumstances.
RB trees are used widely in CS because they are one of the simplest data
structures that provide O(log n) lookup. Such analysis usually does not
care about constant factors.
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 12:22 ` Alice Ryhl
2026-02-11 12:44 ` Philipp Stanner
@ 2026-02-11 12:45 ` Danilo Krummrich
2026-02-11 13:45 ` Gary Guo
2 siblings, 0 replies; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-11 12:45 UTC (permalink / raw)
To: Alice Ryhl
Cc: phasta, Boris Brezillon, David Airlie, Simona Vetter, Gary Guo,
Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed Feb 11, 2026 at 1:22 PM CET, Alice Ryhl wrote:
> Rust vs C aside, linked lists are often used in the kernel despite not being
> the best choice.
Absolutely agree with this.
> They are extremely cache unfriendly and inefficient; most of the time a vector
> or xarray is far faster if you can accept an ENOMEM failure path when adding
> elements.
Not sure if it's really most of the time, but I'd agree with "in a lot of
cases".
> I have heard several times from C maintainers that overuse of list is making
> the kernel slow in a death from a thousand cuts situation.
>
> This applies to the red/black tree too, by the way.
Yeah, that's why I would have preferred maple tree for GPUVM.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 12:22 ` Alice Ryhl
2026-02-11 12:44 ` Philipp Stanner
2026-02-11 12:45 ` Danilo Krummrich
@ 2026-02-11 13:45 ` Gary Guo
2026-02-11 14:07 ` Boris Brezillon
2026-02-11 15:33 ` Alice Ryhl
2 siblings, 2 replies; 103+ messages in thread
From: Gary Guo @ 2026-02-11 13:45 UTC (permalink / raw)
To: Alice Ryhl, phasta
Cc: Boris Brezillon, David Airlie, Simona Vetter, Danilo Krummrich,
Gary Guo, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed Feb 11, 2026 at 8:22 PM CST, Alice Ryhl wrote:
> On Wed, Feb 11, 2026 at 12:19:56PM +0100, Philipp Stanner wrote:
>> On Wed, 2026-02-11 at 12:07 +0100, Boris Brezillon wrote:
>> > On Wed, 11 Feb 2026 11:47:27 +0100
>> > Philipp Stanner <phasta@mailbox.org> wrote:
>> >
>> > > On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
>> > > > On Tue, 3 Feb 2026 09:14:02 +0100
>> > > > Philipp Stanner <phasta@kernel.org> wrote:
>> > > >
>> > > > > +/// A jobqueue Job.
>> > > > > +///
>> > > > > +/// You can stuff your data in it. The job will be borrowed back to your driver
>> > > > > +/// once the time has come to run it.
>> > > > > +///
>> > > > > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
>> > > > > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
>> > > > > +/// get run once all dependency fences have been signaled.
>> > > > > +///
>> > > > > +/// Jobs cost credits. Jobs will only be run if there is enough capacity in
>> > > > > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
>> > > > > +/// credits, effectively disabling that mechanism.
>> > > > > +#[pin_data]
>> > > > > +pub struct Job<T: 'static + Send> {
>> > > > > + cost: u32,
>> > > > > + #[pin]
>> > > > > + pub data: T,
>> > > > > + done_fence: Option<ARef<DmaFence<i32>>>,
>> > > > > + hardware_fence: Option<ARef<DmaFence<i32>>>,
>> > > > > + nr_of_deps: AtomicU32,
>> > > > > + dependencies: List<Dependency>,
>> > > >
>> > > > Given how tricky Lists are in rust, I'd recommend going for an XArray,
>> > > > like we have on the C side. There's a bit of overhead when the job only
>> > > > has a few deps, but I think simplicity beats memory-usage-optimizations
>> > > > in that case (especially since the overhead exists and is accepted in
>> > > > C).
>> > >
>> > > I mean, the list is now already implemented and works. Considering the
>> > > XArray would have made sense during the development difficulties.
>> >
>> > I'm sure it does, but that's still more code/tricks to maintain than
>> > what you'd have with the XArray abstraction.
>>
>> The solution then will rather be to make the linked list implementation
>> better.
>>
>> A list is the correct data structure in a huge number of use cases in
>> the kernel. We should not begin here to defer to other structures
>> because of convenience.
>
> Rust vs C aside, linked lists are often used in the kernel despite not
> being the best choice. They are extremely cache unfriendly and
> inefficient; most of the time a vector or xarray is far faster if you
> can accept an ENOMEM failure path when adding elements. I have heard
> several times from C maintainers that overuse of list is making the
> kernel slow in a death from a thousand cuts situation.
I would rather argue the other way: outside of very hot paths where cache
friendliness absolutely matters, if you do not require indexed access then the
list is the correct data structure more often than not.
Vectors have the issue that resizing requires moving, so they cannot be used with
pinned types. XArray doesn't require moving because it stores elements behind an
indirection and thus an extra allocation, but this means that if you're just
iterating over all elements it also does not benefit from cache locality. Using
vectors also requires careful management of capacity, which is a very common
source of memory leaks in long-running user-space Rust programs.
Re: the ENOMEM failure path, I'd argue that even if you *can* accept an ENOMEM
failure path, it is better not to have a failure path that is unnecessary.
Best,
Gary
>
> This applies to the red/black tree too, by the way.
>
> Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 13:45 ` Gary Guo
@ 2026-02-11 14:07 ` Boris Brezillon
2026-02-11 15:17 ` Alice Ryhl
2026-02-11 15:33 ` Alice Ryhl
1 sibling, 1 reply; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 14:07 UTC (permalink / raw)
To: Gary Guo
Cc: Alice Ryhl, phasta, David Airlie, Simona Vetter, Danilo Krummrich,
Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 11 Feb 2026 21:45:37 +0800
"Gary Guo" <gary@garyguo.net> wrote:
> On Wed Feb 11, 2026 at 8:22 PM CST, Alice Ryhl wrote:
> > On Wed, Feb 11, 2026 at 12:19:56PM +0100, Philipp Stanner wrote:
> >> On Wed, 2026-02-11 at 12:07 +0100, Boris Brezillon wrote:
> >> > On Wed, 11 Feb 2026 11:47:27 +0100
> >> > Philipp Stanner <phasta@mailbox.org> wrote:
> >> >
> >> > > On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
> >> > > > On Tue, 3 Feb 2026 09:14:02 +0100
> >> > > > Philipp Stanner <phasta@kernel.org> wrote:
> >> > > >
> >> > > > > +/// A jobqueue Job.
> >> > > > > +///
> >> > > > > +/// You can stuff your data in it. The job will be borrowed back to your driver
> >> > > > > +/// once the time has come to run it.
> >> > > > > +///
> >> > > > > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> >> > > > > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> >> > > > > +/// get run once all dependency fences have been signaled.
> >> > > > > +///
> >> > > > > +/// Jobs cost credits. Jobs will only be run if there are is enough capacity in
> >> > > > > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> >> > > > > +/// credits, effectively disabling that mechanism.
> >> > > > > +#[pin_data]
> >> > > > > +pub struct Job<T: 'static + Send> {
> >> > > > > + cost: u32,
> >> > > > > + #[pin]
> >> > > > > + pub data: T,
> >> > > > > + done_fence: Option<ARef<DmaFence<i32>>>,
> >> > > > > + hardware_fence: Option<ARef<DmaFence<i32>>>,
> >> > > > > + nr_of_deps: AtomicU32,
> >> > > > > + dependencies: List<Dependency>,
> >> > > >
> >> > > > Given how tricky Lists are in rust, I'd recommend going for an XArray,
> >> > > > like we have on the C side. There's a bit of overhead when the job only
> >> > > > has a few deps, but I think simplicity beats memory-usage-optimizations
> >> > > > in that case (especially since the overhead exists and is accepted in
> >> > > > C).
> >> > >
> >> > > I mean, the list is now already implemented and works. Considering the
> >> > > XArray would have made sense during the development difficulties.
> >> >
> >> > I'm sure it does, but that's still more code/tricks to maintain than
> >> > what you'd have with the XArray abstraction.
> >>
> >> The solution then will rather be to make the linked list implementation
> >> better.
> >>
> >> A list is the correct data structure in a huge number of use cases in
> >> the kernel. We should not begin here to defer to other structures
> >> because of convenience.
> >
> > Rust vs C aside, linked lists are often used in the kernel despite not
> > being the best choice. They are extremely cache unfriendly and
> > inefficient; most of the time a vector or xarray is far faster if you
> > can accept an ENOMEM failure path when adding elements. I have heard
> > several times from C maintainers that overuse of list is making the
> > kernel slow in a death from a thousand cuts situation.
>
> I would rather argue the other way: outside of very hot paths where cache
> friendliness absolutely matters, if you do not require indexed access then the
> list is the correct data structure more often than not.
>
> Vectors have the issue that resizing requires moving, so they cannot be used with
> pinned types. XArray doesn't require moving because it stores elements behind an
> indirection and thus an extra allocation, but this means that if you're just
> iterating over all elements it also does not benefit from cache locality.
Back to this particular job dependencies use case: we have to embed the
DmaFence pointer in some wrapper with the ListLinks element anyway,
because DmaFences can be inserted in multiple of those lists in
parallel. This means that now the overhead is two-pointers per DmaFence
pointer. Of course, it's not a big issue in practice, because those
elements are short-lived, it's only 16 bytes, and if we're ending up
having too many of those deps, we're gonna have other challenging
scaling issues anyway. But it also means we have the extra indirection
that you'd have with an array of pointers or an xarray, with more
per-item overhead, and none of the advantages a list could provide (O(1)
removal if you have the list item, O(1) front insertion, ...) would
really be used in this case (because we use the list as a FIFO, really).
So overall, I'd still lean towards an XArray here, unless there are
strong objections. Just to make it super clear, I'm not making a case
against all List usage, just this particular one :-).
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 14:07 ` Boris Brezillon
@ 2026-02-11 15:17 ` Alice Ryhl
2026-02-11 15:20 ` Philipp Stanner
0 siblings, 1 reply; 103+ messages in thread
From: Alice Ryhl @ 2026-02-11 15:17 UTC (permalink / raw)
To: Boris Brezillon
Cc: Gary Guo, phasta, David Airlie, Simona Vetter, Danilo Krummrich,
Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, Feb 11, 2026 at 03:07:38PM +0100, Boris Brezillon wrote:
>
> Back to this particular job dependencies use case: we have to embed the
> DmaFence pointer in some wrapper with the ListLinks element anyway,
> because DmaFences can be inserted in multiple of those lists in
> parallel.
Okay, if that's the case, then the linked list is *really* not the right
tool for the job.
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 15:17 ` Alice Ryhl
@ 2026-02-11 15:20 ` Philipp Stanner
2026-02-11 15:51 ` Boris Brezillon
` (2 more replies)
0 siblings, 3 replies; 103+ messages in thread
From: Philipp Stanner @ 2026-02-11 15:20 UTC (permalink / raw)
To: Alice Ryhl, Boris Brezillon
Cc: Gary Guo, phasta, David Airlie, Simona Vetter, Danilo Krummrich,
Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed, 2026-02-11 at 15:17 +0000, Alice Ryhl wrote:
> On Wed, Feb 11, 2026 at 03:07:38PM +0100, Boris Brezillon wrote:
> >
> > Back to this particular job dependencies use case: we have to embed the
> > DmaFence pointer in some wrapper with the ListLinks element anyway,
> > because DmaFences can be inserted in multiple of those lists in
> > parallel.
>
> Okay, if that's the case, then the linked list is *really* not the right
> tool for the job.
We have to distinguish what we are talking about here.
For the JobQueue, it takes over a cloned DmaFence and stuffs that into
its own list. Problem solved.
Whether the driver has other clones of that fence in other lists is not
relevant because it's not the same list head.
JQ's lists and list heads are internal.
I don't see a problem.
P.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 15:20 ` Philipp Stanner
@ 2026-02-11 15:51 ` Boris Brezillon
2026-02-11 15:53 ` Alice Ryhl
2026-02-11 15:54 ` Danilo Krummrich
2 siblings, 0 replies; 103+ messages in thread
From: Boris Brezillon @ 2026-02-11 15:51 UTC (permalink / raw)
To: Philipp Stanner
Cc: phasta, Alice Ryhl, Gary Guo, David Airlie, Simona Vetter,
Danilo Krummrich, Benno Lossin, Christian König,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Wed, 11 Feb 2026 16:20:15 +0100
Philipp Stanner <phasta@mailbox.org> wrote:
> On Wed, 2026-02-11 at 15:17 +0000, Alice Ryhl wrote:
> > On Wed, Feb 11, 2026 at 03:07:38PM +0100, Boris Brezillon wrote:
> > >
> > > Back to this particular job dependencies use case: we have to embed the
> > > DmaFence pointer in some wrapper with the ListLinks element anyway,
> > > because DmaFences can be inserted in multiple of those lists in
> > > parallel.
> >
> > Okay, if that's the case, then the linked list is *really* not the right
> > tool for the job.
>
> We have to distinguish what we are talking about here.
>
> For the JobQueue, it takes over a cloned DmaFence and stuffs that into
> its own list. Problem solved.
>
> Whether the driver has other clones of that fence in other lists is not
> relevant because it's not the same list head.
>
> JQ's lists and list heads are internal.
>
> I don't see a problem.
Both the list- and the xarray-based implementations will work, but what you end
up with when you use a list is items that look like:
struct Dependency {
    // Two pointers to insert the element in the list
    link: ListLinks,
    // The pointer to your fence
    fence: ARef<DmaFence>,
}
vs just the ARef<DmaFence> that's stored as a ForeignOwnable pointer in
some xarray entry. So the list overhead is still very much present, with
none of the benefits of the direct access you'd get if you had
something like:
struct DmaFence {
    // Two pointers to insert the element in the dependency list
    link: ListLinks,
    // Put the rest of the DmaFence stuff there
    ...
}
which you can't have because a DmaFence can be in multiple dependency
lists at the same time.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 15:20 ` Philipp Stanner
2026-02-11 15:51 ` Boris Brezillon
@ 2026-02-11 15:53 ` Alice Ryhl
2026-02-11 15:54 ` Danilo Krummrich
2 siblings, 0 replies; 103+ messages in thread
From: Alice Ryhl @ 2026-02-11 15:53 UTC (permalink / raw)
To: phasta
Cc: Boris Brezillon, Gary Guo, David Airlie, Simona Vetter,
Danilo Krummrich, Benno Lossin, Christian König,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Wed, Feb 11, 2026 at 04:20:15PM +0100, Philipp Stanner wrote:
> On Wed, 2026-02-11 at 15:17 +0000, Alice Ryhl wrote:
> > On Wed, Feb 11, 2026 at 03:07:38PM +0100, Boris Brezillon wrote:
> > >
> > > Back to this particular job dependencies use case: we have to embed the
> > > DmaFence pointer in some wrapper with the ListLinks element anyway,
> > > because DmaFences can be inserted in multiple of those lists in
> > > parallel.
> >
> > Okay, if that's the case, then the linked list is *really* not the right
> > tool for the job.
>
> We have to distinguish what we are talking about here.
>
> For the JobQueue, it takes over a cloned DmaFence and stuffs that into
> its own list. Problem solved.
>
> Whether the driver has other clones of that fence in other lists is not
> relevant because it's not the same list head.
>
> JQ's lists and list heads are internal.
>
> I don't see a problem.
I'm talking about this allocation:
pub fn add_dependency(&mut self, fence: ARef<DmaFence<i32>>) -> Result {
let dependency = Dependency::new(fence)?;
and this one:
pub fn submit_job(&self, mut job: Pin<KBox<Job<T>>>) -> Result<ARef<DmaFence<i32>>> {
[...]
let wrapped_job = JobWrap::new(job)?;
Replacing `dependencies` with a KVec and `waiting_jobs` with an xarray
would make these use less memory, perform fewer allocations, and run faster.
It would also get rid of the raw pointer in HwFenceWaker, as you can use the
address (or another ID) of the Job<T> as the key to look up in the xarray.
For `dependencies` there's no need to worry about the vector growing too
large, because the vectors are not long-lived: when the fence is signalled,
the entire vector can be freed. As for `waiting_jobs`, the xarray
auto-shrinks its memory usage, so no problem there.
Alice
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 15:20 ` Philipp Stanner
2026-02-11 15:51 ` Boris Brezillon
2026-02-11 15:53 ` Alice Ryhl
@ 2026-02-11 15:54 ` Danilo Krummrich
2 siblings, 0 replies; 103+ messages in thread
From: Danilo Krummrich @ 2026-02-11 15:54 UTC (permalink / raw)
To: Philipp Stanner
Cc: phasta, Alice Ryhl, Boris Brezillon, Gary Guo, David Airlie,
Simona Vetter, Benno Lossin, Christian König, Daniel Almeida,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
On Wed Feb 11, 2026 at 4:20 PM CET, Philipp Stanner wrote:
> On Wed, 2026-02-11 at 15:17 +0000, Alice Ryhl wrote:
>> On Wed, Feb 11, 2026 at 03:07:38PM +0100, Boris Brezillon wrote:
>> >
>> > Back to this particular job dependencies use case: we have to embed the
>> > DmaFence pointer in some wrapper with the ListLinks element anyway,
>> > because DmaFences can be inserted in multiple of those lists in
>> > parallel.
>>
>> Okay, if that's the case, then the linked list is *really* not the right
>> tool for the job.
>
> We have to distinguish what we are talking about here.
>
> For the JobQueue, it takes over a cloned DmaFence and stuffs that into
> its own list. Problem solved.
You mean it wraps the ARef<DmaFence> into a struct Dependency which has its own
ListLinks.
But this requires an additional allocation for every dependency, whereas with
xarray you can just store the pointer directly in the xarray with
ARef::into_raw().
I.e., it scales much better than a list for this use case.
^ permalink raw reply [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue
2026-02-11 13:45 ` Gary Guo
2026-02-11 14:07 ` Boris Brezillon
@ 2026-02-11 15:33 ` Alice Ryhl
1 sibling, 0 replies; 103+ messages in thread
From: Alice Ryhl @ 2026-02-11 15:33 UTC (permalink / raw)
To: Gary Guo
Cc: phasta, Boris Brezillon, David Airlie, Simona Vetter,
Danilo Krummrich, Benno Lossin, Christian König,
Daniel Almeida, Joel Fernandes, linux-kernel, dri-devel,
rust-for-linux
On Wed, Feb 11, 2026 at 09:45:37PM +0800, Gary Guo wrote:
> On Wed Feb 11, 2026 at 8:22 PM CST, Alice Ryhl wrote:
> > On Wed, Feb 11, 2026 at 12:19:56PM +0100, Philipp Stanner wrote:
> >> On Wed, 2026-02-11 at 12:07 +0100, Boris Brezillon wrote:
> >> > On Wed, 11 Feb 2026 11:47:27 +0100
> >> > Philipp Stanner <phasta@mailbox.org> wrote:
> >> >
> >> > > On Tue, 2026-02-10 at 15:57 +0100, Boris Brezillon wrote:
> >> > > > On Tue, 3 Feb 2026 09:14:02 +0100
> >> > > > Philipp Stanner <phasta@kernel.org> wrote:
> >> > > >
> >> > > > > +/// A jobqueue Job.
> >> > > > > +///
> >> > > > > +/// You can stuff your data in it. The job will be borrowed back to your driver
> >> > > > > +/// once the time has come to run it.
> >> > > > > +///
> >> > > > > +/// Jobs are consumed by [`Jobqueue::submit_job`] by value (ownership transfer).
> >> > > > > +/// You can set multiple [`DmaFence`] as dependencies for a job. It will only
> >> > > > > +/// get run once all dependency fences have been signaled.
> >> > > > > +///
> >> > > > > +/// Jobs cost credits. Jobs will only be run if there is enough capacity in
> >> > > > > +/// the jobqueue for the job's credits. It is legal to specify jobs costing 0
> >> > > > > +/// credits, effectively disabling that mechanism.
> >> > > > > +#[pin_data]
> >> > > > > +pub struct Job<T: 'static + Send> {
> >> > > > > + cost: u32,
> >> > > > > + #[pin]
> >> > > > > + pub data: T,
> >> > > > > + done_fence: Option<ARef<DmaFence<i32>>>,
> >> > > > > + hardware_fence: Option<ARef<DmaFence<i32>>>,
> >> > > > > + nr_of_deps: AtomicU32,
> >> > > > > + dependencies: List<Dependency>,
> >> > > >
> >> > > > Given how tricky Lists are in rust, I'd recommend going for an XArray,
> >> > > > like we have on the C side. There's a bit of overhead when the job only
> >> > > > has a few deps, but I think simplicity beats memory-usage-optimizations
> >> > > > in that case (especially since the overhead exists and is accepted in
> >> > > > C).
> >> > >
> >> > > I mean, the list is now already implemented and works. Considering the
> >> > > XArray would have made sense during the development difficulties.
> >> >
> >> > I'm sure it does, but that's still more code/tricks to maintain than
> >> > what you'd have with the XArray abstraction.
> >>
> >> The solution then will rather be to make the linked list implementation
> >> better.
> >>
> >> A list is the correct data structure in a huge number of use cases in
> >> the kernel. We should not begin here to defer to other structures
> >> because of convenience.
> >
> > Rust vs C aside, linked lists are often used in the kernel despite not
> > being the best choice. They are extremely cache unfriendly and
> > inefficient; most of the time a vector or xarray is far faster if you
> > can accept an ENOMEM failure path when adding elements. I have heard
> > several times from C maintainers that overuse of list is making the
> > kernel slow in a death from a thousand cuts situation.
>
> I would rather argue the other way: outside of very hot paths where cache
> friendliness absolutely matters, if you do not require indexed access then the
> list is the correct data structure more often than not.
>
> Vectors have the issue that resizing requires moving, so they cannot be used with
> pinned types. XArray doesn't require moving because it stores elements behind an
> indirection and thus an extra allocation, but this means that if you're just
> iterating over all elements it also does not benefit from cache locality. Using
> vectors also requires careful management of capacity, which is a very common
> source of memory leaks in long-running user-space Rust programs.
XArray does benefit somewhat from cache locality compared to a linked
list because you know the address of element i+1 even if you have not yet
retrieved element i, which may enable prefetching to happen.
Alice
> Re: the ENOMEM failure path, I'd argue that even if you *can* accept an ENOMEM
> failure path, it is better not to have a failure path that is unnecessary.
>
> Best,
> Gary
>
> >
> > This applies to the red/black tree too, by the way.
> >
> > Alice
>
^ permalink raw reply [flat|nested] 103+ messages in thread
* [RFC PATCH 4/4] samples: rust: Add jobqueue tester
2026-02-03 8:13 [RFC PATCH 0/4] Add dma_fence abstractions and DRM Jobqueue Philipp Stanner
` (2 preceding siblings ...)
2026-02-03 8:14 ` [RFC PATCH 3/4] rust/drm: Add DRM Jobqueue Philipp Stanner
@ 2026-02-03 8:14 ` Philipp Stanner
2026-02-03 16:46 ` [RFC PATCH 0/4] Add dma_fence abstractions and DRM Jobqueue Daniel Almeida
4 siblings, 0 replies; 103+ messages in thread
From: Philipp Stanner @ 2026-02-03 8:14 UTC (permalink / raw)
To: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Boris Brezillon,
Daniel Almeida, Joel Fernandes
Cc: linux-kernel, dri-devel, rust-for-linux, Philipp Stanner
The DRM Jobqueue is a new piece of (highly asynchronous) infrastructure
for submitting jobs to graphics processing units (GPUs).
It is difficult to test such a mechanism purely with unit tests. Thus,
provide this driver solely for testing drm::Jobqueue.
Signed-off-by: Philipp Stanner <phasta@kernel.org>
---
samples/rust/Kconfig | 11 ++
samples/rust/Makefile | 1 +
samples/rust/rust_jobqueue_tester.rs | 180 +++++++++++++++++++++++++++
3 files changed, 192 insertions(+)
create mode 100644 samples/rust/rust_jobqueue_tester.rs
diff --git a/samples/rust/Kconfig b/samples/rust/Kconfig
index c376eb899b7a..a9a3a671bb0b 100644
--- a/samples/rust/Kconfig
+++ b/samples/rust/Kconfig
@@ -145,4 +145,15 @@ config SAMPLE_RUST_HOSTPROGS
If unsure, say N.
+config SAMPLE_RUST_JOBQUEUE_TESTER
+ tristate "Jobqueue Tester"
+ select JOBQUEUE_TESTER
+ help
+ This option builds the Rust Jobqueue Tester.
+
+ To compile this as a module, choose M here:
+ the module will be called rust_jobqueue_tester.
+
+ If unsure, say N.
+
endif # SAMPLES_RUST
diff --git a/samples/rust/Makefile b/samples/rust/Makefile
index cf8422f8f219..9cc1f021dc39 100644
--- a/samples/rust/Makefile
+++ b/samples/rust/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_SAMPLE_RUST_DRIVER_USB) += rust_driver_usb.o
obj-$(CONFIG_SAMPLE_RUST_DRIVER_FAUX) += rust_driver_faux.o
obj-$(CONFIG_SAMPLE_RUST_DRIVER_AUXILIARY) += rust_driver_auxiliary.o
obj-$(CONFIG_SAMPLE_RUST_CONFIGFS) += rust_configfs.o
+obj-$(CONFIG_SAMPLE_RUST_JOBQUEUE_TESTER) += rust_jobqueue_tester.o
rust_print-y := rust_print_main.o rust_print_events.o
diff --git a/samples/rust/rust_jobqueue_tester.rs b/samples/rust/rust_jobqueue_tester.rs
new file mode 100644
index 000000000000..c2590a1b4f8a
--- /dev/null
+++ b/samples/rust/rust_jobqueue_tester.rs
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Small example demonstrating how to use [`drm::Jobqueue`].
+
+use kernel::prelude::*;
+use kernel::sync::{DmaFenceCtx, DmaFence, Arc};
+use kernel::drm::jq::{Job, Jobqueue};
+use kernel::types::{ARef};
+use kernel::time::{delay::fsleep, Delta};
+use kernel::workqueue::{self, impl_has_work, new_work, Work, WorkItem};
+
+module! {
+ type: RustJobqueueTester,
+ name: "rust_jobqueue_tester",
+ authors: ["Philipp Stanner"],
+ description: "Rust DRM Jobqueue tester sample",
+ license: "GPL",
+}
+
+#[pin_data]
+struct GPUWorker {
+ hw_fence: ARef<DmaFence<i32>>,
+ #[pin]
+ work: Work<GPUWorker>,
+}
+
+impl GPUWorker {
+ fn new(
+ hw_fence: ARef<DmaFence<i32>>,
+ ) -> Result<Arc<Self>> {
+ Arc::pin_init(
+ pin_init!(Self {hw_fence, work <- new_work!("Jobqueue::GPUWorker")}),
+ GFP_KERNEL,
+ )
+ }
+}
+
+impl_has_work! {
+ impl HasWork<Self> for GPUWorker { self.work }
+}
+
+impl WorkItem for GPUWorker {
+ type Pointer = Arc<GPUWorker>;
+
+ fn run(this: Arc<GPUWorker>) {
+ fsleep(Delta::from_secs(1));
+ this.hw_fence.signal().unwrap();
+ }
+}
+
+fn run_job(job: &Pin<&mut Job<Arc<DmaFenceCtx>>>) -> ARef<DmaFence<i32>> {
+ let fence = job.data.as_arc_borrow().new_fence(42 as i32).unwrap();
+
+ let gpu_worker = GPUWorker::new(fence.clone()).unwrap();
+ let _ = workqueue::system().enqueue(gpu_worker);
+
+ fence
+}
+
+struct RustJobqueueTester { }
+
+impl kernel::Module for RustJobqueueTester {
+ fn init(_module: &'static ThisModule) -> Result<Self> {
+ pr_info!("Rust Jobqueue Tester (init)\n");
+ pr_info!("Am I built-in? {}\n", !cfg!(MODULE));
+
+ let dep_fctx = DmaFenceCtx::new()?;
+ let hw_fctx = DmaFenceCtx::new()?;
+ let jq = Jobqueue::new(1_000_000, run_job)?;
+
+
+ pr_info!("Test 1: Test submitting two jobs without dependencies.\n");
+ let job1 = Job::new(1, hw_fctx.clone())?;
+ let job2 = Job::new(1, hw_fctx.clone())?;
+
+ let fence1 = jq.submit_job(job1)?;
+ let fence2 = jq.submit_job(job2)?;
+
+ while !fence1.is_signaled() || !fence2.is_signaled() {
+ fsleep(Delta::from_secs(2));
+ }
+ pr_info!("Test 1 succeeded.\n");
+
+
+ pr_info!("Test 2: Test submitting a job with an already-fulfilled dependency.\n");
+ let mut job3 = Job::new(1, hw_fctx.clone())?;
+ job3.add_dependency(fence1)?;
+
+ let fence3 = jq.submit_job(job3)?;
+ fsleep(Delta::from_secs(2));
+ if !fence3.is_signaled() {
+ pr_info!("Test 2 failed.\n");
+ return Err(EAGAIN);
+ }
+ pr_info!("Test 2 succeeded.\n");
+
+
+ pr_info!("Test 3: Test that a job with an unfulfilled dependency never gets run.\n");
+ let unsignaled_fence = dep_fctx.as_arc_borrow().new_fence(9001 as i32)?;
+
+ let mut job4 = Job::new(1, hw_fctx.clone())?;
+ job4.add_dependency(unsignaled_fence.clone())?;
+
+ let blocked_job_fence = jq.submit_job(job4)?;
+ fsleep(Delta::from_secs(2));
+ if blocked_job_fence.is_signaled() {
+ pr_info!("Test 3 failed.\n");
+ return Err(EAGAIN);
+ }
+ pr_info!("Test 3 succeeded.\n");
+
+
+ pr_info!("Test 4: Test whether Test 3's blocked job can be unblocked.\n");
+ unsignaled_fence.signal()?;
+ while !blocked_job_fence.is_signaled() {
+ fsleep(Delta::from_secs(2));
+ }
+ pr_info!("Test 4 succeeded.\n");
+
+
+ pr_info!("Test 5: Submit a bunch of unblocked jobs, then a blocked one, then an unblocked one.\n");
+ let job1 = Job::new(1, hw_fctx.clone())?;
+ let job2 = Job::new(1, hw_fctx.clone())?;
+ let mut job3 = Job::new(1, hw_fctx.clone())?;
+ let job4 = Job::new(1, hw_fctx.clone())?;
+ let job5 = Job::new(1, hw_fctx.clone())?;
+
+ let unsignaled_fence1 = dep_fctx.as_arc_borrow().new_fence(9001 as i32)?;
+ let unsignaled_fence2 = dep_fctx.as_arc_borrow().new_fence(9001 as i32)?;
+ let unsignaled_fence3 = dep_fctx.as_arc_borrow().new_fence(9001 as i32)?;
+ job3.add_dependency(unsignaled_fence1.clone())?;
+ job3.add_dependency(unsignaled_fence2.clone())?;
+ job3.add_dependency(unsignaled_fence3.clone())?;
+
+ let fence1 = jq.submit_job(job1)?;
+ let fence2 = jq.submit_job(job2)?;
+ let fence3 = jq.submit_job(job3)?;
+
+ fsleep(Delta::from_secs(2));
+ if fence3.is_signaled() || !fence1.is_signaled() || !fence2.is_signaled() {
+ pr_info!("Test 5 failed.\n");
+ return Err(EAGAIN);
+ }
+
+ unsignaled_fence1.signal()?;
+ unsignaled_fence3.signal()?;
+ fsleep(Delta::from_secs(2));
+ if fence3.is_signaled() {
+ pr_info!("Test 5 failed.\n");
+ return Err(EAGAIN);
+ }
+
+ unsignaled_fence2.signal()?;
+ fsleep(Delta::from_secs(2));
+ if !fence3.is_signaled() {
+ pr_info!("Test 5 failed.\n");
+ return Err(EAGAIN);
+ }
+
+ let fence4 = jq.submit_job(job4)?;
+ let fence5 = jq.submit_job(job5)?;
+
+ fsleep(Delta::from_secs(2));
+
+ if !fence4.is_signaled() || !fence5.is_signaled() {
+ pr_info!("Test 5 failed.\n");
+ return Err(EAGAIN);
+ }
+ pr_info!("Test 5 succeeded.\n");
+
+
+ Ok(RustJobqueueTester { })
+ }
+}
+
+impl Drop for RustJobqueueTester {
+ fn drop(&mut self) {
+ pr_info!("Rust Jobqueue Tester (exit)\n");
+ }
+}
--
2.49.0
^ permalink raw reply related [flat|nested] 103+ messages in thread
* Re: [RFC PATCH 0/4] Add dma_fence abstractions and DRM Jobqueue
2026-02-03 8:13 [RFC PATCH 0/4] Add dma_fence abstractions and DRM Jobqueue Philipp Stanner
` (3 preceding siblings ...)
2026-02-03 8:14 ` [RFC PATCH 4/4] samples: rust: Add jobqueue tester Philipp Stanner
@ 2026-02-03 16:46 ` Daniel Almeida
4 siblings, 0 replies; 103+ messages in thread
From: Daniel Almeida @ 2026-02-03 16:46 UTC (permalink / raw)
To: Philipp Stanner
Cc: David Airlie, Simona Vetter, Danilo Krummrich, Alice Ryhl,
Gary Guo, Benno Lossin, Christian König, Boris Brezillon,
Joel Fernandes, linux-kernel, dri-devel, rust-for-linux
Hi Philipp,
Fantastic, thanks a lot for this work! It will take me a couple of weeks to test
this on the Tyr prototype and report back.
— Daniel
^ permalink raw reply [flat|nested] 103+ messages in thread