* [PATCH 0/3] Fix race condition in Devres
@ 2025-06-03 20:48 Danilo Krummrich
  2025-06-03 20:48 ` [PATCH 1/3] rust: completion: implement initial abstraction Danilo Krummrich
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Danilo Krummrich @ 2025-06-03 20:48 UTC (permalink / raw)
  To: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl
  Cc: rust-for-linux, linux-kernel, Danilo Krummrich

This patch series fixes a race condition in Devres.

Please see patch 3 "rust: devres: fix race in Devres::drop()" for a detailed
description of the race and how it is fixed.

None of the upstream users of Devres is prone to this race, hence we do not
necessarily have to backport this -- yet, it must be fixed.

Thanks to Alice for catching and reporting this!

A branch containing those patches can be found in [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/dakr/linux.git/log/?h=rust/devres-race

Danilo Krummrich (3):
  rust: completion: implement initial abstraction
  rust: revocable: indicate whether `data` has been revoked already
  rust: devres: fix race in Devres::drop()

 rust/bindings/bindings_helper.h |   1 +
 rust/helpers/completion.c       |   8 +++
 rust/helpers/helpers.c          |   1 +
 rust/kernel/devres.rs           |  33 ++++++++--
 rust/kernel/revocable.rs        |  18 ++++--
 rust/kernel/sync.rs             |   2 +
 rust/kernel/sync/completion.rs  | 111 ++++++++++++++++++++++++++++++++
 7 files changed, 163 insertions(+), 11 deletions(-)
 create mode 100644 rust/helpers/completion.c
 create mode 100644 rust/kernel/sync/completion.rs


base-commit: cd2e103d57e5615f9bb027d772f93b9efd567224
-- 
2.49.0



* [PATCH 1/3] rust: completion: implement initial abstraction
  2025-06-03 20:48 [PATCH 0/3] Fix race condition in Devres Danilo Krummrich
@ 2025-06-03 20:48 ` Danilo Krummrich
  2025-06-06  9:00   ` Alice Ryhl
                     ` (3 more replies)
  2025-06-03 20:48 ` [PATCH 2/3] rust: revocable: indicate whether `data` has been revoked already Danilo Krummrich
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 28+ messages in thread
From: Danilo Krummrich @ 2025-06-03 20:48 UTC (permalink / raw)
  To: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl
  Cc: rust-for-linux, linux-kernel, Danilo Krummrich, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider

Implement a minimal abstraction for the completion synchronization
primitive.

This initial abstraction only adds complete_all() and
wait_for_completion(), since that is what is required for the subsequent
Devres patch.
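
For reference, the intended pairing looks roughly like this (a minimal sketch;
the two functions below are made up for illustration, the actual usage is in
the subsequent Devres patch):

    use kernel::sync::Completion;

    // One path signals that its work is finished ...
    fn signal_done(done: &Completion) {
        // ... perform the work ...
        done.complete_all();
    }

    // ... while another path blocks until that signal arrives.
    fn wait_done(done: &Completion) {
        done.wait_for_completion();
        // Everything done before the complete_all() above has finished by now.
    }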

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
 rust/bindings/bindings_helper.h |   1 +
 rust/helpers/completion.c       |   8 +++
 rust/helpers/helpers.c          |   1 +
 rust/kernel/sync.rs             |   2 +
 rust/kernel/sync/completion.rs  | 111 ++++++++++++++++++++++++++++++++
 5 files changed, 123 insertions(+)
 create mode 100644 rust/helpers/completion.c
 create mode 100644 rust/kernel/sync/completion.rs

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index a5a6fb45d405..9da3fe89295c 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -17,6 +17,7 @@
 #include <linux/blk_types.h>
 #include <linux/blkdev.h>
 #include <linux/clk.h>
+#include <linux/completion.h>
 #include <linux/configfs.h>
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
diff --git a/rust/helpers/completion.c b/rust/helpers/completion.c
new file mode 100644
index 000000000000..b2443262a2ae
--- /dev/null
+++ b/rust/helpers/completion.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/completion.h>
+
+void rust_helper_init_completion(struct completion *x)
+{
+	init_completion(x);
+}
diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 805307018f0e..7a5c520be8cb 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -13,6 +13,7 @@
 #include "build_assert.c"
 #include "build_bug.c"
 #include "clk.c"
+#include "completion.c"
 #include "cpufreq.c"
 #include "cpumask.c"
 #include "cred.c"
diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
index 36a719015583..c23a12639924 100644
--- a/rust/kernel/sync.rs
+++ b/rust/kernel/sync.rs
@@ -10,6 +10,7 @@
 use pin_init;
 
 mod arc;
+pub mod completion;
 mod condvar;
 pub mod lock;
 mod locked_by;
@@ -17,6 +18,7 @@
 pub mod rcu;
 
 pub use arc::{Arc, ArcBorrow, UniqueArc};
+pub use completion::Completion;
 pub use condvar::{new_condvar, CondVar, CondVarTimeoutResult};
 pub use lock::global::{global_lock, GlobalGuard, GlobalLock, GlobalLockBackend, GlobalLockedBy};
 pub use lock::mutex::{new_mutex, Mutex, MutexGuard};
diff --git a/rust/kernel/sync/completion.rs b/rust/kernel/sync/completion.rs
new file mode 100644
index 000000000000..4ec4c2aa73a5
--- /dev/null
+++ b/rust/kernel/sync/completion.rs
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Completion support.
+//!
+//! Reference: <https://docs.kernel.org/scheduler/completion.html>
+//!
+//! C header: [`include/linux/completion.h`](srctree/include/linux/completion.h)
+
+use crate::{bindings, prelude::*, types::Opaque};
+
+/// Synchronization primitive to signal when a certain task has been completed.
+///
+/// The [`Completion`] synchronization primitive signals when a certain task has been completed by
+/// waking up other tasks that can queue themselves up to wait for the [`Completion`] to be
+/// completed.
+///
+/// # Examples
+///
+/// ```
+/// use kernel::sync::{Arc, Completion};
+/// use kernel::workqueue::{self, impl_has_work, new_work, Work, WorkItem};
+///
+/// #[pin_data]
+/// struct MyTask {
+///     #[pin]
+///     work: Work<MyTask>,
+///     #[pin]
+///     done: Completion,
+/// }
+///
+/// impl_has_work! {
+///     impl HasWork<Self> for MyTask { self.work }
+/// }
+///
+/// impl MyTask {
+///     fn new() -> Result<Arc<Self>> {
+///         let this = Arc::pin_init(pin_init!(MyTask {
+///             work <- new_work!("MyTask::work"),
+///             done <- Completion::new(),
+///         }), GFP_KERNEL)?;
+///
+///         let _ = workqueue::system().enqueue(this.clone());
+///
+///         Ok(this)
+///     }
+///
+///     fn wait_for_completion(&self) {
+///         self.done.wait_for_completion();
+///
+///         pr_info!("Completion: task complete\n");
+///     }
+/// }
+///
+/// impl WorkItem for MyTask {
+///     type Pointer = Arc<MyTask>;
+///
+///     fn run(this: Arc<MyTask>) {
+///         // process this task
+///         this.done.complete_all();
+///     }
+/// }
+///
+/// let task = MyTask::new()?;
+/// task.wait_for_completion();
+/// # Ok::<(), Error>(())
+/// ```
+#[pin_data]
+pub struct Completion {
+    #[pin]
+    inner: Opaque<bindings::completion>,
+}
+
+impl Completion {
+    /// Create an initializer for a new [`Completion`].
+    pub fn new() -> impl PinInit<Self> {
+        pin_init!(Self {
+            inner <- Opaque::ffi_init(|slot: *mut bindings::completion| {
+                // SAFETY: `slot` is a valid pointer to an uninitialized `struct completion`.
+                unsafe { bindings::init_completion(slot) };
+            }),
+        })
+    }
+
+    fn as_raw(&self) -> *mut bindings::completion {
+        self.inner.get()
+    }
+
+    /// Signal all tasks waiting on this completion.
+    ///
+    /// This method wakes up all tasks waiting on this completion; after this operation the
+    /// completion is permanently done.
+    pub fn complete_all(&self) {
+        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
+        unsafe { bindings::complete_all(self.as_raw()) };
+    }
+
+    /// Wait for completion of a task.
+    ///
+    /// This method waits for the completion of a task; it is not interruptible and there is no
+    /// timeout.
+    pub fn wait_for_completion(&self) {
+        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
+        unsafe { bindings::wait_for_completion(self.as_raw()) };
+    }
+}
+
+// SAFETY: `Completion` is safe to be send to any task.
+unsafe impl Send for Completion {}
+
+// SAFETY: `Completion` is safe to be accessed concurrently.
+unsafe impl Sync for Completion {}
-- 
2.49.0



* [PATCH 2/3] rust: revocable: indicate whether `data` has been revoked already
  2025-06-03 20:48 [PATCH 0/3] Fix race condition in Devres Danilo Krummrich
  2025-06-03 20:48 ` [PATCH 1/3] rust: completion: implement initial abstraction Danilo Krummrich
@ 2025-06-03 20:48 ` Danilo Krummrich
  2025-06-12  7:59   ` Benno Lossin
  2025-06-03 20:48 ` [PATCH 3/3] rust: devres: fix race in Devres::drop() Danilo Krummrich
  2025-06-04 12:36 ` [PATCH 0/3] Fix race condition in Devres Miguel Ojeda
  3 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2025-06-03 20:48 UTC (permalink / raw)
  To: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl
  Cc: rust-for-linux, linux-kernel, Danilo Krummrich

Return a boolean from Revocable::revoke() and Revocable::revoke_nosync()
to indicate whether the data has been revoked already.

Return true if the data hasn't been revoked yet (i.e. this call revoked
the data), false otherwise.
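
For illustration, a hypothetical caller that cares whether it performed the
revocation itself could use the return value like this (sketch only, not part
of this patch):

    use kernel::prelude::*;
    use kernel::revocable::Revocable;

    fn teardown<T>(data: &Revocable<T>) {
        if data.revoke() {
            // This call took the data from "available" to "revoked", i.e. it
            // dropped the wrapped object (after waiting for the grace period).
            pr_info!("revoked by this call\n");
        } else {
            // Another path revoked the data before us; nothing left to do.
            pr_info!("data was revoked already\n");
        }
    }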

Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
 rust/kernel/revocable.rs | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/rust/kernel/revocable.rs b/rust/kernel/revocable.rs
index db4aa46bb121..06a3cdfce344 100644
--- a/rust/kernel/revocable.rs
+++ b/rust/kernel/revocable.rs
@@ -154,8 +154,10 @@ pub unsafe fn access(&self) -> &T {
     /// # Safety
     ///
     /// Callers must ensure that there are no more concurrent users of the revocable object.
-    unsafe fn revoke_internal<const SYNC: bool>(&self) {
-        if self.is_available.swap(false, Ordering::Relaxed) {
+    unsafe fn revoke_internal<const SYNC: bool>(&self) -> bool {
+        let revoke = self.is_available.swap(false, Ordering::Relaxed);
+
+        if revoke {
             if SYNC {
                 // SAFETY: Just an FFI call, there are no further requirements.
                 unsafe { bindings::synchronize_rcu() };
@@ -165,6 +167,8 @@ unsafe fn revoke_internal<const SYNC: bool>(&self) {
             // `compare_exchange` above that takes `is_available` from `true` to `false`.
             unsafe { drop_in_place(self.data.get()) };
         }
+
+        revoke
     }
 
     /// Revokes access to and drops the wrapped object.
@@ -172,10 +176,13 @@ unsafe fn revoke_internal<const SYNC: bool>(&self) {
     /// Access to the object is revoked immediately to new callers of [`Revocable::try_access`],
     /// expecting that there are no concurrent users of the object.
     ///
+    /// Returns `true` if `&self` has been revoked with this call, `false` if it was revoked
+    /// already.
+    ///
     /// # Safety
     ///
     /// Callers must ensure that there are no more concurrent users of the revocable object.
-    pub unsafe fn revoke_nosync(&self) {
+    pub unsafe fn revoke_nosync(&self) -> bool {
         // SAFETY: By the safety requirement of this function, the caller ensures that nobody is
         // accessing the data anymore and hence we don't have to wait for the grace period to
         // finish.
@@ -189,7 +196,10 @@ pub unsafe fn revoke_nosync(&self) {
     /// If there are concurrent users of the object (i.e., ones that called
     /// [`Revocable::try_access`] beforehand and still haven't dropped the returned guard), this
     /// function waits for the concurrent access to complete before dropping the wrapped object.
-    pub fn revoke(&self) {
+    ///
+    /// Returns `true` if `&self` has been revoked with this call, `false` if it was revoked
+    /// already.
+    pub fn revoke(&self) -> bool {
         // SAFETY: By passing `true` we ask `revoke_internal` to wait for the grace period to
         // finish.
         unsafe { self.revoke_internal::<true>() }
-- 
2.49.0



* [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-03 20:48 [PATCH 0/3] Fix race condition in Devres Danilo Krummrich
  2025-06-03 20:48 ` [PATCH 1/3] rust: completion: implement initial abstraction Danilo Krummrich
  2025-06-03 20:48 ` [PATCH 2/3] rust: revocable: indicate whether `data` has been revoked already Danilo Krummrich
@ 2025-06-03 20:48 ` Danilo Krummrich
  2025-06-03 23:26   ` Boqun Feng
  2025-06-12  8:13   ` Benno Lossin
  2025-06-04 12:36 ` [PATCH 0/3] Fix race condition in Devres Miguel Ojeda
  3 siblings, 2 replies; 28+ messages in thread
From: Danilo Krummrich @ 2025-06-03 20:48 UTC (permalink / raw)
  To: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl
  Cc: rust-for-linux, linux-kernel, Danilo Krummrich

In Devres::drop() we first remove the devres action and then drop the
wrapped device resource.

The design goal is to give the owner of a Devres object control over when
the device resource is dropped, but limit the overall scope to the
corresponding device being bound to a driver.

However, there's a race that was introduced with commit 8ff656643d30
("rust: devres: remove action in `Devres::drop`"), but also has been
(partially) present from the initial version on.

In Devres::drop(), the devres action is removed successfully and
subsequently the destructor of the wrapped device resource runs.
However, there is no guarantee that the destructor of the wrapped device
resource completes before the driver core is done unbinding the
corresponding device.

If in Devres::drop(), the devres action can't be removed, it means that
the devres callback has been executed already, or is still running
concurrently. In case of the latter, either Devres::drop() wins revoking
the Revocable or the devres callback wins revoking the Revocable. If
Devres::drop() wins, we (again) have no guarantee that the destructor of
the wrapped device resource completes before the driver core is done
unbinding the corresponding device.

Depending on the specific device resource, this can potentially lead to
use-after-free bugs.

In order to fix this, implement the following logic.

In the devres callback, we're always good when we get to revoke the
device resource ourselves, i.e. Revocable::revoke() returns true.

If Revocable::revoke() returns false, it means that Devres::drop(),
concurrently, already drops the device resource and we have to wait for
Devres::drop() to signal that it finished dropping the device resource.

Note that if we hit the case where we need to wait for the completion of
Devres::drop() in the devres callback, it means that we're actually
racing with a concurrent Devres::drop() call, which already started
revoking the device resource for us. This is rather unlikely and means
that the concurrent Devres::drop() already started doing our work and we
just need to wait for it to complete it for us. Hence, there should not
be any additional overhead from that.

(Actually, for now it's even better if Devres::drop() does the work for
us, since it can bypass the synchronize_rcu() call implied by
Revocable::revoke(), but this goes away anyways once I get to implement
the split devres callback approach, which allows us to first flip the
atomics of all registered Devres objects of a certain device, execute a
single synchronize_rcu() and then drop all revocable objects.)

In Devres::drop() we try to revoke the device resource. If that is *not*
successful, it means that the devres callback already did and we're good.

Otherwise, we try to remove the devres action, which, if successful,
means that we're good, since the device resource has just been revoked
by us *before* we removed the devres action successfully.

If the devres action could not be removed, it means that the devres
callback must be running concurrently, hence we signal that the device
resource has been revoked by us, using the completion.

This makes it safe to drop a Devres object from any task and at any point
of time, which is one of the design goals.
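
The resulting protocol can be modeled with just an atomic flag and a
completion; a self-contained sketch with made-up names (the actual changes are
in the diff below):

    use core::sync::atomic::{AtomicBool, Ordering};
    use kernel::prelude::*;
    use kernel::sync::Completion;

    #[pin_data]
    struct Protocol {
        // Stands in for the Revocable's "still available" state.
        alive: AtomicBool,
        // The completion added by this patch.
        #[pin]
        revoke: Completion,
    }

    impl Protocol {
        // Models Devres::drop(); whoever flips `alive` drops the resource.
        fn drop_side(&self, remove_action: impl FnOnce() -> bool) {
            if self.alive.swap(false, Ordering::Relaxed) {
                // We revoked the data; now try to remove the devres action. If
                // that fails, the devres callback runs concurrently and waits
                // for this signal.
                if !remove_action() {
                    self.revoke.complete_all();
                }
            }
        }

        // Models the devres callback running on device unbind.
        fn callback_side(&self) {
            if !self.alive.swap(false, Ordering::Relaxed) {
                // Devres::drop() won the race; wait until it finished dropping
                // the data.
                self.revoke.wait_for_completion();
            }
        }
    }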

Fixes: 8ff656643d30 ("rust: devres: remove action in `Devres::drop`") [1]
Reported-by: Alice Ryhl <aliceryhl@google.com>
Closes: https://lore.kernel.org/lkml/aD64YNuqbPPZHAa5@google.com/
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
 rust/kernel/devres.rs | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/rust/kernel/devres.rs b/rust/kernel/devres.rs
index 0f79a2ec9474..dedb39d83cbe 100644
--- a/rust/kernel/devres.rs
+++ b/rust/kernel/devres.rs
@@ -13,7 +13,7 @@
     ffi::c_void,
     prelude::*,
     revocable::Revocable,
-    sync::Arc,
+    sync::{Arc, Completion},
     types::ARef,
 };
 
@@ -25,6 +25,8 @@ struct DevresInner<T> {
     callback: unsafe extern "C" fn(*mut c_void),
     #[pin]
     data: Revocable<T>,
+    #[pin]
+    revoke: Completion,
 }
 
 /// This abstraction is meant to be used by subsystems to containerize [`Device`] bound resources to
@@ -102,6 +104,7 @@ fn new(dev: &Device<Bound>, data: T, flags: Flags) -> Result<Arc<DevresInner<T>>
                 dev: dev.into(),
                 callback: Self::devres_callback,
                 data <- Revocable::new(data),
+                revoke <- Completion::new(),
             }),
             flags,
         )?;
@@ -130,26 +133,28 @@ fn as_ptr(&self) -> *const Self {
         self as _
     }
 
-    fn remove_action(this: &Arc<Self>) {
+    fn remove_action(this: &Arc<Self>) -> bool {
         // SAFETY:
         // - `self.inner.dev` is a valid `Device`,
         // - the `action` and `data` pointers are the exact same ones as given to devm_add_action()
         //   previously,
         // - `self` is always valid, even if the action has been released already.
-        let ret = unsafe {
+        let success = unsafe {
             bindings::devm_remove_action_nowarn(
                 this.dev.as_raw(),
                 Some(this.callback),
                 this.as_ptr() as _,
             )
-        };
+        } == 0;
 
-        if ret == 0 {
+        if success {
             // SAFETY: We leaked an `Arc` reference to devm_add_action() in `DevresInner::new`; if
             // devm_remove_action_nowarn() was successful we can (and have to) claim back ownership
             // of this reference.
             let _ = unsafe { Arc::from_raw(this.as_ptr()) };
         }
+
+        success
     }
 
     #[allow(clippy::missing_safety_doc)]
@@ -161,7 +166,12 @@ fn remove_action(this: &Arc<Self>) {
         //         `DevresInner::new`.
         let inner = unsafe { Arc::from_raw(ptr) };
 
-        inner.data.revoke();
+        if !inner.data.revoke() {
+            // If `revoke()` returns false, it means that `Devres::drop` already started revoking
+            // `inner.data` for us. Hence we have to wait until `Devres::drop()` signals that it
+            // completed revoking `inner.data`.
+            inner.revoke.wait_for_completion();
+        }
     }
 }
 
@@ -232,6 +242,15 @@ fn deref(&self) -> &Self::Target {
 
 impl<T> Drop for Devres<T> {
     fn drop(&mut self) {
-        DevresInner::remove_action(&self.0);
+        // SAFETY: When `drop` runs, it is guaranteed that nobody is accessing the revocable data
+        // anymore, hence it is safe not to wait for the grace period to finish.
+        if unsafe { self.revoke_nosync() } {
+            // We revoked `self.0.data` before the devres action did, hence try to remove it.
+            if !DevresInner::remove_action(&self.0) {
+                // We could not remove the devres action, which means that it now runs concurrently,
+                // hence signal that `self.0.data` has been revoked successfully.
+                self.0.revoke.complete_all();
+            }
+        }
     }
 }
-- 
2.49.0



* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-03 20:48 ` [PATCH 3/3] rust: devres: fix race in Devres::drop() Danilo Krummrich
@ 2025-06-03 23:26   ` Boqun Feng
  2025-06-04  9:49     ` Danilo Krummrich
  2025-06-12  8:13   ` Benno Lossin
  1 sibling, 1 reply; 28+ messages in thread
From: Boqun Feng @ 2025-06-03 23:26 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, ojeda, alex.gaynor, gary, bjorn3_gh, benno.lossin,
	a.hindborg, aliceryhl, tmgross, chrisi.schrefl, rust-for-linux,
	linux-kernel

On Tue, Jun 03, 2025 at 10:48:52PM +0200, Danilo Krummrich wrote:
> In Devres::drop() we first remove the devres action and then drop the
> wrapped device resource.
> 
> The design goal is to give the owner of a Devres object control over when
> the device resource is dropped, but limit the overall scope to the
> corresponding device being bound to a driver.
> 
> However, there's a race that was introduced with commit 8ff656643d30
> ("rust: devres: remove action in `Devres::drop`"), but also has been
> (partially) present from the initial version on.
> 
> In Devres::drop(), the devres action is removed successfully and
> subsequently the destructor of the wrapped device resource runs.
> However, there is no guarantee that the destructor of the wrapped device
> resource completes before the driver core is done unbinding the
> corresponding device.
> 
> If in Devres::drop(), the devres action can't be removed, it means that
> the devres callback has been executed already, or is still running
> concurrently. In case of the latter, either Devres::drop() wins revoking
> the Revocable or the devres callback wins revoking the Revocable. If
> Devres::drop() wins, we (again) have no guarantee that the destructor of
> the wrapped device resource completes before the driver core is done
> unbinding the corresponding device.
> 
> Depending on the specific device resource, this can potentially lead to
> use-after-free bugs.
> 

This all sounds reasonable, one question though: it seems to me the
problem exists only for the device resources that expect the device
to be bound, so hypothetically if the device resources can be
programmed against unbound devices, then the current behavior should be
fine? For example, in your case, you want free_irq() to happen before
the device becomes unbound, which is of course reasonable, but it sounds
more like a design choice (or what device model we want to use), because
hypothetically you can program an irq that still works even if the
device is unbound, no?

Again this sounds reasonable to me, just want to check my understanding
here.

Regards,
Boqun

> In order to fix this, implement the following logic.
> 
> In the devres callback, we're always good when we get to revoke the
> device resource ourselves, i.e. Revocable::revoke() returns true.
> 
> If Revocable::revoke() returns false, it means that Devres::drop(),
> concurrently, already drops the device resource and we have to wait for
> Devres::drop() to signal that it finished dropping the device resource.
> 
> Note that if we hit the case where we need to wait for the completion of
> Devres::drop() in the devres callback, it means that we're actually
> racing with a concurrent Devres::drop() call, which already started
> revoking the device resource for us. This is rather unlikely and means
> that the concurrent Devres::drop() already started doing our work and we
> just need to wait for it to complete it for us. Hence, there should not
> be any additional overhead from that.
> 
> (Actually, for now it's even better if Devres::drop() does the work for
> us, since it can bypass the synchronize_rcu() call implied by
> Revocable::revoke(), but this goes away anyways once I get to implement
> the split devres callback approach, which allows us to first flip the
> atomics of all registered Devres objects of a certain device, execute a
> single synchronize_rcu() and then drop all revocable objects.)
> 
> In Devres::drop() we try to revoke the device resource. If that is *not*
> successful, it means that the devres callback already did and we're good.
> 
> Otherwise, we try to remove the devres action, which, if successful,
> means that we're good, since the device resource has just been revoked
> by us *before* we removed the devres action successfully.
> 
> If the devres action could not be removed, it means that the devres
> callback must be running concurrently, hence we signal that the device
> resource has been revoked by us, using the completion.
> 
> This makes it safe to drop a Devres object from any task and at any point
> of time, which is one of the design goals.
> 
[...]


* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-03 23:26   ` Boqun Feng
@ 2025-06-04  9:49     ` Danilo Krummrich
  2025-06-12 15:24       ` Boqun Feng
  0 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2025-06-04  9:49 UTC (permalink / raw)
  To: Boqun Feng
  Cc: gregkh, rafael, ojeda, alex.gaynor, gary, bjorn3_gh, benno.lossin,
	a.hindborg, aliceryhl, tmgross, chrisi.schrefl, rust-for-linux,
	linux-kernel

On Tue, Jun 03, 2025 at 04:26:01PM -0700, Boqun Feng wrote:
> On Tue, Jun 03, 2025 at 10:48:52PM +0200, Danilo Krummrich wrote:
> > In Devres::drop() we first remove the devres action and then drop the
> > wrapped device resource.
> > 
> > The design goal is to give the owner of a Devres object control over when
> > the device resource is dropped, but limit the overall scope to the
> > corresponding device being bound to a driver.
> > 
> > However, there's a race that was introduced with commit 8ff656643d30
> > ("rust: devres: remove action in `Devres::drop`"), but also has been
> > (partially) present from the initial version on.
> > 
> > In Devres::drop(), the devres action is removed successfully and
> > subsequently the destructor of the wrapped device resource runs.
> > However, there is no guarantee that the destructor of the wrapped device
> > resource completes before the driver core is done unbinding the
> > corresponding device.
> > 
> > If in Devres::drop(), the devres action can't be removed, it means that
> > the devres callback has been executed already, or is still running
> > concurrently. In case of the latter, either Devres::drop() wins revoking
> > the Revocable or the devres callback wins revoking the Revocable. If
> > Devres::drop() wins, we (again) have no guarantee that the destructor of
> > the wrapped device resource completes before the driver core is done
> > unbinding the corresponding device.
> > 
> > Depending on the specific device resource, this can potentially lead to
> > user-after-free bugs.
> > 
> 
> This all sounds reasonable, one question though: it seems to me the
> problem exists only for the device resources that expect the device
> to be bound, so hypothetically if the device resources can be
> programmed against unbound devices, then the current behavior should be
> fine?

I don't think that such device resources exist from a semantic point of view.

We always have to guarantee that a driver has released its device resources once
the corresponding device is unbound from the driver.

However, there certainly are differences in how fatal it is if we don't do so.

Complementing your example below: if we, for instance, fail to release a memory
region in time, a subsequent driver probing the device may fail to request the
corresponding region.

> For example, in your case, you want free_irq() to happen before
> the device becomes unbound, which is of course reasonable, but it sounds
> more like a design choice (or what device model we want to use), because
> hypothetically you can program an irq that still works even if the
> device is unbound, no?

You can, just like for every other registration (e.g. class devices, such as
misc device), but it's sub-optimal, since then we could not treat the
registering device of the registration as &Device<Bound>, which allows direct
access to device resources with Devres::access(). Please see also [1] and [2].

We have two (safe and correct) ways to access device resources: one is the
runtime checked access through Revocable::try_access() (which implies the RCU
read-side critical section and atomic check); the other one is the compile-time
checked access through providing a &Device<Bound> as cookie for direct access
without runtime overhead.

Wherever possible, we want to enable the latter, which means that registrations
need to be properly guarded.

[1] https://lore.kernel.org/lkml/20250530142447.166524-6-dakr@kernel.org/
[2] https://lore.kernel.org/lkml/20250530142447.166524-7-dakr@kernel.org/
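
For illustration, the two patterns look roughly like this (the functions are
hypothetical and the exact Devres::access() signature is assumed from [1] and
[2], it is not something introduced in this series):

    use kernel::device::{Bound, Device};
    use kernel::devres::Devres;
    use kernel::pci;
    use kernel::prelude::*;

    // Runtime checked access: RCU read-side critical section plus atomic check.
    fn runtime_checked(bar: &Devres<pci::Bar>) -> Result {
        let _bar = bar.try_access().ok_or(ENXIO)?;
        // ... use the resource while the guard is held ...
        Ok(())
    }

    // Compile-time checked access: a &Device<Bound> proves the device is still
    // bound to the driver, hence no runtime check is needed.
    fn bound_checked(bar: &Devres<pci::Bar>, dev: &Device<Bound>) -> Result {
        // `access()` with this signature is assumed from [1]/[2].
        let _bar = bar.access(dev)?;
        // ... use the resource directly, without runtime overhead ...
        Ok(())
    }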

> Again this sounds reasonable to me, just want to check my understanding
> here.
> 
> Regards,
> Boqun
> 
> > In order to fix this, implement the following logic.
> > 
> > In the devres callback, we're always good when we get to revoke the
> > device resource ourselves, i.e. Revocable::revoke() returns true.
> > 
> > If Revocable::revoke() returns false, it means that Devres::drop(),
> > concurrently, already drops the device resource and we have to wait for
> > Devres::drop() to signal that it finished dropping the device resource.
> > 
> > Note that if we hit the case where we need to wait for the completion of
> > Devres::drop() in the devres callback, it means that we're actually
> > racing with a concurrent Devres::drop() call, which already started
> > revoking the device resource for us. This is rather unlikely and means
> > that the concurrent Devres::drop() already started doing our work and we
> > just need to wait for it to complete it for us. Hence, there should not
> > be any additional overhead from that.
> > 
> > (Actually, for now it's even better if Devres::drop() does the work for
> > us, since it can bypass the synchronize_rcu() call implied by
> > Revocable::revoke(), but this goes away anyways once I get to implement
> > the split devres callback approach, which allows us to first flip the
> > atomics of all registered Devres objects of a certain device, execute a
> > single synchronize_rcu() and then drop all revocable objects.)
> > 
> > In Devres::drop() we try to revoke the device resource. If that is *not*
> > successful, it means that the devres callback already did and we're good.
> > 
> > Otherwise, we try to remove the devres action, which, if successful,
> > means that we're good, since the device resource has just been revoked
> > by us *before* we removed the devres action successfully.
> > 
> > If the devres action could not be removed, it means that the devres
> > callback must be running concurrently, hence we signal that the device
> > resource has been revoked by us, using the completion.
> > 
> > This makes it safe to drop a Devres object from any task and at any point
> > of time, which is one of the design goals.
> > 
> [...]


* Re: [PATCH 0/3] Fix race condition in Devres
  2025-06-03 20:48 [PATCH 0/3] Fix race condition in Devres Danilo Krummrich
                   ` (2 preceding siblings ...)
  2025-06-03 20:48 ` [PATCH 3/3] rust: devres: fix race in Devres::drop() Danilo Krummrich
@ 2025-06-04 12:36 ` Miguel Ojeda
  3 siblings, 0 replies; 28+ messages in thread
From: Miguel Ojeda @ 2025-06-04 12:36 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl,
	rust-for-linux, linux-kernel

On Tue, Jun 3, 2025 at 10:54 PM Danilo Krummrich <dakr@kernel.org> wrote:
>
> None of the upstream users of Devres is prone to this race, hence we do not
> necessarily have to backport this -- yet, it must be fixed.

We generally backport even soundness fixes when they do not affect
actual upstream code, so we should probably try.

On the other hand, v6.14.y may go out of support soon (a few weeks I
assume) and this does not affect LTSs.

Cheers,
Miguel


* Re: [PATCH 1/3] rust: completion: implement initial abstraction
  2025-06-03 20:48 ` [PATCH 1/3] rust: completion: implement initial abstraction Danilo Krummrich
@ 2025-06-06  9:00   ` Alice Ryhl
  2025-06-11 20:01   ` Boqun Feng
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Alice Ryhl @ 2025-06-06  9:00 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, tmgross, chrisi.schrefl, rust-for-linux,
	linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider

On Tue, Jun 03, 2025 at 10:48:50PM +0200, Danilo Krummrich wrote:
> Implement a minimal abstraction for the completion synchronization
> primitive.
> 
> This initial abstraction only adds complete_all() and
> wait_for_completion(), since that is what is required for the subsequent
> Devres patch.
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>

Reviewed-by: Alice Ryhl <aliceryhl@google.com>


* Re: [PATCH 1/3] rust: completion: implement initial abstraction
  2025-06-03 20:48 ` [PATCH 1/3] rust: completion: implement initial abstraction Danilo Krummrich
  2025-06-06  9:00   ` Alice Ryhl
@ 2025-06-11 20:01   ` Boqun Feng
  2025-06-12  7:58   ` Benno Lossin
  2025-06-12  8:15   ` Benno Lossin
  3 siblings, 0 replies; 28+ messages in thread
From: Boqun Feng @ 2025-06-11 20:01 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, ojeda, alex.gaynor, gary, bjorn3_gh, benno.lossin,
	a.hindborg, aliceryhl, tmgross, chrisi.schrefl, rust-for-linux,
	linux-kernel, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Valentin Schneider

On Tue, Jun 03, 2025 at 10:48:50PM +0200, Danilo Krummrich wrote:
> Implement a minimal abstraction for the completion synchronization
> primitive.
> 
> This initial abstraction only adds complete_all() and
> wait_for_completion(), since that is what is required for the subsequent
> Devres patch.
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>

There is a DECLARE_COMPLETION() which allows customizing the lock class
for a particular completion allocated on the stack, but we can add that
feature in the future; the current abstraction looks good to me. And I
think this should go into the driver-core tree with the rest of the fixes.

Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Regards,
Boqun


* Re: [PATCH 1/3] rust: completion: implement initial abstraction
  2025-06-03 20:48 ` [PATCH 1/3] rust: completion: implement initial abstraction Danilo Krummrich
  2025-06-06  9:00   ` Alice Ryhl
  2025-06-11 20:01   ` Boqun Feng
@ 2025-06-12  7:58   ` Benno Lossin
  2025-06-12 10:47     ` Danilo Krummrich
  2025-06-12 11:23     ` Alice Ryhl
  2025-06-12  8:15   ` Benno Lossin
  3 siblings, 2 replies; 28+ messages in thread
From: Benno Lossin @ 2025-06-12  7:58 UTC (permalink / raw)
  To: Danilo Krummrich, gregkh, rafael, ojeda, alex.gaynor, boqun.feng,
	gary, bjorn3_gh, benno.lossin, a.hindborg, aliceryhl, tmgross,
	chrisi.schrefl
  Cc: rust-for-linux, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider

On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> Implement a minimal abstraction for the completion synchronization
> primitive.
>
> This initial abstraction only adds complete_all() and
> wait_for_completion(), since that is what is required for the subsequent
> Devres patch.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>

I have a couple comments on the documentation, but the rest seems good.
So with those fixed:

Reviewed-by: Benno Lossin <lossin@kernel.org>

> ---
>  rust/bindings/bindings_helper.h |   1 +
>  rust/helpers/completion.c       |   8 +++
>  rust/helpers/helpers.c          |   1 +
>  rust/kernel/sync.rs             |   2 +
>  rust/kernel/sync/completion.rs  | 111 ++++++++++++++++++++++++++++++++
>  5 files changed, 123 insertions(+)
>  create mode 100644 rust/helpers/completion.c
>  create mode 100644 rust/kernel/sync/completion.rs
>
> diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> index a5a6fb45d405..9da3fe89295c 100644
> --- a/rust/bindings/bindings_helper.h
> +++ b/rust/bindings/bindings_helper.h
> @@ -17,6 +17,7 @@
>  #include <linux/blk_types.h>
>  #include <linux/blkdev.h>
>  #include <linux/clk.h>
> +#include <linux/completion.h>
>  #include <linux/configfs.h>
>  #include <linux/cpu.h>
>  #include <linux/cpufreq.h>
> diff --git a/rust/helpers/completion.c b/rust/helpers/completion.c
> new file mode 100644
> index 000000000000..b2443262a2ae
> --- /dev/null
> +++ b/rust/helpers/completion.c
> @@ -0,0 +1,8 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/completion.h>
> +
> +void rust_helper_init_completion(struct completion *x)
> +{
> +	init_completion(x);
> +}
> diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> index 805307018f0e..7a5c520be8cb 100644
> --- a/rust/helpers/helpers.c
> +++ b/rust/helpers/helpers.c
> @@ -13,6 +13,7 @@
>  #include "build_assert.c"
>  #include "build_bug.c"
>  #include "clk.c"
> +#include "completion.c"
>  #include "cpufreq.c"
>  #include "cpumask.c"
>  #include "cred.c"
> diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
> index 36a719015583..c23a12639924 100644
> --- a/rust/kernel/sync.rs
> +++ b/rust/kernel/sync.rs
> @@ -10,6 +10,7 @@
>  use pin_init;
>  
>  mod arc;
> +pub mod completion;
>  mod condvar;
>  pub mod lock;
>  mod locked_by;
> @@ -17,6 +18,7 @@
>  pub mod rcu;
>  
>  pub use arc::{Arc, ArcBorrow, UniqueArc};
> +pub use completion::Completion;
>  pub use condvar::{new_condvar, CondVar, CondVarTimeoutResult};
>  pub use lock::global::{global_lock, GlobalGuard, GlobalLock, GlobalLockBackend, GlobalLockedBy};
>  pub use lock::mutex::{new_mutex, Mutex, MutexGuard};
> diff --git a/rust/kernel/sync/completion.rs b/rust/kernel/sync/completion.rs
> new file mode 100644
> index 000000000000..4ec4c2aa73a5
> --- /dev/null
> +++ b/rust/kernel/sync/completion.rs
> @@ -0,0 +1,111 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Completion support.
> +//!
> +//! Reference: <https://docs.kernel.org/scheduler/completion.html>
> +//!
> +//! C header: [`include/linux/completion.h`](srctree/include/linux/completion.h)
> +
> +use crate::{bindings, prelude::*, types::Opaque};
> +
> +/// Synchronization primitive to signal when a certain task has been completed.
> +///
> +/// The [`Completion`] synchronization primitive signals when a certain task has been completed by
> +/// waking up other tasks that can queue themselves up to wait for the [`Completion`] to be

s/can queue themselves/have been queued/

> +/// completed.
> +///
> +/// # Examples
> +///
> +/// ```
> +/// use kernel::sync::{Arc, Completion};
> +/// use kernel::workqueue::{self, impl_has_work, new_work, Work, WorkItem};
> +///
> +/// #[pin_data]
> +/// struct MyTask {
> +///     #[pin]
> +///     work: Work<MyTask>,

Can we maybe add a dummy value like `Mutex<i32>` here that the task
changes, so we can print the value of it below (after waiting for the
task)?

> +///     #[pin]
> +///     done: Completion,
> +/// }
> +///
> +/// impl_has_work! {
> +///     impl HasWork<Self> for MyTask { self.work }
> +/// }
> +///
> +/// impl MyTask {
> +///     fn new() -> Result<Arc<Self>> {
> +///         let this = Arc::pin_init(pin_init!(MyTask {
> +///             work <- new_work!("MyTask::work"),
> +///             done <- Completion::new(),
> +///         }), GFP_KERNEL)?;
> +///
> +///         let _ = workqueue::system().enqueue(this.clone());
> +///
> +///         Ok(this)
> +///     }
> +///
> +///     fn wait_for_completion(&self) {
> +///         self.done.wait_for_completion();
> +///
> +///         pr_info!("Completion: task complete\n");
> +///     }
> +/// }
> +///
> +/// impl WorkItem for MyTask {
> +///     type Pointer = Arc<MyTask>;
> +///
> +///     fn run(this: Arc<MyTask>) {
> +///         // process this task
> +///         this.done.complete_all();
> +///     }
> +/// }
> +///
> +/// let task = MyTask::new()?;
> +/// task.wait_for_completion();
> +/// # Ok::<(), Error>(())
> +/// ```
> +#[pin_data]
> +pub struct Completion {
> +    #[pin]
> +    inner: Opaque<bindings::completion>,
> +}
> +
> +impl Completion {
> +    /// Create an initializer for a new [`Completion`].
> +    pub fn new() -> impl PinInit<Self> {
> +        pin_init!(Self {
> +            inner <- Opaque::ffi_init(|slot: *mut bindings::completion| {
> +                // SAFETY: `slot` is a valid pointer to an uninitialized `struct completion`.
> +                unsafe { bindings::init_completion(slot) };
> +            }),
> +        })
> +    }
> +
> +    fn as_raw(&self) -> *mut bindings::completion {
> +        self.inner.get()
> +    }
> +
> +    /// Signal all tasks waiting on this completion.
> +    ///
> +    /// This method wakes up all tasks waiting on this completion; after this operation the
> +    /// completion is permanently done.
> +    pub fn complete_all(&self) {
> +        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
> +        unsafe { bindings::complete_all(self.as_raw()) };
> +    }
> +
> +    /// Wait for completion of a task.
> +    ///
> +    /// This method waits for the completion of a task; it is not interruptible and there is no

I personally would write:

s/waits for/blocks on/

But if `wait` is the more common kernel term then let's go with your
version instead.

> +    /// timeout.

I would mention the `complete_all` method above.

> +    pub fn wait_for_completion(&self) {
> +        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
> +        unsafe { bindings::wait_for_completion(self.as_raw()) };
> +    }
> +}
> +
> +// SAFETY: `Completion` is safe to be send to any task.
> +unsafe impl Send for Completion {}
> +
> +// SAFETY: `Completion` is safe to be accessed concurrently.
> +unsafe impl Sync for Completion {}

Please move these to the struct definition, that way one can more easily
see them when looking at the code.

---
Cheers,
Benno


* Re: [PATCH 2/3] rust: revocable: indicate whether `data` has been revoked already
  2025-06-03 20:48 ` [PATCH 2/3] rust: revocable: indicate whether `data` has been revoked already Danilo Krummrich
@ 2025-06-12  7:59   ` Benno Lossin
  0 siblings, 0 replies; 28+ messages in thread
From: Benno Lossin @ 2025-06-12  7:59 UTC (permalink / raw)
  To: Danilo Krummrich, gregkh, rafael, ojeda, alex.gaynor, boqun.feng,
	gary, bjorn3_gh, benno.lossin, a.hindborg, aliceryhl, tmgross,
	chrisi.schrefl
  Cc: rust-for-linux, linux-kernel

On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> Return a boolean from Revocable::revoke() and Revocable::revoke_nosync()
> to indicate whether the data has been revoked already.
>
> Return true if the data hasn't been revoked yet (i.e. this call revoked
> the data), false otherwise.
>
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>

Reviewed-by: Benno Lossin <lossin@kernel.org>

---
Cheers,
Benno

> ---
>  rust/kernel/revocable.rs | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)


* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-03 20:48 ` [PATCH 3/3] rust: devres: fix race in Devres::drop() Danilo Krummrich
  2025-06-03 23:26   ` Boqun Feng
@ 2025-06-12  8:13   ` Benno Lossin
  2025-06-12  8:15     ` Alice Ryhl
                       ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Benno Lossin @ 2025-06-12  8:13 UTC (permalink / raw)
  To: Danilo Krummrich, gregkh, rafael, ojeda, alex.gaynor, boqun.feng,
	gary, bjorn3_gh, benno.lossin, a.hindborg, aliceryhl, tmgross,
	chrisi.schrefl
  Cc: rust-for-linux, linux-kernel

On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> In Devres::drop() we first remove the devres action and then drop the
> wrapped device resource.
>
> The design goal is to give the owner of a Devres object control over when
> the device resource is dropped, but limit the overall scope to the
> corresponding device being bound to a driver.
>
> However, there's a race that was introduced with commit 8ff656643d30
> ("rust: devres: remove action in `Devres::drop`"), but also has been
> (partially) present from the initial version on.
>
> In Devres::drop(), the devres action is removed successfully and
> subsequently the destructor of the wrapped device resource runs.
> However, there is no guarantee that the destructor of the wrapped device
> resource completes before the driver core is done unbinding the
> corresponding device.
>
> If in Devres::drop(), the devres action can't be removed, it means that
> the devres callback has been executed already, or is still running
> concurrently. In case of the latter, either Devres::drop() wins revoking
> the Revocable or the devres callback wins revoking the Revocable. If
> Devres::drop() wins, we (again) have no guarantee that the destructor of
> the wrapped device resource completes before the driver core is done
> unbinding the corresponding device.

I don't understand the exact sequence of events here. Here is what I got
from your explanation:

* the driver created a `Devres<T>` associated to their device.
* their physical device gets disconnected and thus the driver core
  starts unbinding the device.
* simultaneously, the driver drops the `Devres<T>` (eg because the
  driver initiated the physical removal)
* now `devres_callback` is being called from both `Devres::drop` (which
  calls `Devres::remove_action`) and from the driver core.
* they both call `inner.data.revoke()`, but only one wins, in our
  example `Devres::drop`.
* but now the driver core has finished running `devres_callback` and
  finalizes unbinding the device, even though the `Devres` still exists
  (though it is almost done being dropped).

I don't see a race here. Also the `dev: ARef<Device>` should keep the
device alive until the `Devres` is dropped, no?

> Depending on the specific device resource, this can potentially lead to
> use-after-free bugs.
>
> In order to fix this, implement the following logic.
>
> In the devres callback, we're always good when we get to revoke the
> device resource ourselves, i.e. Revocable::revoke() returns true.
>
> If Revocable::revoke() returns false, it means that Devres::drop(),
> concurrently, already drops the device resource and we have to wait for
> Devres::drop() to signal that it finished dropping the device resource.
>
> Note that if we hit the case where we need to wait for the completion of
> Devres::drop() in the devres callback, it means that we're actually
> racing with a concurrent Devres::drop() call, which already started
> revoking the device resource for us. This is rather unlikely and means
> that the concurrent Devres::drop() already started doing our work and we
> just need to wait for it to complete it for us. Hence, there should not
> be any additional overhead from that.
>
> (Actually, for now it's even better if Devres::drop() does the work for
> us, since it can bypass the synchronize_rcu() call implied by
> Revocable::revoke(), but this goes away anyways once I get to implement
> the split devres callback approach, which allows us to first flip the
> atomics of all registered Devres objects of a certain device, execute a
> single synchronize_rcu() and then drop all revocable objects.)
>
> In Devres::drop() we try to revoke the device resource. If that is *not*
> successful, it means that the devres callback already did and we're good.
>
> Otherwise, we try to remove the devres action, which, if successful,
> means that we're good, since the device resource has just been revoked
> by us *before* we removed the devres action successfully.
>
> If the devres action could not be removed, it means that the devres
> callback must be running concurrently, hence we signal that the device
> resource has been revoked by us, using the completion.
>
> This makes it safe to drop a Devres object from any task and at any point
> of time, which is one of the design goals.
>
> Fixes: 8ff656643d30 ("rust: devres: remove action in `Devres::drop`") [1]
> Reported-by: Alice Ryhl <aliceryhl@google.com>
> Closes: https://lore.kernel.org/lkml/aD64YNuqbPPZHAa5@google.com/
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>
> ---
>  rust/kernel/devres.rs | 33 ++++++++++++++++++++++++++-------
>  1 file changed, 26 insertions(+), 7 deletions(-)

> @@ -161,7 +166,12 @@ fn remove_action(this: &Arc<Self>) {
>          //         `DevresInner::new`.
>          let inner = unsafe { Arc::from_raw(ptr) };
>  
> -        inner.data.revoke();
> +        if !inner.data.revoke() {
> +            // If `revoke()` returns false, it means that `Devres::drop` already started revoking
> +            // `inner.data` for us. Hence we have to wait until `Devres::drop()` signals that it
> +            // completed revoking `inner.data`.
> +            inner.revoke.wait_for_completion();
> +        }
>      }
>  }
>  
> @@ -232,6 +242,15 @@ fn deref(&self) -> &Self::Target {
>  
>  impl<T> Drop for Devres<T> {
>      fn drop(&mut self) {
> -        DevresInner::remove_action(&self.0);
> +        // SAFETY: When `drop` runs, it is guaranteed that nobody is accessing the revocable data
> +        // anymore, hence it is safe not to wait for the grace period to finish.
> +        if unsafe { self.revoke_nosync() } {
> +            // We revoked `self.0.data` before the devres action did, hence try to remove it.
> +            if !DevresInner::remove_action(&self.0) {

Shouldn't this not be inverted? (ie 's/!//')

Otherwise this will return `true`, get negated and we don't run the code
below and the `inner.data.revoke()` in `devres_callback` will return
`false` which will get negated and thus it will never return.

---
Cheers,
Benno

> +                // We could not remove the devres action, which means that it now runs concurrently,
> +                // hence signal that `self.0.data` has been revoked successfully.
> +                self.0.revoke.complete_all();
> +            }
> +        }
>      }
>  }



* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-12  8:13   ` Benno Lossin
@ 2025-06-12  8:15     ` Alice Ryhl
  2025-06-12  8:47       ` Benno Lossin
  2025-06-12 10:26     ` Danilo Krummrich
  2025-06-12 10:31     ` Danilo Krummrich
  2 siblings, 1 reply; 28+ messages in thread
From: Alice Ryhl @ 2025-06-12  8:15 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Danilo Krummrich, gregkh, rafael, ojeda, alex.gaynor, boqun.feng,
	gary, bjorn3_gh, benno.lossin, a.hindborg, tmgross,
	chrisi.schrefl, rust-for-linux, linux-kernel

On Thu, Jun 12, 2025 at 10:13 AM Benno Lossin <lossin@kernel.org> wrote:
>
> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> > In Devres::drop() we first remove the devres action and then drop the
> > wrapped device resource.
> >
> > The design goal is to give the owner of a Devres object control over when
> > the device resource is dropped, but limit the overall scope to the
> > corresponding device being bound to a driver.
> >
> > However, there's a race that was introduced with commit 8ff656643d30
> > ("rust: devres: remove action in `Devres::drop`"), but also has been
> > (partially) present from the initial version on.
> >
> > In Devres::drop(), the devres action is removed successfully and
> > subsequently the destructor of the wrapped device resource runs.
> > However, there is no guarantee that the destructor of the wrapped device
> > resource completes before the driver core is done unbinding the
> > corresponding device.
> >
> > If in Devres::drop(), the devres action can't be removed, it means that
> > the devres callback has been executed already, or is still running
> > concurrently. In case of the latter, either Devres::drop() wins revoking
> > the Revocable or the devres callback wins revoking the Revocable. If
> > Devres::drop() wins, we (again) have no guarantee that the destructor of
> > the wrapped device resource completes before the driver core is done
> > unbinding the corresponding device.
>
> I don't understand the exact sequence of events here. Here is what I got
> from your explanation:
>
> * the driver created a `Devres<T>` associated to their device.
> * their physical device gets disconnected and thus the driver core
>   starts unbinding the device.
> * simultaneously, the driver drops the `Devres<T>` (eg because the
>   driver initiated the physical removal)
> * now `devres_callback` is being called from both `Devres::Drop` (which
>   calls `Devres::remove_action`) and from the driver core.
> * they both call `inner.data.revoke()`, but only one wins, in our
>   example `Devres::drop`.
> * but now the driver core has finished running `devres_callback` and
>   finalizes unbinding the device, even though the `Devres` still exists
>   though is almost done being dropped.
>
> I don't see a race here. Also the `dev: ARef<Device>` should keep the
> device alive until the `Devres` is dropped, no?

The race is that Devres is used when the contents *must* be dropped
before the device is unbound. This example violates that by letting the
device unbind finish before the contents are dropped.

> > Depending on the specific device resource, this can potentially lead to
> > use-after-free bugs.
> >
> > In order to fix this, implement the following logic.
> >
> > In the devres callback, we're always good when we get to revoke the
> > device resource ourselves, i.e. Revocable::revoke() returns true.
> >
> > If Revocable::revoke() returns false, it means that Devres::drop(),
> > concurrently, already drops the device resource and we have to wait for
> > Devres::drop() to signal that it finished dropping the device resource.
> >
> > Note that if we hit the case where we need to wait for the completion of
> > Devres::drop() in the devres callback, it means that we're actually
> > racing with a concurrent Devres::drop() call, which already started
> > revoking the device resource for us. This is rather unlikely and means
> > that the concurrent Devres::drop() already started doing our work and we
> > just need to wait for it to complete it for us. Hence, there should not
> > be any additional overhead from that.
> >
> > (Actually, for now it's even better if Devres::drop() does the work for
> > us, since it can bypass the synchronize_rcu() call implied by
> > Revocable::revoke(), but this goes away anyways once I get to implement
> > the split devres callback approach, which allows us to first flip the
> > atomics of all registered Devres objects of a certain device, execute a
> > single synchronize_rcu() and then drop all revocable objects.)
> >
> > In Devres::drop() we try to revoke the device resource. If that is *not*
> > successful, it means that the devres callback already did and we're good.
> >
> > Otherwise, we try to remove the devres action, which, if successful,
> > means that we're good, since the device resource has just been revoked
> > by us *before* we removed the devres action successfully.
> >
> > If the devres action could not be removed, it means that the devres
> > callback must be running concurrently, hence we signal that the device
> > resource has been revoked by us, using the completion.
> >
> > This makes it safe to drop a Devres object from any task and at any point
> > of time, which is one of the design goals.
> >
> > Fixes: 8ff656643d30 ("rust: devres: remove action in `Devres::drop`") [1]
> > Reported-by: Alice Ryhl <aliceryhl@google.com>
> > Closes: https://lore.kernel.org/lkml/aD64YNuqbPPZHAa5@google.com/
> > Signed-off-by: Danilo Krummrich <dakr@kernel.org>
> > ---
> >  rust/kernel/devres.rs | 33 ++++++++++++++++++++++++++-------
> >  1 file changed, 26 insertions(+), 7 deletions(-)
>
> > @@ -161,7 +166,12 @@ fn remove_action(this: &Arc<Self>) {
> >          //         `DevresInner::new`.
> >          let inner = unsafe { Arc::from_raw(ptr) };
> >
> > -        inner.data.revoke();
> > +        if !inner.data.revoke() {
> > +            // If `revoke()` returns false, it means that `Devres::drop` already started revoking
> > +            // `inner.data` for us. Hence we have to wait until `Devres::drop()` signals that it
> > +            // completed revoking `inner.data`.
> > +            inner.revoke.wait_for_completion();
> > +        }
> >      }
> >  }
> >
> > @@ -232,6 +242,15 @@ fn deref(&self) -> &Self::Target {
> >
> >  impl<T> Drop for Devres<T> {
> >      fn drop(&mut self) {
> > -        DevresInner::remove_action(&self.0);
> > +        // SAFETY: When `drop` runs, it is guaranteed that nobody is accessing the revocable data
> > +        // anymore, hence it is safe not to wait for the grace period to finish.
> > +        if unsafe { self.revoke_nosync() } {
> > +            // We revoked `self.0.data` before the devres action did, hence try to remove it.
> > +            if !DevresInner::remove_action(&self.0) {
>
> Shouldn't this not be inverted? (ie 's/!//')
>
> Otherwise this will return `true`, get negated and we don't run the code
> below and the `inner.data.revoke()` in `devres_callback` will return
> `false` which will get negated and thus it will never return.
>
> ---
> Cheers,
> Benno
>
> > +                // We could not remove the devres action, which means that it now runs concurrently,
> > +                // hence signal that `self.0.data` has been revoked successfully.
> > +                self.0.revoke.complete_all();
> > +            }
> > +        }
> >      }
> >  }
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] rust: completion: implement initial abstraction
  2025-06-03 20:48 ` [PATCH 1/3] rust: completion: implement initial abstraction Danilo Krummrich
                     ` (2 preceding siblings ...)
  2025-06-12  7:58   ` Benno Lossin
@ 2025-06-12  8:15   ` Benno Lossin
  2025-06-12 10:35     ` Danilo Krummrich
  3 siblings, 1 reply; 28+ messages in thread
From: Benno Lossin @ 2025-06-12  8:15 UTC (permalink / raw)
  To: Danilo Krummrich, gregkh, rafael, ojeda, alex.gaynor, boqun.feng,
	gary, bjorn3_gh, benno.lossin, a.hindborg, aliceryhl, tmgross,
	chrisi.schrefl
  Cc: rust-for-linux, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider

On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> +    /// Signal all tasks waiting on this completion.
> +    ///
> +    /// This method wakes up all tasks waiting on this completion; after this operation the
> +    /// completion is permanently done.
> +    pub fn complete_all(&self) {
> +        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
> +        unsafe { bindings::complete_all(self.as_raw()) };
> +    }
> +
> +    /// Wait for completion of a task.
> +    ///
> +    /// This method waits for the completion of a task; it is not interruptible and there is no
> +    /// timeout.

Another thing that we should document is whether this function returns
immediately when `complete_all` was already called in the past.

---
Cheers,
Benno

> +    pub fn wait_for_completion(&self) {
> +        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
> +        unsafe { bindings::wait_for_completion(self.as_raw()) };
> +    }
> +}
> +
> +// SAFETY: `Completion` is safe to be sent to any task.
> +unsafe impl Send for Completion {}
> +
> +// SAFETY: `Completion` is safe to be accessed concurrently.
> +unsafe impl Sync for Completion {}


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-12  8:15     ` Alice Ryhl
@ 2025-06-12  8:47       ` Benno Lossin
  0 siblings, 0 replies; 28+ messages in thread
From: Benno Lossin @ 2025-06-12  8:47 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Danilo Krummrich, gregkh, rafael, ojeda, alex.gaynor, boqun.feng,
	gary, bjorn3_gh, benno.lossin, a.hindborg, tmgross,
	chrisi.schrefl, rust-for-linux, linux-kernel

On Thu Jun 12, 2025 at 10:15 AM CEST, Alice Ryhl wrote:
> On Thu, Jun 12, 2025 at 10:13 AM Benno Lossin <lossin@kernel.org> wrote:
>> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
>> > In Devres::drop() we first remove the devres action and then drop the
>> > wrapped device resource.
>> >
>> > The design goal is to give the owner of a Devres object control over when
>> > the device resource is dropped, but limit the overall scope to the
>> > corresponding device being bound to a driver.
>> >
>> > However, there's a race that was introduced with commit 8ff656643d30
>> > ("rust: devres: remove action in `Devres::drop`"), but also has been
>> > (partially) present from the initial version on.
>> >
>> > In Devres::drop(), the devres action is removed successfully and
>> > subsequently the destructor of the wrapped device resource runs.
>> > However, there is no guarantee that the destructor of the wrapped device
>> > resource completes before the driver core is done unbinding the
>> > corresponding device.
>> >
>> > If in Devres::drop(), the devres action can't be removed, it means that
>> > the devres callback has been executed already, or is still running
>> > concurrently. In case of the latter, either Devres::drop() wins revoking
>> > the Revocable or the devres callback wins revoking the Revocable. If
>> > Devres::drop() wins, we (again) have no guarantee that the destructor of
>> > the wrapped device resource completes before the driver core is done
>> > unbinding the corresponding device.
>>
>> I don't understand the exact sequence of events here. Here is what I got
>> from your explanation:
>>
>> * the driver created a `Devres<T>` associated to their device.
>> * their physical device gets disconnected and thus the driver core
>>   starts unbinding the device.
>> * simultaneously, the driver drops the `Devres<T>` (eg because the
>>   driver initiated the physical removal)
>> * now `devres_callback` is being called from both `Devres::Drop` (which
>>   calls `Devres::remove_action`) and from the driver core.
>> * they both call `inner.data.revoke()`, but only one wins, in our
>>   example `Devres::drop`.
>> * but now the driver core has finished running `devres_callback` and
>>   finalizes unbinding the device, even though the `Devres` still exists
>>   though is almost done being dropped.
>>
>> I don't see a race here. Also the `dev: ARef<Device>` should keep the
>> device alive until the `Devres` is dropped, no?
>
> The race is that Devres is used when the contents *must* be dropped
> before the device is unbound. This example violates that by having
> device unbind finish before the contents are dropped.

If `Devres::drop` is being run, nobody has access to it any longer.
Additionally, the data in the revocable has already been dropped if
`revoke()` has been run, so it's fine?

---
Cheers,
Benno

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-12  8:13   ` Benno Lossin
  2025-06-12  8:15     ` Alice Ryhl
@ 2025-06-12 10:26     ` Danilo Krummrich
  2025-06-12 10:59       ` Benno Lossin
  2025-06-12 10:31     ` Danilo Krummrich
  2 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2025-06-12 10:26 UTC (permalink / raw)
  To: Benno Lossin
  Cc: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl,
	rust-for-linux, linux-kernel

On Thu, Jun 12, 2025 at 10:13:29AM +0200, Benno Lossin wrote:
> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> > In Devres::drop() we first remove the devres action and then drop the
> > wrapped device resource.
> >
> > The design goal is to give the owner of a Devres object control over when
> > the device resource is dropped, but limit the overall scope to the
> > corresponding device being bound to a driver.
> >
> > However, there's a race that was introduced with commit 8ff656643d30
> > ("rust: devres: remove action in `Devres::drop`"), but also has been
> > (partially) present from the initial version on.
> >
> > In Devres::drop(), the devres action is removed successfully and
> > subsequently the destructor of the wrapped device resource runs.
> > However, there is no guarantee that the destructor of the wrapped device
> > resource completes before the driver core is done unbinding the
> > corresponding device.
> >
> > If in Devres::drop(), the devres action can't be removed, it means that
> > the devres callback has been executed already, or is still running
> > concurrently. In case of the latter, either Devres::drop() wins revoking
> > the Revocable or the devres callback wins revoking the Revocable. If
> > Devres::drop() wins, we (again) have no guarantee that the destructor of
> > the wrapped device resource completes before the driver core is done
> > unbinding the corresponding device.
> 
> I don't understand the exact sequence of events here. Here is what I got
> from your explanation:
> 
> * the driver created a `Devres<T>` associated to their device.
> * their physical device gets disconnected and thus the driver core
>   starts unbinding the device.
> * simultaneously, the driver drops the `Devres<T>` (eg because the
>   driver initiated the physical removal)
> * now `devres_callback` is being called from both `Devres::Drop` (which
>   calls `Devres::remove_action`) and from the driver core.
> * they both call `inner.data.revoke()`, but only one wins, in our
>   example `Devres::drop`.
> * but now the driver core has finished running `devres_callback` and
>   finalizes unbinding the device, even though the `Devres` still exists
>   though is almost done being dropped.

Your "almost done being dropped" is close, actually Devres::drop() may or may
not be done calling Revocable::revoke(), i.e. drop_in_place() of the data.

CPU0						CPU1

Devres::drop() {				devres_callback() {
	self.data.revoke() {				this.data.revoke() {
		is_available.swap() == true
								is_available.swap == false
							}
						}

						// [...]
						// driver fully unbound
		drop_in_place() {
			pci_iounmap()
			pci_release_region()
		}
	}
}

This means that we have to ensure that the revoke() in Devres::drop() is
completed before devres_callback() completes, in case they race.

> I don't see a race here. Also the `dev: ARef<Device>` should keep the
> device alive until the `Devres` is dropped, no?

Yes, the device reference is fine.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-12  8:13   ` Benno Lossin
  2025-06-12  8:15     ` Alice Ryhl
  2025-06-12 10:26     ` Danilo Krummrich
@ 2025-06-12 10:31     ` Danilo Krummrich
  2025-06-12 11:04       ` Benno Lossin
  2 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2025-06-12 10:31 UTC (permalink / raw)
  To: Benno Lossin
  Cc: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl,
	rust-for-linux, linux-kernel

On Thu, Jun 12, 2025 at 10:13:29AM +0200, Benno Lossin wrote:
> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> >  impl<T> Drop for Devres<T> {
> >      fn drop(&mut self) {
> > -        DevresInner::remove_action(&self.0);
> > +        // SAFETY: When `drop` runs, it is guaranteed that nobody is accessing the revocable data
> > +        // anymore, hence it is safe not to wait for the grace period to finish.
> > +        if unsafe { self.revoke_nosync() } {
> > +            // We revoked `self.0.data` before the devres action did, hence try to remove it.
> > +            if !DevresInner::remove_action(&self.0) {
> 
> Shouldn't this not be inverted? (ie 's/!//')
> 
> Otherwise this will return `true`, get negated and we don't run the code
> below and the `inner.data.revoke()` in `devres_callback` will return
> `false` which will get negated and thus it will never return.

If DevresInner::remove_action() returns false, it means that the devres action has
already been taken from the list and will be run eventually, hence we have to
complete the completion.

If DevresInner::remove_action() returns true, it means that we removed the
action from the list and the callback will never be executed, hence nothing to
do.
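
As a plain-Rust sketch of just this decision (not the actual kernel code; the booleans
stand in for the return values of revoke_nosync() and DevresInner::remove_action(),
`signal_done` for Completion::complete_all()):

    // Sketch only: models the Drop-side logic explained above.
    fn on_devres_drop(revoked_by_us: bool, action_removed: bool, signal_done: impl FnOnce()) {
        if revoked_by_us {
            if action_removed {
                // The callback was removed from the devres list and will never
                // run: nothing to do.
            } else {
                // The callback has already been taken off the list and runs (or
                // will run) concurrently; it waits on the completion, so signal it.
                signal_done();
            }
        }
        // If we did not revoke, the callback already revoked and dropped the
        // data for us: nothing to do either.
    }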

> > +                // We could not remove the devres action, which means that it now runs concurrently,
> > +                // hence signal that `self.0.data` has been revoked successfully.
> > +                self.0.revoke.complete_all();
> > +            }
> > +        }
> >      }
> >  }
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] rust: completion: implement initial abstraction
  2025-06-12  8:15   ` Benno Lossin
@ 2025-06-12 10:35     ` Danilo Krummrich
  2025-06-12 10:53       ` Benno Lossin
  0 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2025-06-12 10:35 UTC (permalink / raw)
  To: Benno Lossin
  Cc: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl,
	rust-for-linux, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider

On Thu, Jun 12, 2025 at 10:15:55AM +0200, Benno Lossin wrote:
> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> > +    /// Signal all tasks waiting on this completion.
> > +    ///
> > +    /// This method wakes up all tasks waiting on this completion; after this operation the
> > +    /// completion is permanently done.
> > +    pub fn complete_all(&self) {
> > +        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
> > +        unsafe { bindings::complete_all(self.as_raw()) };
> > +    }
> > +
> > +    /// Wait for completion of a task.
> > +    ///
> > +    /// This method waits for the completion of a task; it is not interruptible and there is no
> > +    /// timeout.
> 
> Another thing that we should document is whether this function returns
> immediately when `complete_all` was already called in the past.

The details are all documented in [1], which is also linked in the module
documentation of this file.

[1] https://docs.kernel.org/scheduler/completion.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] rust: completion: implement initial abstraction
  2025-06-12  7:58   ` Benno Lossin
@ 2025-06-12 10:47     ` Danilo Krummrich
  2025-06-12 11:23     ` Alice Ryhl
  1 sibling, 0 replies; 28+ messages in thread
From: Danilo Krummrich @ 2025-06-12 10:47 UTC (permalink / raw)
  To: Benno Lossin
  Cc: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl,
	rust-for-linux, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider

On Thu, Jun 12, 2025 at 09:58:30AM +0200, Benno Lossin wrote:
> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> > +/// # Examples
> > +///
> > +/// ```
> > +/// use kernel::sync::{Arc, Completion};
> > +/// use kernel::workqueue::{self, impl_has_work, new_work, Work, WorkItem};
> > +///
> > +/// #[pin_data]
> > +/// struct MyTask {
> > +///     #[pin]
> > +///     work: Work<MyTask>,
> 
> Can we maybe add a dummy value like `Mutex<i32>` here that the task
> changes, so we can print the value of it below (after waiting for the
> task)?

Sure, but I don't think it improves the example a lot. It adds more code that
may be more distracting than helpful.

> > +///     #[pin]
> > +///     done: Completion,
> > +/// }
> > +///
> > +/// impl_has_work! {
> > +///     impl HasWork<Self> for MyTask { self.work }
> > +/// }
> > +///
> > +/// impl MyTask {
> > +///     fn new() -> Result<Arc<Self>> {
> > +///         let this = Arc::pin_init(pin_init!(MyTask {
> > +///             work <- new_work!("MyTask::work"),
> > +///             done <- Completion::new(),
> > +///         }), GFP_KERNEL)?;
> > +///
> > +///         let _ = workqueue::system().enqueue(this.clone());
> > +///
> > +///         Ok(this)
> > +///     }
> > +///
> > +///     fn wait_for_completion(&self) {
> > +///         self.done.wait_for_completion();
> > +///
> > +///         pr_info!("Completion: task complete\n");
> > +///     }
> > +/// }
> > +///
> > +/// impl WorkItem for MyTask {
> > +///     type Pointer = Arc<MyTask>;
> > +///
> > +///     fn run(this: Arc<MyTask>) {
> > +///         // process this task
> > +///         this.done.complete_all();
> > +///     }
> > +/// }
> > +///
> > +/// let task = MyTask::new()?;
> > +/// task.wait_for_completion();
> > +/// # Ok::<(), Error>(())
> > +/// ```
> > +#[pin_data]
> > +pub struct Completion {
> > +    #[pin]
> > +    inner: Opaque<bindings::completion>,
> > +}
> > +
> > +impl Completion {
> > +    /// Create an initializer for a new [`Completion`].
> > +    pub fn new() -> impl PinInit<Self> {
> > +        pin_init!(Self {
> > +            inner <- Opaque::ffi_init(|slot: *mut bindings::completion| {
> > +                // SAFETY: `slot` is a valid pointer to an uninitialized `struct completion`.
> > +                unsafe { bindings::init_completion(slot) };
> > +            }),
> > +        })
> > +    }
> > +
> > +    fn as_raw(&self) -> *mut bindings::completion {
> > +        self.inner.get()
> > +    }
> > +
> > +    /// Signal all tasks waiting on this completion.
> > +    ///
> > +    /// This method wakes up all tasks waiting on this completion; after this operation the
> > +    /// completion is permanently done.
> > +    pub fn complete_all(&self) {
> > +        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
> > +        unsafe { bindings::complete_all(self.as_raw()) };
> > +    }
> > +
> > +    /// Wait for completion of a task.
> > +    ///
> > +    /// This method waits for the completion of a task; it is not interruptible and there is no
> 
> I personally would write:
> 
> s/waits for/blocks on/
> 
> But if `wait` is the more common kernel term then let's go with your
> version instead.

I don't think either is more common in general, but the C code and the existing
documentation all use "wait for".

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] rust: completion: implement initial abstraction
  2025-06-12 10:35     ` Danilo Krummrich
@ 2025-06-12 10:53       ` Benno Lossin
  2025-06-12 11:06         ` Danilo Krummrich
  0 siblings, 1 reply; 28+ messages in thread
From: Benno Lossin @ 2025-06-12 10:53 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl,
	rust-for-linux, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider

On Thu Jun 12, 2025 at 12:35 PM CEST, Danilo Krummrich wrote:
> On Thu, Jun 12, 2025 at 10:15:55AM +0200, Benno Lossin wrote:
>> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
>> > +    /// Signal all tasks waiting on this completion.
>> > +    ///
>> > +    /// This method wakes up all tasks waiting on this completion; after this operation the
>> > +    /// completion is permanently done.
>> > +    pub fn complete_all(&self) {
>> > +        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
>> > +        unsafe { bindings::complete_all(self.as_raw()) };
>> > +    }
>> > +
>> > +    /// Wait for completion of a task.
>> > +    ///
>> > +    /// This method waits for the completion of a task; it is not interruptible and there is no
>> > +    /// timeout.
>> 
>> Another thing that we should document is whether this function returns
>> immediately when `complete_all` was already called in the past.
>
> The details are all documented in [1], which is also linked in the module
> documentation of this file.
>
> [1] https://docs.kernel.org/scheduler/completion.html

I dislike that we don't have the docs right there on the function.
Following that link, there is also a lot of other stuff there that doesn't
apply to Rust (e.g. initializing completions, and the
wait_for_completion*() variants).

After a bit of reading, I found the part that I was looking for (by
searching for `complete_all`...):

    A thread that wants to signal that the conditions for continuation have
    been achieved calls `complete()` to signal exactly one of the waiters
    that it can continue:
    
    ```c
    void complete(struct completion *done)
    ```
    
    ... or calls `complete_all()` to signal all current and future waiters:
    
    ```c
    void complete_all(struct completion *done)
    ```

Let's just put this information on the `complete_all` function.
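
For example, something along these lines on the function itself (the wording is only a
suggestion of mine, not taken from the patch):

    /// Signal all tasks waiting on this completion.
    ///
    /// This method wakes up all tasks waiting on this completion; after this operation the
    /// completion is permanently done, i.e. all current and future waiters are signaled and
    /// any subsequent call to [`Completion::wait_for_completion`] returns immediately.
    pub fn complete_all(&self) {
        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
        unsafe { bindings::complete_all(self.as_raw()) };
    }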

---
Cheers,
Benno

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-12 10:26     ` Danilo Krummrich
@ 2025-06-12 10:59       ` Benno Lossin
  0 siblings, 0 replies; 28+ messages in thread
From: Benno Lossin @ 2025-06-12 10:59 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl,
	rust-for-linux, linux-kernel

On Thu Jun 12, 2025 at 12:26 PM CEST, Danilo Krummrich wrote:
> On Thu, Jun 12, 2025 at 10:13:29AM +0200, Benno Lossin wrote:
>> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
>> > In Devres::drop() we first remove the devres action and then drop the
>> > wrapped device resource.
>> >
>> > The design goal is to give the owner of a Devres object control over when
>> > the device resource is dropped, but limit the overall scope to the
>> > corresponding device being bound to a driver.
>> >
>> > However, there's a race that was introduced with commit 8ff656643d30
>> > ("rust: devres: remove action in `Devres::drop`"), but also has been
>> > (partially) present from the initial version on.
>> >
>> > In Devres::drop(), the devres action is removed successfully and
>> > subsequently the destructor of the wrapped device resource runs.
>> > However, there is no guarantee that the destructor of the wrapped device
>> > resource completes before the driver core is done unbinding the
>> > corresponding device.
>> >
>> > If in Devres::drop(), the devres action can't be removed, it means that
>> > the devres callback has been executed already, or is still running
>> > concurrently. In case of the latter, either Devres::drop() wins revoking
>> > the Revocable or the devres callback wins revoking the Revocable. If
>> > Devres::drop() wins, we (again) have no guarantee that the destructor of
>> > the wrapped device resource completes before the driver core is done
>> > unbinding the corresponding device.
>> 
>> I don't understand the exact sequence of events here. Here is what I got
>> from your explanation:
>> 
>> * the driver created a `Devres<T>` associated to their device.
>> * their physical device gets disconnected and thus the driver core
>>   starts unbinding the device.
>> * simultaneously, the driver drops the `Devres<T>` (eg because the
>>   driver initiated the physical removal)
>> * now `devres_callback` is being called from both `Devres::Drop` (which
>>   calls `Devres::remove_action`) and from the driver core.
>> * they both call `inner.data.revoke()`, but only one wins, in our
>>   example `Devres::drop`.
>> * but now the driver core has finished running `devres_callback` and
>>   finalizes unbinding the device, even though the `Devres` still exists
>>   though is almost done being dropped.
>
> Your "almost done being dropped" is close, actually Devres::drop() may or may
> not be done calling Revocable::revoke(), i.e. drop_in_place() of the data.
>
> CPU0						CPU1
>
> Devres::drop() {				devres_callback() {
> 	self.data.revoke() {				this.data.revoke() {
> 		is_available.swap() == true
> 								is_available.swap == false
> 							}
> 						}
>
> 						// [...]
> 						// driver fully unbound
> 		drop_in_place() {
> 			pci_iounmap()
> 			pci_release_region()
> 		}
> 	}
> }

Ahh I missed that the splitting point could be inside of `revoke`.

What are those `pci_iounmap` and `pci_release_region` calls? Ah maybe
they are from a `T` that expects the driver to still be bound when it is
being dropped? That would make a lot of sense.

If I'm correct, do you mind adding the above example execution to the
commit message? Including the information that the type `T` is allowed
to expect that the driver is still bound while being dropped. That would
make it obvious to me what the issue is.
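
For instance something like the following, where `Bar` and its field are made up for
illustration and the comments stand in for the pci_iounmap()/pci_release_region() calls
above:

    // Illustrative only: a resource wrapper whose Drop is only valid while the
    // device is still bound to the driver.
    struct Bar {
        // stand-in for an MMIO mapping / region reservation
        mapped: bool,
    }

    impl Drop for Bar {
        fn drop(&mut self) {
            // In the real driver: pci_iounmap()/pci_release_region(), which must
            // not run after the device has been unbound. Storing `Bar` in a
            // `Devres<Bar>` is what is supposed to guarantee that ordering.
            self.mapped = false;
        }
    }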

> This means that we have to ensure that the revoke() in Devres::drop() is
> completed before devres_callback() completes, in case they race.

Agreed :)

---
Cheers,
Benno

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-12 10:31     ` Danilo Krummrich
@ 2025-06-12 11:04       ` Benno Lossin
  0 siblings, 0 replies; 28+ messages in thread
From: Benno Lossin @ 2025-06-12 11:04 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl,
	rust-for-linux, linux-kernel

On Thu Jun 12, 2025 at 12:31 PM CEST, Danilo Krummrich wrote:
> On Thu, Jun 12, 2025 at 10:13:29AM +0200, Benno Lossin wrote:
>> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
>> >  impl<T> Drop for Devres<T> {
>> >      fn drop(&mut self) {
>> > -        DevresInner::remove_action(&self.0);
>> > +        // SAFETY: When `drop` runs, it is guaranteed that nobody is accessing the revocable data
>> > +        // anymore, hence it is safe not to wait for the grace period to finish.
>> > +        if unsafe { self.revoke_nosync() } {
>> > +            // We revoked `self.0.data` before the devres action did, hence try to remove it.
>> > +            if !DevresInner::remove_action(&self.0) {
>> 
>> Shouldn't this not be inverted? (ie 's/!//')
>> 
>> Otherwise this will return `true`, get negated and we don't run the code
>> below and the `inner.data.revoke()` in `devres_callback` will return
>> `false` which will get negated and thus it will never return.
>
> If DevresInner::remove_action() returns false, it means that the devres action has
> already been taken from the list and will be run eventually, hence we have to
> complete the completion.
>
> If DevresInner::remove_action() returns true, it means that we removed the
> action from the list and the callback will never be executed, hence nothing to
> do.

Ahh I mixed up which call was which, I thought this was
`self.revoke_nosync()`...

---
Cheers,
Benno

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] rust: completion: implement initial abstraction
  2025-06-12 10:53       ` Benno Lossin
@ 2025-06-12 11:06         ` Danilo Krummrich
  2025-06-12 11:15           ` Benno Lossin
  0 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2025-06-12 11:06 UTC (permalink / raw)
  To: Benno Lossin
  Cc: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl,
	rust-for-linux, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider

On Thu, Jun 12, 2025 at 12:53:45PM +0200, Benno Lossin wrote:
> On Thu Jun 12, 2025 at 12:35 PM CEST, Danilo Krummrich wrote:
> > On Thu, Jun 12, 2025 at 10:15:55AM +0200, Benno Lossin wrote:
> >> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> >> > +    /// Signal all tasks waiting on this completion.
> >> > +    ///
> >> > +    /// This method wakes up all tasks waiting on this completion; after this operation the
> >> > +    /// completion is permanently done.
> >> > +    pub fn complete_all(&self) {
> >> > +        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
> >> > +        unsafe { bindings::complete_all(self.as_raw()) };
> >> > +    }
> >> > +
> >> > +    /// Wait for completion of a task.
> >> > +    ///
> >> > +    /// This method waits for the completion of a task; it is not interruptible and there is no
> >> > +    /// timeout.
> >> 
> >> Another thing that we should document is whether this function returns
> >> immediately when `complete_all` was already called in the past.
> >
> > The details are all documented in [1], which is also linked in the module
> > documentation of this file.
> >
> > [1] https://docs.kernel.org/scheduler/completion.html
> 
> I dislike that we don't have the docs right there on the function.
> Following that link, there is also a lot of other stuff there that doesn't
> apply to Rust (e.g. initializing completions, and the
> wait_for_completion*() variants).
> 
> After a bit of reading, I found the part that I was looking for (by
> searching for `complete_all`...):
> 
>     A thread that wants to signal that the conditions for continuation have
>     been achieved calls `complete()` to signal exactly one of the waiters
>     that it can continue:
>     
>     ```c
>     void complete(struct completion *done)
>     ```
>     
>     ... or calls `complete_all()` to signal all current and future waiters:
>     
>     ```c
>     void complete_all(struct completion *done)
>     ```
> 
> Let's just put this information on the `complete_all` function.

It's already there, no?

"after this operation the completion is permanently done"

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] rust: completion: implement initial abstraction
  2025-06-12 11:06         ` Danilo Krummrich
@ 2025-06-12 11:15           ` Benno Lossin
  0 siblings, 0 replies; 28+ messages in thread
From: Benno Lossin @ 2025-06-12 11:15 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, ojeda, alex.gaynor, boqun.feng, gary, bjorn3_gh,
	benno.lossin, a.hindborg, aliceryhl, tmgross, chrisi.schrefl,
	rust-for-linux, linux-kernel, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider

On Thu Jun 12, 2025 at 1:06 PM CEST, Danilo Krummrich wrote:
> On Thu, Jun 12, 2025 at 12:53:45PM +0200, Benno Lossin wrote:
>> On Thu Jun 12, 2025 at 12:35 PM CEST, Danilo Krummrich wrote:
>> > On Thu, Jun 12, 2025 at 10:15:55AM +0200, Benno Lossin wrote:
>> >> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
>> >> > +    /// Signal all tasks waiting on this completion.
>> >> > +    ///
>> >> > +    /// This method wakes up all tasks waiting on this completion; after this operation the
>> >> > +    /// completion is permanently done.
>> >> > +    pub fn complete_all(&self) {
>> >> > +        // SAFETY: `self.as_raw()` is a pointer to a valid `struct completion`.
>> >> > +        unsafe { bindings::complete_all(self.as_raw()) };
>> >> > +    }
>> >> > +
>> >> > +    /// Wait for completion of a task.
>> >> > +    ///
>> >> > +    /// This method waits for the completion of a task; it is not interruptible and there is no
>> >> > +    /// timeout.
>> >> 
>> >> Another thing that we should document is whether this function returns
>> >> immediately when `complete_all` was already called in the past.
>> >
>> > The details are all documented in [1], which is also linked in the module
>> > documentation of this file.
>> >
>> > [1] https://docs.kernel.org/scheduler/completion.html
>> 
>> I dislike that we don't have the docs right there on the function.
>> Following that link, there is also a lot of other stuff there that don't
>> apply to Rust (eg initializing completions, and the
>> wait_for_completion*() variants).
>> 
>> After a bit of reading, I found the part that I was looking for (by
>> searching for `complete_all`...):
>> 
>>     A thread that wants to signal that the conditions for continuation have
>>     been achieved calls `complete()` to signal exactly one of the waiters
>>     that it can continue:
>>     
>>     ```c
>>     void complete(struct completion *done)
>>     ```
>>     
>>     ... or calls `complete_all()` to signal all current and future waiters:
>>     
>>     ```c
>>     void complete_all(struct completion *done)
>>     ```
>> 
>> Let's just put this information on the `complete_all` function.
>
> It's already there, no?
>
> "after this operation the completion is permanently done"

The phrasing in the C docs seems more obvious to me "signal to all
current and future waiters". The "permanently done" part is a bit
ambiguous to me.

---
Cheers,
Benno

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] rust: completion: implement initial abstraction
  2025-06-12  7:58   ` Benno Lossin
  2025-06-12 10:47     ` Danilo Krummrich
@ 2025-06-12 11:23     ` Alice Ryhl
  1 sibling, 0 replies; 28+ messages in thread
From: Alice Ryhl @ 2025-06-12 11:23 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Danilo Krummrich, gregkh, rafael, ojeda, alex.gaynor, boqun.feng,
	gary, bjorn3_gh, benno.lossin, a.hindborg, tmgross,
	chrisi.schrefl, rust-for-linux, linux-kernel, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider

On Thu, Jun 12, 2025 at 9:58 AM Benno Lossin <lossin@kernel.org> wrote:
>
> On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> > Implement a minimal abstraction for the completion synchronization
> > primitive.
> >
> > This initial abstraction only adds complete_all() and
> > wait_for_completion(), since that is what is required for the subsequent
> > Devres patch.
> >
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Juri Lelli <juri.lelli@redhat.com>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Valentin Schneider <vschneid@redhat.com>
> > Signed-off-by: Danilo Krummrich <dakr@kernel.org>
>
> I have a couple comments on the documentation, but the rest seems good.
> So with those fixed:
>
> Reviewed-by: Benno Lossin <lossin@kernel.org>
>
> > ---
> >  rust/bindings/bindings_helper.h |   1 +
> >  rust/helpers/completion.c       |   8 +++
> >  rust/helpers/helpers.c          |   1 +
> >  rust/kernel/sync.rs             |   2 +
> >  rust/kernel/sync/completion.rs  | 111 ++++++++++++++++++++++++++++++++
> >  5 files changed, 123 insertions(+)
> >  create mode 100644 rust/helpers/completion.c
> >  create mode 100644 rust/kernel/sync/completion.rs
> >
> > diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
> > index a5a6fb45d405..9da3fe89295c 100644
> > --- a/rust/bindings/bindings_helper.h
> > +++ b/rust/bindings/bindings_helper.h
> > @@ -17,6 +17,7 @@
> >  #include <linux/blk_types.h>
> >  #include <linux/blkdev.h>
> >  #include <linux/clk.h>
> > +#include <linux/completion.h>
> >  #include <linux/configfs.h>
> >  #include <linux/cpu.h>
> >  #include <linux/cpufreq.h>
> > diff --git a/rust/helpers/completion.c b/rust/helpers/completion.c
> > new file mode 100644
> > index 000000000000..b2443262a2ae
> > --- /dev/null
> > +++ b/rust/helpers/completion.c
> > @@ -0,0 +1,8 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/completion.h>
> > +
> > +void rust_helper_init_completion(struct completion *x)
> > +{
> > +     init_completion(x);
> > +}
> > diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
> > index 805307018f0e..7a5c520be8cb 100644
> > --- a/rust/helpers/helpers.c
> > +++ b/rust/helpers/helpers.c
> > @@ -13,6 +13,7 @@
> >  #include "build_assert.c"
> >  #include "build_bug.c"
> >  #include "clk.c"
> > +#include "completion.c"
> >  #include "cpufreq.c"
> >  #include "cpumask.c"
> >  #include "cred.c"
> > diff --git a/rust/kernel/sync.rs b/rust/kernel/sync.rs
> > index 36a719015583..c23a12639924 100644
> > --- a/rust/kernel/sync.rs
> > +++ b/rust/kernel/sync.rs
> > @@ -10,6 +10,7 @@
> >  use pin_init;
> >
> >  mod arc;
> > +pub mod completion;
> >  mod condvar;
> >  pub mod lock;
> >  mod locked_by;
> > @@ -17,6 +18,7 @@
> >  pub mod rcu;
> >
> >  pub use arc::{Arc, ArcBorrow, UniqueArc};
> > +pub use completion::Completion;
> >  pub use condvar::{new_condvar, CondVar, CondVarTimeoutResult};
> >  pub use lock::global::{global_lock, GlobalGuard, GlobalLock, GlobalLockBackend, GlobalLockedBy};
> >  pub use lock::mutex::{new_mutex, Mutex, MutexGuard};
> > diff --git a/rust/kernel/sync/completion.rs b/rust/kernel/sync/completion.rs
> > new file mode 100644
> > index 000000000000..4ec4c2aa73a5
> > --- /dev/null
> > +++ b/rust/kernel/sync/completion.rs
> > @@ -0,0 +1,111 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +//! Completion support.
> > +//!
> > +//! Reference: <https://docs.kernel.org/scheduler/completion.html>
> > +//!
> > +//! C header: [`include/linux/completion.h`](srctree/include/linux/completion.h)
> > +
> > +use crate::{bindings, prelude::*, types::Opaque};
> > +
> > +/// Synchronization primitive to signal when a certain task has been completed.
> > +///
> > +/// The [`Completion`] synchronization primitive signals when a certain task has been completed by
> > +/// waking up other tasks that can queue themselves up to wait for the [`Completion`] to be
>
> s/can queue themselves/have been queued/
>
> > +/// completed.
> > +///
> > +/// # Examples
> > +///
> > +/// ```
> > +/// use kernel::sync::{Arc, Completion};
> > +/// use kernel::workqueue::{self, impl_has_work, new_work, Work, WorkItem};
> > +///
> > +/// #[pin_data]
> > +/// struct MyTask {
> > +///     #[pin]
> > +///     work: Work<MyTask>,
>
> Can we maybe add a dummy value like `Mutex<i32>` here that the task
> changes, so we can print the value of it below (after waiting for the
> task)?

If there's something incorrect in the docs, sure, let's fix it. But
since this is supposed to land as a part of a fix, perhaps it would be
best to have the work of making these docs perfect be a follow-up that
can land in the usual merge window?

Alice

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-04  9:49     ` Danilo Krummrich
@ 2025-06-12 15:24       ` Boqun Feng
  2025-06-12 15:44         ` Danilo Krummrich
  0 siblings, 1 reply; 28+ messages in thread
From: Boqun Feng @ 2025-06-12 15:24 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, ojeda, alex.gaynor, gary, bjorn3_gh, benno.lossin,
	a.hindborg, aliceryhl, tmgross, chrisi.schrefl, rust-for-linux,
	linux-kernel

On Wed, Jun 04, 2025 at 11:49:54AM +0200, Danilo Krummrich wrote:
> On Tue, Jun 03, 2025 at 04:26:01PM -0700, Boqun Feng wrote:
> > On Tue, Jun 03, 2025 at 10:48:52PM +0200, Danilo Krummrich wrote:
> > > In Devres::drop() we first remove the devres action and then drop the
> > > wrapped device resource.
> > > 
> > > The design goal is to give the owner of a Devres object control over when
> > > the device resource is dropped, but limit the overall scope to the
> > > corresponding device being bound to a driver.
> > > 
> > > However, there's a race that was introduced with commit 8ff656643d30
> > > ("rust: devres: remove action in `Devres::drop`"), but also has been
> > > (partially) present from the initial version on.
> > > 
> > > In Devres::drop(), the devres action is removed successfully and
> > > subsequently the destructor of the wrapped device resource runs.
> > > However, there is no guarantee that the destructor of the wrapped device
> > > resource completes before the driver core is done unbinding the
> > > corresponding device.
> > > 
> > > If in Devres::drop(), the devres action can't be removed, it means that
> > > the devres callback has been executed already, or is still running
> > > concurrently. In case of the latter, either Devres::drop() wins revoking
> > > the Revocable or the devres callback wins revoking the Revocable. If
> > > Devres::drop() wins, we (again) have no guarantee that the destructor of
> > > the wrapped device resource completes before the driver core is done
> > > unbinding the corresponding device.
> > > 
> > > Depending on the specific device resource, this can potentially lead to
> > > use-after-free bugs.
> > > 
> > 
> > This all sounds reasonable, one question though: it seems to me the
> > problem exists only for the device resources that expect the device
> > being bounded, so hypothetically if the device resources can be
> > programmed against unbound devices, then the current behavior should be
> > fine?
> 
> I don't think that such device resources exist from a semantical point of view.
> 
> We always have to guarantee that a driver has released the device resources by the
> time the corresponding device is unbound from the driver.
> 
> However, there certainly are differences between how fatal it is if we don't do
> so.
> 
> Complementing your example below, if we for instance fail to release a memory
> region in time, a subsequent driver probing the device may fail requesting the
> corresponding region.
> 
> > For example, in your case, you want free_irq() to happen before
> > the device becomes unbound, which is of course reasonable, but it sounds
> > more like a design choice (or what device model we want to use), because
> > hypothetically you can program an irq that still works even if the
> > device is unbound, no?
> 
> You can, just like for every other registration (e.g. class devices, such as
> misc device), but it's sub-optimal, since then we could not treat the
> registering device of the registration as &Device<Bound>, which allows direct
> access to device resources with Devres::access(). Please see also [1] and [2].
> 
> We have two (safe and correct) ways to access device resources, one is the
> runtime checked access through Revocable::try_access() (which implies the RCU
> read-side critical section and atomic check); the other one is the compile-time
> checked access through providing a &Device<Bound> as cookie for directy access
> without runtime overhead.
> 
> Wherever possible, we want to enable the latter, which means that registrations
> need to be properly guarded.
> 
> [1] https://lore.kernel.org/lkml/20250530142447.166524-6-dakr@kernel.org/
> [2] https://lore.kernel.org/lkml/20250530142447.166524-7-dakr@kernel.org/
> 

Thanks for the explanation, and sorry I'm a bit late for the response. I
was trying to find a place that we should document this, how about the
diff below:

------------
diff --git a/rust/kernel/devres.rs b/rust/kernel/devres.rs
index 0f79a2ec9474..c8b9754e411b 100644
--- a/rust/kernel/devres.rs
+++ b/rust/kernel/devres.rs
@@ -31,7 +31,8 @@ struct DevresInner<T> {
 /// manage their lifetime.
 ///
 /// [`Device`] bound resources should be freed when either the resource goes out of scope or the
-/// [`Device`] is unbound respectively, depending on what happens first.
+/// [`Device`] is unbound respectively, depending on what happens first. If the resource goes
+/// out of scope first, [`Device`] unbinding will wait until the resource has been freed.
 ///
 /// To achieve that [`Devres`] registers a devres callback on creation, which is called once the
 /// [`Device`] is unbound, revoking access to the encapsulated resource (see also [`Revocable`]).


Regards,
Boqun

> > Again this sounds reasonable to me, just want to check my understanding
> > here.
> > 
> > Regards,
> > Boqun
> > 
> > > In order to fix this, implement the following logic.
> > > 
> > > In the devres callback, we're always good when we get to revoke the
> > > device resource ourselves, i.e. Revocable::revoke() returns true.
> > > 
> > > If Revocable::revoke() returns false, it means that Devres::drop(),
> > > concurrently, already drops the device resource and we have to wait for
> > > Devres::drop() to signal that it finished dropping the device resource.
> > > 
> > > Note that if we hit the case where we need to wait for the completion of
> > > Devres::drop() in the devres callback, it means that we're actually
> > > racing with a concurrent Devres::drop() call, which already started
> > > revoking the device resource for us. This is rather unlikely and means
> > > that the concurrent Devres::drop() already started doing our work and we
> > > just need to wait for it to complete it for us. Hence, there should not
> > > be any additional overhead from that.
> > > 
> > > (Actually, for now it's even better if Devres::drop() does the work for
> > > us, since it can bypass the synchronize_rcu() call implied by
> > > Revocable::revoke(), but this goes away anyways once I get to implement
> > > the split devres callback approach, which allows us to first flip the
> > > atomics of all registered Devres objects of a certain device, execute a
> > > single synchronize_rcu() and then drop all revocable objects.)
> > > 
> > > In Devres::drop() we try to revoke the device resource. If that is *not*
> > > successful, it means that the devres callback already did and we're good.
> > > 
> > > Otherwise, we try to remove the devres action, which, if successful,
> > > means that we're good, since the device resource has just been revoked
> > > by us *before* we removed the devres action successfully.
> > > 
> > > If the devres action could not be removed, it means that the devres
> > > callback must be running concurrently, hence we signal that the device
> > > resource has been revoked by us, using the completion.
> > > 
> > > This makes it safe to drop a Devres object from any task and at any point
> > > of time, which is one of the design goals.
> > > 
> > [...]

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-12 15:24       ` Boqun Feng
@ 2025-06-12 15:44         ` Danilo Krummrich
  2025-06-12 15:48           ` Boqun Feng
  0 siblings, 1 reply; 28+ messages in thread
From: Danilo Krummrich @ 2025-06-12 15:44 UTC (permalink / raw)
  To: Boqun Feng
  Cc: gregkh, rafael, ojeda, alex.gaynor, gary, bjorn3_gh, benno.lossin,
	a.hindborg, aliceryhl, tmgross, chrisi.schrefl, rust-for-linux,
	linux-kernel

On Thu, Jun 12, 2025 at 08:24:57AM -0700, Boqun Feng wrote:
> Thanks for the explanation, and sorry I'm a bit late for the response. I
> was trying to find a place that we should document this, how about the
> diff below:
> 
> ------------
> diff --git a/rust/kernel/devres.rs b/rust/kernel/devres.rs
> index 0f79a2ec9474..c8b9754e411b 100644
> --- a/rust/kernel/devres.rs
> +++ b/rust/kernel/devres.rs
> @@ -31,7 +31,8 @@ struct DevresInner<T> {
>  /// manage their lifetime.
>  ///
>  /// [`Device`] bound resources should be freed when either the resource goes out of scope or the
> -/// [`Device`] is unbound respectively, depending on what happens first.
> +/// [`Device`] is unbound respectively, depending on what happens first. If the resource goes
> +/// out of scope first, [`Device`] unbinding will wait until the resource has been freed.

I will add

	In any case, it is always guaranteed that revoking the device resource
	is completed before the corresponding [`Device`] is unbound.

when applying the patch, if that's fine with you.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
  2025-06-12 15:44         ` Danilo Krummrich
@ 2025-06-12 15:48           ` Boqun Feng
  0 siblings, 0 replies; 28+ messages in thread
From: Boqun Feng @ 2025-06-12 15:48 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, ojeda, alex.gaynor, gary, bjorn3_gh, benno.lossin,
	a.hindborg, aliceryhl, tmgross, chrisi.schrefl, rust-for-linux,
	linux-kernel

On Thu, Jun 12, 2025 at 05:44:27PM +0200, Danilo Krummrich wrote:
> On Thu, Jun 12, 2025 at 08:24:57AM -0700, Boqun Feng wrote:
> > Thanks for the explanation, and sorry I'm a bit late for the response. I
> > was trying to find a place that we should document this, how about the
> > diff below:
> > 
> > ------------
> > diff --git a/rust/kernel/devres.rs b/rust/kernel/devres.rs
> > index 0f79a2ec9474..c8b9754e411b 100644
> > --- a/rust/kernel/devres.rs
> > +++ b/rust/kernel/devres.rs
> > @@ -31,7 +31,8 @@ struct DevresInner<T> {
> >  /// manage their lifetime.
> >  ///
> >  /// [`Device`] bound resources should be freed when either the resource goes out of scope or the
> > -/// [`Device`] is unbound respectively, depending on what happens first.
> > +/// [`Device`] is unbound respectively, depending on what happens first. If the resource goes
> > +/// out of scope first, [`Device`] unbinding will wait until the resource has been freed.
> 
> I will add
> 
> 	In any case, it is always guaranteed that revoking the device resource
> 	is completed before the corresponding [`Device`] is unbound.
> 
> when applying the patch, if that's fine with you.

Looks good to me, thanks!

Regards,
Boqun

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2025-06-12 15:48 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-03 20:48 [PATCH 0/3] Fix race condition in Devres Danilo Krummrich
2025-06-03 20:48 ` [PATCH 1/3] rust: completion: implement initial abstraction Danilo Krummrich
2025-06-06  9:00   ` Alice Ryhl
2025-06-11 20:01   ` Boqun Feng
2025-06-12  7:58   ` Benno Lossin
2025-06-12 10:47     ` Danilo Krummrich
2025-06-12 11:23     ` Alice Ryhl
2025-06-12  8:15   ` Benno Lossin
2025-06-12 10:35     ` Danilo Krummrich
2025-06-12 10:53       ` Benno Lossin
2025-06-12 11:06         ` Danilo Krummrich
2025-06-12 11:15           ` Benno Lossin
2025-06-03 20:48 ` [PATCH 2/3] rust: revocable: indicate whether `data` has been revoked already Danilo Krummrich
2025-06-12  7:59   ` Benno Lossin
2025-06-03 20:48 ` [PATCH 3/3] rust: devres: fix race in Devres::drop() Danilo Krummrich
2025-06-03 23:26   ` Boqun Feng
2025-06-04  9:49     ` Danilo Krummrich
2025-06-12 15:24       ` Boqun Feng
2025-06-12 15:44         ` Danilo Krummrich
2025-06-12 15:48           ` Boqun Feng
2025-06-12  8:13   ` Benno Lossin
2025-06-12  8:15     ` Alice Ryhl
2025-06-12  8:47       ` Benno Lossin
2025-06-12 10:26     ` Danilo Krummrich
2025-06-12 10:59       ` Benno Lossin
2025-06-12 10:31     ` Danilo Krummrich
2025-06-12 11:04       ` Benno Lossin
2025-06-04 12:36 ` [PATCH 0/3] Fix race condition in Devres Miguel Ojeda

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).