From: "Benno Lossin" <lossin@kernel.org>
To: "Danilo Krummrich" <dakr@kernel.org>,
	<gregkh@linuxfoundation.org>, <rafael@kernel.org>,
	<ojeda@kernel.org>, <alex.gaynor@gmail.com>,
	<boqun.feng@gmail.com>, <gary@garyguo.net>,
	<bjorn3_gh@protonmail.com>, <benno.lossin@proton.me>,
	<a.hindborg@kernel.org>, <aliceryhl@google.com>,
	<tmgross@umich.edu>, <chrisi.schrefl@gmail.com>
Cc: <rust-for-linux@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/3] rust: devres: fix race in Devres::drop()
Date: Thu, 12 Jun 2025 10:13:29 +0200
Message-ID: <DAKEK5YPNCAU.3LQGI98GGG4KF@kernel.org>
In-Reply-To: <20250603205416.49281-4-dakr@kernel.org>

On Tue Jun 3, 2025 at 10:48 PM CEST, Danilo Krummrich wrote:
> In Devres::drop() we first remove the devres action and then drop the
> wrapped device resource.
>
> The design goal is to give the owner of a Devres object control over when
> the device resource is dropped, but limit the overall scope to the
> corresponding device being bound to a driver.
>
> However, there's a race that was introduced with commit 8ff656643d30
> ("rust: devres: remove action in `Devres::drop`"), but also has been
> (partially) present from the initial version on.
>
> In Devres::drop(), the devres action is removed successfully and
> subsequently the destructor of the wrapped device resource runs.
> However, there is no guarantee that the destructor of the wrapped device
> resource completes before the driver core is done unbinding the
> corresponding device.
>
> If in Devres::drop(), the devres action can't be removed, it means that
> the devres callback has been executed already, or is still running
> concurrently. In case of the latter, either Devres::drop() wins revoking
> the Revocable or the devres callback wins revoking the Revocable. If
> Devres::drop() wins, we (again) have no guarantee that the destructor of
> the wrapped device resource completes before the driver core is done
> unbinding the corresponding device.
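
Assume `Revocable::revoke()` flips its availability flag with an atomic swap and reports whether this caller did the flip. Patch 2/3 in this series is described as adding that behaviour. Under that assumption, the "either side wins" situation can be sketched in userspace Rust. `ModelRevocable` and `race_revoke` are hypothetical names. RCU, the wrapped data and the device are not modelled.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical userspace stand-in for `Revocable<T>`: it models only the
/// availability flag, not the wrapped data or the RCU grace period.
pub struct ModelRevocable {
    available: AtomicBool,
}

impl ModelRevocable {
    pub fn new() -> Self {
        Self { available: AtomicBool::new(true) }
    }

    /// Returns `true` only for the single caller that flipped the flag.
    pub fn revoke(&self) -> bool {
        self.available.swap(false, Ordering::AcqRel)
    }
}

/// Lets a "`Devres::drop`" thread and a "devres callback" thread revoke
/// concurrently and returns how many of them reported winning.
pub fn race_revoke() -> usize {
    let r = Arc::new(ModelRevocable::new());
    let contenders: Vec<_> = (0..2)
        .map(|_| {
            let r = Arc::clone(&r);
            thread::spawn(move || r.revoke())
        })
        .collect();
    contenders
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&won| won)
        .count()
}
```

However the threads interleave, exactly one of them sees `true`. The loser only learns that someone else revoked, not that the destructor has finished.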

I don't understand the exact sequence of events here. Here is what I got
from your explanation:

* the driver created a `Devres<T>` associated to their device.
* their physical device gets disconnected and thus the driver core
  starts unbinding the device.
* simultaneously, the driver drops the `Devres<T>` (eg because the
  driver initiated the physical removal)
* now `devres_callback` is being called from both `Devres::drop` (which
  calls `DevresInner::remove_action`) and from the driver core.
* they both call `inner.data.revoke()`, but only one wins, in our
  example `Devres::drop`.
* but now the driver core has finished running `devres_callback` and
  finalizes unbinding the device, even though the `Devres` still exists
  (although it is almost done being dropped).
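
The ordering in the last bullet can be replayed in a small userspace model of the pre-fix logic. In that logic, the devres callback simply returns when it loses the revoke. Everything here (`replay_prefix_interleaving`, the channels, the log strings) is hypothetical. The channels only force one particular interleaving, and the second `recv()` stands in for a slow destructor.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Replays one interleaving against the pre-fix logic, where the devres
/// callback returns as soon as it loses the revoke (hypothetical model).
pub fn replay_prefix_interleaving() -> Vec<&'static str> {
    let available = Arc::new(AtomicBool::new(true));
    let log = Arc::new(Mutex::new(Vec::new()));
    let (revoked_tx, revoked_rx) = mpsc::channel();
    let (unbound_tx, unbound_rx) = mpsc::channel();

    // `Devres::drop` side: wins the revoke, then is slow to finish the destructor.
    let drop_side = {
        let available = Arc::clone(&available);
        let log = Arc::clone(&log);
        thread::spawn(move || {
            assert!(available.swap(false, Ordering::AcqRel));
            log.lock().unwrap().push("drop: revoked");
            revoked_tx.send(()).unwrap();
            unbound_rx.recv().unwrap(); // stand-in for a long-running destructor
            log.lock().unwrap().push("drop: destructor finished");
        })
    };

    // Driver-core side: the callback loses the revoke and returns immediately.
    revoked_rx.recv().unwrap();
    let won = available.swap(false, Ordering::AcqRel);
    assert!(!won);
    log.lock().unwrap().push("core: unbind finished");
    unbound_tx.send(()).unwrap();

    drop_side.join().unwrap();
    let out = log.lock().unwrap().clone();
    out
}
```

In the returned log, the unbind finishes before the destructor on the `Devres::drop` side does.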

I don't see a race here. Also the `dev: ARef<Device>` should keep the
device alive until the `Devres` is dropped, no?

> Depending on the specific device resource, this can potentially lead to
> use-after-free bugs.
>
> In order to fix this, implement the following logic.
>
> In the devres callback, we're always good when we get to revoke the
> device resource ourselves, i.e. Revocable::revoke() returns true.
>
> If Revocable::revoke() returns false, it means that Devres::drop(),
> concurrently, already drops the device resource and we have to wait for
> Devres::drop() to signal that it finished dropping the device resource.
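
This is a userspace sketch of the callback side. It assumes a completion with `complete_all()` semantics: once signalled, every current and later waiter proceeds. `Completion` here is a hypothetical `Mutex`/`Condvar` model, not the kernel abstraction from patch 1/3. `devres_callback` and `callback_waits_for_drop` are illustrative names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical userspace stand-in for a completion. Like `complete_all()`,
/// completing is sticky: every current and future waiter passes.
pub struct Completion {
    done: Mutex<bool>,
    cv: Condvar,
}

impl Completion {
    pub fn new() -> Self {
        Self { done: Mutex::new(false), cv: Condvar::new() }
    }

    pub fn complete_all(&self) {
        *self.done.lock().unwrap() = true;
        self.cv.notify_all();
    }

    pub fn wait_for_completion(&self) {
        let mut done = self.done.lock().unwrap();
        while !*done {
            done = self.cv.wait(done).unwrap();
        }
    }
}

/// Callback side: if the revoke reports that someone else already revoked,
/// block until that party signals it has finished. Returns whether we won.
pub fn devres_callback(available: &AtomicBool, revoked: &Completion) -> bool {
    let won = available.swap(false, Ordering::AcqRel);
    if !won {
        revoked.wait_for_completion();
    }
    won
}

/// `Devres::drop` has already won the revoke; the callback must wait for it.
pub fn callback_waits_for_drop() -> bool {
    let available = Arc::new(AtomicBool::new(true));
    let revoked = Arc::new(Completion::new());
    // Pretend `Devres::drop` already won the revoke ...
    assert!(available.swap(false, Ordering::AcqRel));
    let cb = {
        let (a, r) = (Arc::clone(&available), Arc::clone(&revoked));
        thread::spawn(move || devres_callback(&a, &r))
    };
    // ... and signals only after its destructor is done.
    revoked.complete_all();
    cb.join().unwrap()
}
```

Because the completion is sticky, it does not matter whether `complete_all()` happens before or after the callback starts waiting.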
>
> Note that if we hit the case where we need to wait for the completion of
> Devres::drop() in the devres callback, it means that we're actually
> racing with a concurrent Devres::drop() call, which already started
> revoking the device resource for us. This is rather unlikely and means
> that the concurrent Devres::drop() already started doing our work and we
> just need to wait for it to complete it for us. Hence, there should not
> be any additional overhead from that.
>
> (Actually, for now it's even better if Devres::drop() does the work for
> us, since it can bypass the synchronize_rcu() call implied by
> Revocable::revoke(), but this goes away anyway once I get to implement
> the split devres callback approach, which allows us to first flip the
> atomics of all registered Devres objects of a certain device, execute a
> single synchronize_rcu() and then drop all revocable objects.)
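
The following sketch illustrates the saving mentioned for the split devres callback approach. It compares one grace period per object with a flip-all, synchronize-once, drop-all sequence. `synchronize` is a caller-supplied closure standing in for `synchronize_rcu()`, and `drop_data` stands in for dropping the revocable data. Both closures, like `revoke_each` and `revoke_split`, are hypothetical, and no real RCU is involved.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Per-object revocation: every successful revoke pays for its own grace period.
pub fn revoke_each(
    flags: &[AtomicBool],
    mut synchronize: impl FnMut(),
    mut drop_data: impl FnMut(usize),
) -> usize {
    let mut revoked = 0;
    for (i, f) in flags.iter().enumerate() {
        if f.swap(false, Ordering::AcqRel) {
            synchronize(); // one grace period per object
            drop_data(i);
            revoked += 1;
        }
    }
    revoked
}

/// Split approach: flip every flag first, wait for a single grace period,
/// then drop everything that was revoked.
pub fn revoke_split(
    flags: &[AtomicBool],
    mut synchronize: impl FnMut(),
    mut drop_data: impl FnMut(usize),
) -> usize {
    // Phase 1: flip all flags, remembering which ones we revoked.
    let revoked: Vec<usize> = flags
        .iter()
        .enumerate()
        .filter(|(_, f)| f.swap(false, Ordering::AcqRel))
        .map(|(i, _)| i)
        .collect();
    // Phase 2: a single grace period covers all of them.
    if !revoked.is_empty() {
        synchronize();
    }
    // Phase 3: drop the data of every object we revoked.
    for &i in &revoked {
        drop_data(i);
    }
    revoked.len()
}
```

Both variants revoke the same objects. Only the number of grace periods differs: one per object versus one in total.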
>
> In Devres::drop() we try to revoke the device resource. If that is *not*
> successful, it means that the devres callback already did and we're good.
>
> Otherwise, we try to remove the devres action, which, if successful,
> means that we're good, since the device resource has just been revoked
> by us *before* we removed the devres action successfully.
>
> If the devres action could not be removed, it means that the devres
> callback must be running concurrently, hence we signal that the device
> resource has been revoked by us, using the completion.
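
Putting both halves together, the following userspace model follows the protocol described above. `Devres::drop` revokes first and, if it then fails to remove the action, signals the completion. The callback waits only if it lost the revoke. The devres action list is reduced to a single `registered` flag that exactly one side can claim. `Inner`, `take_action`, `unbind`, `devres_drop` and `race_once` are hypothetical names, and `Completion` is a `Mutex`/`Condvar` stand-in rather than the kernel type.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for a completion with `complete_all()` semantics.
pub struct Completion { done: Mutex<bool>, cv: Condvar }

impl Completion {
    pub fn new() -> Self { Self { done: Mutex::new(false), cv: Condvar::new() } }
    pub fn complete_all(&self) { *self.done.lock().unwrap() = true; self.cv.notify_all(); }
    pub fn wait_for_completion(&self) {
        let mut done = self.done.lock().unwrap();
        while !*done { done = self.cv.wait(done).unwrap(); }
    }
}

/// Hypothetical model of `DevresInner`; all field names are illustrative.
pub struct Inner {
    available: AtomicBool,  // the `Revocable` flag
    destroyed: AtomicUsize, // how often the resource destructor ran
    registered: AtomicBool, // devres action still on the device's list
    revoke: Completion,     // signalled by `Devres::drop`
}

impl Inner {
    pub fn new() -> Self {
        Self {
            available: AtomicBool::new(true),
            destroyed: AtomicUsize::new(0),
            registered: AtomicBool::new(true),
            revoke: Completion::new(),
        }
    }

    /// Revoke and run the destructor; `true` only for the caller that won.
    fn try_revoke(&self) -> bool {
        if self.available.swap(false, Ordering::AcqRel) {
            self.destroyed.fetch_add(1, Ordering::AcqRel);
            true
        } else {
            false
        }
    }

    /// Take the action off the list; `true` only for whoever took it.
    fn take_action(&self) -> bool {
        self.registered.swap(false, Ordering::AcqRel)
    }
}

/// Driver-core unbind: runs the callback only if it claimed the action.
/// Returns the destructor count visible when unbind is done.
pub fn unbind(inner: &Inner) -> usize {
    if inner.take_action() && !inner.try_revoke() {
        // `Devres::drop` won the revoke; wait until it says it is done.
        inner.revoke.wait_for_completion();
    }
    inner.destroyed.load(Ordering::Acquire)
}

/// `Devres::drop`: revoke first, then try to remove the action.
pub fn devres_drop(inner: &Inner) {
    if inner.try_revoke() && !inner.take_action() {
        // The callback runs concurrently; tell it the resource is gone.
        inner.revoke.complete_all();
    }
}

/// Races one `Devres::drop` against one unbind; returns what unbind saw.
pub fn race_once() -> usize {
    let inner = Arc::new(Inner::new());
    let other = Arc::clone(&inner);
    let dropper = thread::spawn(move || devres_drop(&other));
    let seen = unbind(&inner);
    dropper.join().unwrap();
    assert_eq!(inner.destroyed.load(Ordering::Acquire), 1);
    seen
}
```

In this model, every run of `race_once()` should report a destructor count of `1` at the end of unbind. In other words, the unbind never finishes before the resource is gone, and the destructor runs exactly once.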
>
> This makes it safe to drop a Devres object from any task and at any point
> of time, which is one of the design goals.
>
> Fixes: 8ff656643d30 ("rust: devres: remove action in `Devres::drop`") [1]
> Reported-by: Alice Ryhl <aliceryhl@google.com>
> Closes: https://lore.kernel.org/lkml/aD64YNuqbPPZHAa5@google.com/
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>
> ---
>  rust/kernel/devres.rs | 33 ++++++++++++++++++++++++++-------
>  1 file changed, 26 insertions(+), 7 deletions(-)

> @@ -161,7 +166,12 @@ fn remove_action(this: &Arc<Self>) {
>          //         `DevresInner::new`.
>          let inner = unsafe { Arc::from_raw(ptr) };
>  
> -        inner.data.revoke();
> +        if !inner.data.revoke() {
> +            // If `revoke()` returns false, it means that `Devres::drop` already started revoking
> +            // `inner.data` for us. Hence we have to wait until `Devres::drop()` signals that it
> +            // completed revoking `inner.data`.
> +            inner.revoke.wait_for_completion();
> +        }
>      }
>  }
>  
> @@ -232,6 +242,15 @@ fn deref(&self) -> &Self::Target {
>  
>  impl<T> Drop for Devres<T> {
>      fn drop(&mut self) {
> -        DevresInner::remove_action(&self.0);
> +        // SAFETY: When `drop` runs, it is guaranteed that nobody is accessing the revocable data
> +        // anymore, hence it is safe not to wait for the grace period to finish.
> +        if unsafe { self.revoke_nosync() } {
> +            // We revoked `self.0.data` before the devres action did, hence try to remove it.
> +            if !DevresInner::remove_action(&self.0) {

Shouldn't the negation be dropped here? (i.e. 's/!//')

Otherwise this returns `true`, which gets negated, so we don't run the
code below. Then `inner.data.revoke()` in `devres_callback` returns
`false`, which also gets negated, so the callback waits on the
completion and never returns.

---
Cheers,
Benno

> +                // We could not remove the devres action, which means that it now runs concurrently,
> +                // hence signal that `self.0.data` has been revoked successfully.
> +                self.0.revoke.complete_all();
> +            }
> +        }
>      }
>  }

