rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Benno Lossin" <lossin@kernel.org>
To: "Danilo Krummrich" <dakr@kernel.org>,
	<gregkh@linuxfoundation.org>, <rafael@kernel.org>,
	<ojeda@kernel.org>, <alex.gaynor@gmail.com>,
	<boqun.feng@gmail.com>, <gary@garyguo.net>,
	<bjorn3_gh@protonmail.com>, <benno.lossin@proton.me>,
	<a.hindborg@kernel.org>, <aliceryhl@google.com>,
	<tmgross@umich.edu>
Cc: <rust-for-linux@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 3/3] rust: devres: fix race in Devres::drop()
Date: Thu, 12 Jun 2025 15:07:11 +0200	[thread overview]
Message-ID: <DAKKT1ML27VO.35Q9I6SQHTYTX@kernel.org> (raw)
In-Reply-To: <20250612121817.1621-4-dakr@kernel.org>

On Thu Jun 12, 2025 at 2:17 PM CEST, Danilo Krummrich wrote:
> In Devres::drop() we first remove the devres action and then drop the
> wrapped device resource.
>
> The design goal is to give the owner of a Devres object control over when
> the device resource is dropped, but limit the overall scope to the
> corresponding device being bound to a driver.
>
> However, there's a race that was introduced with commit 8ff656643d30
> ("rust: devres: remove action in `Devres::drop`"), but also has been
> (partially) present from the initial version on.
>
> In Devres::drop(), the devres action is removed successfully and
> subsequently the destructor of the wrapped device resource runs.
> However, there is no guarantee that the destructor of the wrapped device
> resource completes before the driver core is done unbinding the
> corresponding device.
>
> If in Devres::drop(), the devres action can't be removed, it means that
> the devres callback has been executed already, or is still running
> concurrently. In case of the latter, either Devres::drop() wins revoking
> the Revocable or the devres callback wins revoking the Revocable. If
> Devres::drop() wins, we (again) have no guarantee that the destructor of
> the wrapped device resource completes before the driver core is done
> unbinding the corresponding device.
>
> CPU0					CPU1
> ------------------------------------------------------------------------
> Devres::drop() {			Devres::devres_callback() {
>    self.data.revoke() {			   this.data.revoke() {
>       is_available.swap() == true
> 					      is_available.swap == false
> 					   }
> 					}
>
> 					// [...]
> 					// device fully unbound
>       drop_in_place() {
>          // release device resource
>       }
>    }
> }

I forgot to mention: you used tabs, which breaks when tabstop is not set
to 8 (such as in my editor. This is how it looks for me ):

    ------------------------------------------------------------------------
    Devres::drop() {      Devres::devres_callback() {
       self.data.revoke() {         this.data.revoke() {
          is_available.swap() == true
                    is_available.swap == false
                 }
              }
   
              // [...]
              // device fully unbound
          drop_in_place() {
             // release device resource
          }
       }
    }

I personally would have used spaces for this, but it looks fine in my
gitlog, so feel free to keep it this way.

> Depending on the specific device resource, this can potentially lead to
> user-after-free bugs.
>
> In order to fix this, implement the following logic.
>
> In the devres callback, we're always good when we get to revoke the
> device resource ourselves, i.e. Revocable::revoke() returns true.
>
> If Revocable::revoke() returns false, it means that Devres::drop(),
> concurrently, already drops the device resource and we have to wait for
> Devres::drop() to signal that it finished dropping the device resource.
>
> Note that if we hit the case where we need to wait for the completion of
> Devres::drop() in the devres callback, it means that we're actually
> racing with a concurrent Devres::drop() call, which already started
> revoking the device resource for us. This is rather unlikely and means
> that the concurrent Devres::drop() already started doing our work and we
> just need to wait for it to complete it for us. Hence, there should not
> be any additional overhead from that.
>
> (Actually, for now it's even better if Devres::drop() does the work for
> us, since it can bypass the synchronize_rcu() call implied by
> Revocable::revoke(), but this goes away anyways once I get to implement
> the split devres callback approach, which allows us to first flip the
> atomics of all registered Devres objects of a certain device, execute a
> single synchronize_rcu() and then drop all revocable objects.)
>
> In Devres::drop() we try to revoke the device resource. If that is *not*
> successful, it means that the devres callback already did and we're good.
>
> Otherwise, we try to remove the devres action, which, if successful,
> means that we're good, since the device resource has just been revoked
> by us *before* we removed the devres action successfully.
>
> If the devres action could not be removed, it means that the devres
> callback must be running concurrently, hence we signal that the device
> resource has been revoked by us, using the completion.
>
> This makes it safe to drop a Devres object from any task and at any point
> of time, which is one of the design goals.
>
> Fixes: 76c01ded724b ("rust: add devres abstraction")
> Reported-by: Alice Ryhl <aliceryhl@google.com>
> Closes: https://lore.kernel.org/lkml/aD64YNuqbPPZHAa5@google.com/
> Signed-off-by: Danilo Krummrich <dakr@kernel.org>

Reviewed-by: Benno Lossin <lossin@kernel.org>

---
Cheers,
Benno

> ---
>  rust/kernel/devres.rs | 33 ++++++++++++++++++++++++++-------
>  1 file changed, 26 insertions(+), 7 deletions(-)

  reply	other threads:[~2025-06-12 13:07 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-12 12:17 [PATCH v2 0/3] Fix race condition in Devres Danilo Krummrich
2025-06-12 12:17 ` [PATCH v2 1/3] rust: completion: implement initial abstraction Danilo Krummrich
2025-06-13 15:51   ` Miguel Ojeda
2025-06-12 12:17 ` [PATCH v2 2/3] rust: revocable: indicate whether `data` has been revoked already Danilo Krummrich
2025-06-13 15:53   ` Miguel Ojeda
2025-06-12 12:17 ` [PATCH v2 3/3] rust: devres: fix race in Devres::drop() Danilo Krummrich
2025-06-12 13:07   ` Benno Lossin [this message]
2025-06-13 22:05 ` [PATCH v2 0/3] Fix race condition in Devres Danilo Krummrich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DAKKT1ML27VO.35Q9I6SQHTYTX@kernel.org \
    --to=lossin@kernel.org \
    --cc=a.hindborg@kernel.org \
    --cc=alex.gaynor@gmail.com \
    --cc=aliceryhl@google.com \
    --cc=benno.lossin@proton.me \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=dakr@kernel.org \
    --cc=gary@garyguo.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ojeda@kernel.org \
    --cc=rafael@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=tmgross@umich.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).