From: Alice Ryhl <aliceryhl@google.com>
To: Boqun Feng <boqun@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	 "Paul E. McKenney" <paulmck@kernel.org>,
	Gary Guo <gary@garyguo.net>,
	 Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Carlos Llamas <cmllamas@google.com>,
	 linux-fsdevel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/2] rust: poll: make PollCondVar upgradable
Date: Wed, 4 Mar 2026 07:59:59 +0000
Message-ID: <aafmf5icyPIFcwf_@google.com>
In-Reply-To: <aadbyBmaV8zCYiog@tardis.local>

On Tue, Mar 03, 2026 at 02:08:08PM -0800, Boqun Feng wrote:
> On Fri, Feb 13, 2026 at 11:29:41AM +0000, Alice Ryhl wrote:
> > Rust Binder currently uses PollCondVar, but it calls synchronize_rcu()
> > in the destructor, which we would like to avoid. Add a variation of
> > PollCondVar, which uses kfree_rcu() instead.
> > 
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
> >  rust/kernel/sync/poll.rs | 160 ++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 159 insertions(+), 1 deletion(-)
> > 
> > diff --git a/rust/kernel/sync/poll.rs b/rust/kernel/sync/poll.rs
> > index 0ec985d560c8d3405c08dbd86e48b14c7c34484d..9555f818a24d777dd908fca849015c3490ce38d3 100644
> > --- a/rust/kernel/sync/poll.rs
> > +++ b/rust/kernel/sync/poll.rs
> > @@ -5,12 +5,21 @@
> >  //! Utilities for working with `struct poll_table`.
> >  
> >  use crate::{
> > +    alloc::AllocError,
> >      bindings,
> > +    container_of,
> >      fs::File,
> >      prelude::*,
> > +    sync::atomic::{Acquire, Atomic, Relaxed, Release},
> > +    sync::lock::{Backend, Lock},
> >      sync::{CondVar, LockClassKey},
> > +    types::Opaque, //
> > +};
> > +use core::{
> > +    marker::{PhantomData, PhantomPinned},
> > +    ops::Deref,
> > +    ptr,
> >  };
> > -use core::{marker::PhantomData, ops::Deref};
> >  
> >  /// Creates a [`PollCondVar`] initialiser with the given name and a newly-created lock class.
> >  #[macro_export]
> > @@ -66,6 +75,7 @@ pub fn register_wait(&self, file: &File, cv: &PollCondVar) {
> >  ///
> >  /// [`CondVar`]: crate::sync::CondVar
> >  #[pin_data(PinnedDrop)]
> > +#[repr(transparent)]
> >  pub struct PollCondVar {
> >      #[pin]
> >      inner: CondVar,
> > @@ -78,6 +88,17 @@ pub fn new(name: &'static CStr, key: Pin<&'static LockClassKey>) -> impl PinInit
> >              inner <- CondVar::new(name, key),
> >          })
> >      }
> > +
> > +    /// Use this `CondVar` as a `PollCondVar`.
> > +    ///
> > +    /// # Safety
> > +    ///
> > +    /// After the last use of the returned `&PollCondVar`, `__wake_up_pollfree` must be called on
> > +    /// the `wait_queue_head` at least one grace period before the `CondVar` is destroyed.
> > +    unsafe fn from_non_poll(c: &CondVar) -> &PollCondVar {
> > +        // SAFETY: Layout is the same. Caller ensures that PollTables are cleared in time.
> > +        unsafe { &*ptr::from_ref(c).cast() }
> > +    }
> >  }
> >  
> >  // Make the `CondVar` methods callable on `PollCondVar`.
> > @@ -104,3 +125,140 @@ fn drop(self: Pin<&mut Self>) {
> >          unsafe { bindings::synchronize_rcu() };
> >      }
> >  }
> > +
> > +/// Wrapper around [`CondVar`] that can be upgraded to [`PollCondVar`].
> > +///
> > +/// By using this wrapper, you can avoid RCU for cases that don't use [`PollTable`], and in all
> > +/// cases you can avoid `synchronize_rcu()`.
> > +///
> > +/// # Invariants
> > +///
> > +/// `active` either references `simple`, or a `kmalloc` allocation holding an
> > +/// `UpgradePollCondVarInner`. In the latter case, the allocation remains valid until
> > +/// `Self::drop()` plus one grace period.
> > +#[pin_data(PinnedDrop)]
> > +pub struct UpgradePollCondVar {
> > +    #[pin]
> > +    simple: CondVar,
> > +    active: Atomic<*const CondVar>,
> > +    #[pin]
> > +    _pin: PhantomPinned,
> > +}
> > +
> > +#[pin_data]
> > +#[repr(C)]
> > +struct UpgradePollCondVarInner {
> > +    #[pin]
> > +    upgraded: CondVar,
> > +    #[pin]
> > +    rcu: Opaque<bindings::callback_head>,
> > +}
> > +
> > +impl UpgradePollCondVar {
> > +    /// Constructs a new upgradable condvar initialiser.
> > +    pub fn new(name: &'static CStr, key: Pin<&'static LockClassKey>) -> impl PinInit<Self> {
> > +        pin_init!(&this in Self {
> > +            simple <- CondVar::new(name, key),
> > +            // SAFETY: `this->simple` is in-bounds. Pointer remains valid since this type is
> > +            // pinned.
> > +            active: Atomic::new(unsafe { &raw const (*this.as_ptr()).simple }),
> > +            _pin: PhantomPinned,
> > +        })
> > +    }
> > +
> > +    /// Obtain a [`PollCondVar`], upgrading if necessary.
> > +    ///
> > +    /// You should use the same lock as what is passed to the `wait_*` methods. Otherwise wakeups
> > +    /// may be missed.
> > +    pub fn poll<T: ?Sized, B: Backend>(
> > +        &self,
> > +        lock: &Lock<T, B>,
> > +        name: &'static CStr,
> > +        key: Pin<&'static LockClassKey>,
> > +    ) -> Result<&PollCondVar, AllocError> {
> > +        let mut ptr = self.active.load(Acquire);
> > +        if ptr::eq(ptr, &self.simple) {
> > +            self.upgrade(lock, name, key)?;
> > +            ptr = self.active.load(Acquire);
> > +            debug_assert_ne!(ptr, ptr::from_ref(&self.simple));
> > +        }
> > +        // SAFETY: Signature ensures that last use of returned `&PollCondVar` is before drop(), and
> > +        // drop() calls `__wake_up_pollfree` followed by waiting a grace period before the
> > +        // `CondVar` is destroyed.
> > +        Ok(unsafe { PollCondVar::from_non_poll(&*ptr) })
> > +    }
> > +
> > +    fn upgrade<T: ?Sized, B: Backend>(
> > +        &self,
> > +        lock: &Lock<T, B>,
> > +        name: &'static CStr,
> > +        key: Pin<&'static LockClassKey>,
> > +    ) -> Result<(), AllocError> {
> > +        let upgraded = KBox::pin_init(
> > +            pin_init!(UpgradePollCondVarInner {
> > +                upgraded <- CondVar::new(name, key),
> > +                rcu: Opaque::uninit(),
> > +            }),
> > +            GFP_KERNEL,
> > +        )
> > +        .map_err(|_| AllocError)?;
> > +
> > +        // SAFETY: The value is treated as pinned.
> > +        let upgraded = KBox::into_raw(unsafe { Pin::into_inner_unchecked(upgraded) });
> > +
> > +        let res = self.active.cmpxchg(
> > +            ptr::from_ref(&self.simple),
> > +            // SAFETY: This operation stays in-bounds of the above allocation.
> > +            unsafe { &raw mut (*upgraded).upgraded },
> > +            Release,
> > +        );
> > +
> > +        if res.is_err() {
> > +            // Already upgraded, so still success.
> > +            // SAFETY: The cmpxchg failed, so take back ownership of the box.
> > +            drop(unsafe { KBox::from_raw(upgraded) });
> > +            return Ok(());
> > +        }
> > +
> > +        // If a normal waiter registers in parallel with us, then either:
> > +        // * We took the lock first. In that case, the waiter sees the above cmpxchg.
> > +        // * They took the lock first. In that case, we wake them up below.
> > +        drop(lock.lock());
> > +        self.simple.notify_all();
> 
> Hmm.. what if the waiter gets its `&CondVar` before `upgrade()` and uses
> that directly?
> 
> 	<waiter>				<in upgrade()>
> 	let poll_cv: &UpgradePollCondVar = ...;
> 	let cv = poll_cv.deref();
> 						cmpxchg();
> 						drop(lock.lock());
> 						self.simple.notify_all();
> 	let mut guard = lock.lock();
> 	cv.wait(&mut guard);
> 
> we still miss the wake-up, right?
> 
> It's creative, but I particularly hate that we use an empty lock critical
> section to synchronize ;-)

I guess instead of exposing Deref, I can just implement `wait` directly
on `UpgradePollCondVar`. Then this API misuse is not possible.
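
Roughly the shape I have in mind, sketched in userspace Rust with
`std::sync` instead of the kernel types. All names here
(`UpgradableCondvar`, `active`, `demo`) and the "null means not upgraded"
encoding are hypothetical stand-ins, not the code from the patch, and there
is no RCU or PollTable involved. The only point is that `wait` takes the
guard, so the active condvar is picked while the caller holds the lock
(assuming it is the same lock that is passed to `upgrade`), and there is no
`&CondVar` to grab ahead of time.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::{Condvar, Mutex, MutexGuard};
use std::thread;

/// Hypothetical userspace stand-in for an upgradable condvar.
pub struct UpgradableCondvar {
    simple: Condvar,
    /// Null until upgraded, then a heap-allocated condvar owned by `self`.
    upgraded: AtomicPtr<Condvar>,
}

impl UpgradableCondvar {
    pub fn new() -> Self {
        Self { simple: Condvar::new(), upgraded: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Private on purpose: no `Deref`, so callers cannot cache a stale condvar.
    fn active(&self) -> &Condvar {
        let p = self.upgraded.load(Ordering::Acquire);
        // SAFETY: a non-null pointer stays valid until `self` is dropped.
        if p.is_null() { &self.simple } else { unsafe { &*p } }
    }

    /// Holding `guard` means the active condvar is chosen under the lock.
    pub fn wait<'a, T>(&self, guard: MutexGuard<'a, T>) -> MutexGuard<'a, T> {
        self.active().wait(guard).unwrap()
    }

    pub fn notify_all(&self) {
        self.active().notify_all();
    }

    /// Switch to the heap-allocated condvar, waking anyone still on `simple`.
    pub fn upgrade<T>(&self, lock: &Mutex<T>) -> &Condvar {
        let new = Box::into_raw(Box::new(Condvar::new()));
        match self.upgraded.compare_exchange(
            ptr::null_mut(),
            new,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => {
                // Waiters that took the lock before us are parked on `simple`
                // by the time we get the lock; later ones see the new pointer.
                drop(lock.lock().unwrap());
                self.simple.notify_all();
                // SAFETY: just allocated, now owned by `self`.
                unsafe { &*new }
            }
            Err(existing) => {
                // Lost the race: take back ownership of our allocation.
                // SAFETY: `new` was never published.
                drop(unsafe { Box::from_raw(new) });
                // SAFETY: `existing` is non-null and valid until drop.
                unsafe { &*existing }
            }
        }
    }
}

impl Drop for UpgradableCondvar {
    fn drop(&mut self) {
        let p = *self.upgraded.get_mut();
        if !p.is_null() {
            // SAFETY: we own the allocation and no borrows outlive `self`.
            drop(unsafe { Box::from_raw(p) });
        }
    }
}

/// A waiter racing with `upgrade()`; finishes for any interleaving.
pub fn demo() -> bool {
    let lock = Mutex::new(false);
    let cv = UpgradableCondvar::new();
    thread::scope(|s| {
        let waiter = s.spawn(|| {
            let mut ready = lock.lock().unwrap();
            while !*ready {
                ready = cv.wait(ready);
            }
            *ready
        });
        cv.upgrade(&lock);
        *lock.lock().unwrap() = true;
        cv.notify_all();
        waiter.join().unwrap()
    })
}
```

A waiter that gets kicked off `simple` by `upgrade()` just rechecks its
condition and, if it still needs to sleep, calls `wait` again and lands on
the upgraded condvar.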

> Do you think the complexity of dynamic upgrading is worthwhile, or
> should we just use the box-allocated PollCondVar unconditionally?
> 
> I think if the current users won't benefit from the dynamic upgrading
> then we can avoid the complexity. We can always add it back later.
> Thoughts?

I do actually think it's worthwhile to consider:

I started an Android device running this. It created 3961 instances of
`UpgradePollCondVar` during the hour it ran, but only 5 were upgraded.

Alice
