* [PATCH v2 0/2] Avoid synchronize_rcu() for every thread drop in Rust Binder
@ 2026-02-13 11:29 Alice Ryhl
2026-02-13 11:29 ` [PATCH v2 1/2] rust: poll: make PollCondVar upgradable Alice Ryhl
2026-02-13 11:29 ` [PATCH v2 2/2] rust_binder: use UpgradePollCondVar Alice Ryhl
0 siblings, 2 replies; 8+ messages in thread
From: Alice Ryhl @ 2026-02-13 11:29 UTC (permalink / raw)
To: Christian Brauner, Boqun Feng, Paul E. McKenney
Cc: Gary Guo, Greg Kroah-Hartman, Carlos Llamas, linux-fsdevel,
rust-for-linux, linux-kernel, Alice Ryhl
Right now Rust Binder calls synchronize_rcu() more often than is
necessary. Most processes do not use epoll at all, so they don't require
RCU here. Back at Kangrejos I came up with a way to avoid this: move the
value that needs RCU to a separate allocation that is easy to
kfree_rcu(). With an "upgrade" strategy, we pay for that allocation only
when the proc actually uses epoll - and most processes don't.
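The upgrade trick can be sketched in userspace terms. All names below are hypothetical stand-ins: std's AtomicPtr takes the place of the kernel's Atomic<*const CondVar>, and an immediate Box free stands in for kfree_rcu(), so this only models the pointer dance, not the grace period:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct CondVarStandIn(u8); // placeholder; only its address matters here

struct Upgradable {
    simple: CondVarStandIn,
    active: AtomicPtr<CondVarStandIn>,
}

impl Upgradable {
    fn new() -> Box<Self> {
        // The real type is pinned and initialises `active` in-place via
        // pin_init!; in userspace we box it and then point `active` at
        // the inline field.
        let mut this = Box::new(Upgradable {
            simple: CondVarStandIn(0),
            active: AtomicPtr::new(ptr::null_mut()),
        });
        let simple = &mut this.simple as *mut CondVarStandIn;
        this.active.store(simple, Ordering::Release);
        this
    }

    /// Pay for the separate allocation only on first poll().
    fn poll(&self) -> *const CondVarStandIn {
        let simple = &self.simple as *const _ as *mut CondVarStandIn;
        let cur = self.active.load(Ordering::Acquire);
        if cur != simple {
            return cur; // already upgraded
        }
        let upgraded = Box::into_raw(Box::new(CondVarStandIn(1)));
        match self
            .active
            .compare_exchange(simple, upgraded, Ordering::Release, Ordering::Acquire)
        {
            Ok(_) => upgraded,
            Err(winner) => {
                // Lost a race against a concurrent upgrade: free ours.
                drop(unsafe { Box::from_raw(upgraded) });
                winner
            }
        }
    }

    fn is_upgraded(&self) -> bool {
        let simple = &self.simple as *const _ as *mut CondVarStandIn;
        self.active.load(Ordering::Acquire) != simple
    }
}

impl Drop for Upgradable {
    fn drop(&mut self) {
        let simple = &mut self.simple as *mut CondVarStandIn;
        let cur = self.active.load(Ordering::Acquire);
        if cur != simple {
            // The kernel version would kfree_rcu() here; no grace
            // period is needed in this userspace model.
            drop(unsafe { Box::from_raw(cur) });
        }
    }
}

fn main() {
    let cv = Upgradable::new();
    assert!(!cv.is_upgraded()); // common case: no extra allocation
    let p1 = cv.poll(); // first poll() upgrades
    let p2 = cv.poll(); // later polls reuse the same allocation
    assert!(cv.is_upgraded());
    assert_eq!(p1, p2);
}
```

The common (never-polled) case never touches the allocator, which is the whole point of the upgrade strategy.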
A possibly better solution would be to kfree_rcu() the Binder Thread
struct directly when its refcount drops to zero (Drop of everything else
can still run without a grace period). But I'm not sure how I would
achieve that - after all, Thread is also used with kernel::list.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
Changes in v2:
- Change how Rust Binder handles the lock class key.
- Rebase.
- Link to v1: https://lore.kernel.org/r/20260117-upgrade-poll-v1-0-179437b7bd49@google.com
---
Alice Ryhl (2):
rust: poll: make PollCondVar upgradable
rust_binder: use UpgradePollCondVar
drivers/android/binder/process.rs | 2 +-
drivers/android/binder/thread.rs | 25 +++---
rust/kernel/sync/poll.rs | 160 +++++++++++++++++++++++++++++++++++++-
3 files changed, 176 insertions(+), 11 deletions(-)
---
base-commit: de718b2ca866e10e2a26c259ab0493a5af411879
change-id: 20260117-upgrade-poll-37ee2a7a79dd
Best regards,
--
Alice Ryhl <aliceryhl@google.com>
^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] rust: poll: make PollCondVar upgradable
  2026-02-13 11:29 [PATCH v2 0/2] Avoid synchronize_rcu() for every thread drop in Rust Binder Alice Ryhl
@ 2026-02-13 11:29 ` Alice Ryhl
  2026-03-03 22:08   ` Boqun Feng
  2026-02-13 11:29 ` [PATCH v2 2/2] rust_binder: use UpgradePollCondVar Alice Ryhl
  1 sibling, 1 reply; 8+ messages in thread
From: Alice Ryhl @ 2026-02-13 11:29 UTC (permalink / raw)
  To: Christian Brauner, Boqun Feng, Paul E. McKenney
  Cc: Gary Guo, Greg Kroah-Hartman, Carlos Llamas, linux-fsdevel,
	rust-for-linux, linux-kernel, Alice Ryhl

Rust Binder currently uses PollCondVar, but it calls synchronize_rcu()
in the destructor, which we would like to avoid. Add a variation of
PollCondVar, which uses kfree_rcu() instead.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/kernel/sync/poll.rs | 160 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 159 insertions(+), 1 deletion(-)

diff --git a/rust/kernel/sync/poll.rs b/rust/kernel/sync/poll.rs
index 0ec985d560c8d3405c08dbd86e48b14c7c34484d..9555f818a24d777dd908fca849015c3490ce38d3 100644
--- a/rust/kernel/sync/poll.rs
+++ b/rust/kernel/sync/poll.rs
@@ -5,12 +5,21 @@
 //! Utilities for working with `struct poll_table`.
 
 use crate::{
+    alloc::AllocError,
     bindings,
+    container_of,
     fs::File,
     prelude::*,
+    sync::atomic::{Acquire, Atomic, Relaxed, Release},
+    sync::lock::{Backend, Lock},
     sync::{CondVar, LockClassKey},
+    types::Opaque, //
+};
+use core::{
+    marker::{PhantomData, PhantomPinned},
+    ops::Deref,
+    ptr,
 };
-use core::{marker::PhantomData, ops::Deref};
 
 /// Creates a [`PollCondVar`] initialiser with the given name and a newly-created lock class.
 #[macro_export]
@@ -66,6 +75,7 @@ pub fn register_wait(&self, file: &File, cv: &PollCondVar) {
 ///
 /// [`CondVar`]: crate::sync::CondVar
 #[pin_data(PinnedDrop)]
+#[repr(transparent)]
 pub struct PollCondVar {
     #[pin]
     inner: CondVar,
@@ -78,6 +88,17 @@ pub fn new(name: &'static CStr, key: Pin<&'static LockClassKey>) -> impl PinInit
             inner <- CondVar::new(name, key),
         })
     }
+
+    /// Use this `CondVar` as a `PollCondVar`.
+    ///
+    /// # Safety
+    ///
+    /// After the last use of the returned `&PollCondVar`, `__wake_up_pollfree` must be called on
+    /// the `wait_queue_head` at least one grace period before the `CondVar` is destroyed.
+    unsafe fn from_non_poll(c: &CondVar) -> &PollCondVar {
+        // SAFETY: Layout is the same. Caller ensures that PollTables are cleared in time.
+        unsafe { &*ptr::from_ref(c).cast() }
+    }
 }
 
 // Make the `CondVar` methods callable on `PollCondVar`.
@@ -104,3 +125,140 @@ fn drop(self: Pin<&mut Self>) {
         unsafe { bindings::synchronize_rcu() };
     }
 }
+
+/// Wrapper around [`CondVar`] that can be upgraded to [`PollCondVar`].
+///
+/// By using this wrapper, you can avoid rcu for cases that don't use [`PollTable`], and in all
+/// cases you can avoid `synchronize_rcu()`.
+///
+/// # Invariants
+///
+/// `active` either references `simple`, or a `kmalloc` allocation holding an
+/// `UpgradePollCondVarInner`. In the latter case, the allocation remains valid until
+/// `Self::drop()` plus one grace period.
+#[pin_data(PinnedDrop)]
+pub struct UpgradePollCondVar {
+    #[pin]
+    simple: CondVar,
+    active: Atomic<*const CondVar>,
+    #[pin]
+    _pin: PhantomPinned,
+}
+
+#[pin_data]
+#[repr(C)]
+struct UpgradePollCondVarInner {
+    #[pin]
+    upgraded: CondVar,
+    #[pin]
+    rcu: Opaque<bindings::callback_head>,
+}
+
+impl UpgradePollCondVar {
+    /// Constructs a new upgradable condvar initialiser.
+    pub fn new(name: &'static CStr, key: Pin<&'static LockClassKey>) -> impl PinInit<Self> {
+        pin_init!(&this in Self {
+            simple <- CondVar::new(name, key),
+            // SAFETY: `this->simple` is in-bounds. Pointer remains valid since this type is
+            // pinned.
+            active: Atomic::new(unsafe { &raw const (*this.as_ptr()).simple }),
+            _pin: PhantomPinned,
+        })
+    }
+
+    /// Obtain a [`PollCondVar`], upgrading if necessary.
+    ///
+    /// You should use the same lock as what is passed to the `wait_*` methods. Otherwise wakeups
+    /// may be missed.
+    pub fn poll<T: ?Sized, B: Backend>(
+        &self,
+        lock: &Lock<T, B>,
+        name: &'static CStr,
+        key: Pin<&'static LockClassKey>,
+    ) -> Result<&PollCondVar, AllocError> {
+        let mut ptr = self.active.load(Acquire);
+        if ptr::eq(ptr, &self.simple) {
+            self.upgrade(lock, name, key)?;
+            ptr = self.active.load(Acquire);
+            debug_assert_ne!(ptr, ptr::from_ref(&self.simple));
+        }
+        // SAFETY: Signature ensures that last use of returned `&PollCondVar` is before drop(), and
+        // drop() calls `__wake_up_pollfree` followed by waiting a grace period before the
+        // `CondVar` is destroyed.
+        Ok(unsafe { PollCondVar::from_non_poll(&*ptr) })
+    }
+
+    fn upgrade<T: ?Sized, B: Backend>(
+        &self,
+        lock: &Lock<T, B>,
+        name: &'static CStr,
+        key: Pin<&'static LockClassKey>,
+    ) -> Result<(), AllocError> {
+        let upgraded = KBox::pin_init(
+            pin_init!(UpgradePollCondVarInner {
+                upgraded <- CondVar::new(name, key),
+                rcu: Opaque::uninit(),
+            }),
+            GFP_KERNEL,
+        )
+        .map_err(|_| AllocError)?;
+
+        // SAFETY: The value is treated as pinned.
+        let upgraded = KBox::into_raw(unsafe { Pin::into_inner_unchecked(upgraded) });
+
+        let res = self.active.cmpxchg(
+            ptr::from_ref(&self.simple),
+            // SAFETY: This operation stays in-bounds of the above allocation.
+            unsafe { &raw mut (*upgraded).upgraded },
+            Release,
+        );
+
+        if res.is_err() {
+            // Already upgraded, so still success.
+            // SAFETY: The cmpxchg failed, so take back ownership of the box.
+            drop(unsafe { KBox::from_raw(upgraded) });
+            return Ok(());
+        }
+
+        // If a normal waiter registers in parallel with us, then either:
+        // * We took the lock first. In that case, the waiter sees the above cmpxchg.
+        // * They took the lock first. In that case, we wake them up below.
+        drop(lock.lock());
+        self.simple.notify_all();
+
+        Ok(())
+    }
+}
+
+// Make the `CondVar` methods callable on `UpgradePollCondVar`.
+impl Deref for UpgradePollCondVar {
+    type Target = CondVar;
+
+    fn deref(&self) -> &CondVar {
+        // SAFETY: By the type invariants, this is either `&self.simple` or references an
+        // allocation that lives until `UpgradePollCondVar::drop`.
+        unsafe { &*self.active.load(Acquire) }
+    }
+}
+
+#[pinned_drop]
+impl PinnedDrop for UpgradePollCondVar {
+    #[inline]
+    fn drop(self: Pin<&mut Self>) {
+        // ORDERING: All calls to upgrade happens-before Drop, so no synchronization is required.
+        let ptr = self.active.load(Relaxed);
+        if ptr::eq(ptr, &self.simple) {
+            return;
+        }
+        // SAFETY: When the pointer is not `&self.simple`, it points at an
+        // `UpgradePollCondVarInner`.
+        let ptr = unsafe { container_of!(ptr.cast_mut(), UpgradePollCondVarInner, upgraded) };
+        // SAFETY: The pointer points at a valid `wait_queue_head`.
+        unsafe { bindings::__wake_up_pollfree((*ptr).upgraded.wait_queue_head.get()) };
+        // This skips drop of `CondVar`, but that's ok because we reimplemented its drop here.
+        //
+        // SAFETY: `__wake_up_pollfree` ensures that all registered PollTable instances are gone in
+        // one grace period, and this is the destructor so no new PollTable instances can be
+        // registered. Thus, it's safe to RCU-free the `UpgradePollCondVarInner`.
+        unsafe { bindings::kvfree_call_rcu((*ptr).rcu.get(), ptr.cast::<c_void>()) };
+    }
+}

-- 
2.53.0.273.g2a3d683680-goog

^ permalink raw reply related	[flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] rust: poll: make PollCondVar upgradable 2026-02-13 11:29 ` [PATCH v2 1/2] rust: poll: make PollCondVar upgradable Alice Ryhl @ 2026-03-03 22:08 ` Boqun Feng 2026-03-04 7:59 ` Alice Ryhl 0 siblings, 1 reply; 8+ messages in thread From: Boqun Feng @ 2026-03-03 22:08 UTC (permalink / raw) To: Alice Ryhl Cc: Christian Brauner, Boqun Feng, Paul E. McKenney, Gary Guo, Greg Kroah-Hartman, Carlos Llamas, linux-fsdevel, rust-for-linux, linux-kernel On Fri, Feb 13, 2026 at 11:29:41AM +0000, Alice Ryhl wrote: > Rust Binder currently uses PollCondVar, but it calls synchronize_rcu() > in the destructor, which we would like to avoid. Add a variation of > PollCondVar, which uses kfree_rcu() instead. > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> > --- > rust/kernel/sync/poll.rs | 160 ++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 159 insertions(+), 1 deletion(-) > > diff --git a/rust/kernel/sync/poll.rs b/rust/kernel/sync/poll.rs > index 0ec985d560c8d3405c08dbd86e48b14c7c34484d..9555f818a24d777dd908fca849015c3490ce38d3 100644 > --- a/rust/kernel/sync/poll.rs > +++ b/rust/kernel/sync/poll.rs > @@ -5,12 +5,21 @@ > //! Utilities for working with `struct poll_table`. > > use crate::{ > + alloc::AllocError, > bindings, > + container_of, > fs::File, > prelude::*, > + sync::atomic::{Acquire, Atomic, Relaxed, Release}, > + sync::lock::{Backend, Lock}, > sync::{CondVar, LockClassKey}, > + types::Opaque, // > +}; > +use core::{ > + marker::{PhantomData, PhantomPinned}, > + ops::Deref, > + ptr, > }; > -use core::{marker::PhantomData, ops::Deref}; > > /// Creates a [`PollCondVar`] initialiser with the given name and a newly-created lock class. 
> #[macro_export] > @@ -66,6 +75,7 @@ pub fn register_wait(&self, file: &File, cv: &PollCondVar) { > /// > /// [`CondVar`]: crate::sync::CondVar > #[pin_data(PinnedDrop)] > +#[repr(transparent)] > pub struct PollCondVar { > #[pin] > inner: CondVar, > @@ -78,6 +88,17 @@ pub fn new(name: &'static CStr, key: Pin<&'static LockClassKey>) -> impl PinInit > inner <- CondVar::new(name, key), > }) > } > + > + /// Use this `CondVar` as a `PollCondVar`. > + /// > + /// # Safety > + /// > + /// After the last use of the returned `&PollCondVar`, `__wake_up_pollfree` must be called on > + /// the `wait_queue_head` at least one grace period before the `CondVar` is destroyed. > + unsafe fn from_non_poll(c: &CondVar) -> &PollCondVar { > + // SAFETY: Layout is the same. Caller ensures that PollTables are cleared in time. > + unsafe { &*ptr::from_ref(c).cast() } > + } > } > > // Make the `CondVar` methods callable on `PollCondVar`. > @@ -104,3 +125,140 @@ fn drop(self: Pin<&mut Self>) { > unsafe { bindings::synchronize_rcu() }; > } > } > + > +/// Wrapper around [`CondVar`] that can be upgraded to [`PollCondVar`]. > +/// > +/// By using this wrapper, you can avoid rcu for cases that don't use [`PollTable`], and in all > +/// cases you can avoid `synchronize_rcu()`. > +/// > +/// # Invariants > +/// > +/// `active` either references `simple`, or a `kmalloc` allocation holding an > +/// `UpgradePollCondVarInner`. In the latter case, the allocation remains valid until > +/// `Self::drop()` plus one grace period. > +#[pin_data(PinnedDrop)] > +pub struct UpgradePollCondVar { > + #[pin] > + simple: CondVar, > + active: Atomic<*const CondVar>, > + #[pin] > + _pin: PhantomPinned, > +} > + > +#[pin_data] > +#[repr(C)] > +struct UpgradePollCondVarInner { > + #[pin] > + upgraded: CondVar, > + #[pin] > + rcu: Opaque<bindings::callback_head>, > +} > + > +impl UpgradePollCondVar { > + /// Constructs a new upgradable condvar initialiser. 
> + pub fn new(name: &'static CStr, key: Pin<&'static LockClassKey>) -> impl PinInit<Self> { > + pin_init!(&this in Self { > + simple <- CondVar::new(name, key), > + // SAFETY: `this->simple` is in-bounds. Pointer remains valid since this type is > + // pinned. > + active: Atomic::new(unsafe { &raw const (*this.as_ptr()).simple }), > + _pin: PhantomPinned, > + }) > + } > + > + /// Obtain a [`PollCondVar`], upgrading if necessary. > + /// > + /// You should use the same lock as what is passed to the `wait_*` methods. Otherwise wakeups > + /// may be missed. > + pub fn poll<T: ?Sized, B: Backend>( > + &self, > + lock: &Lock<T, B>, > + name: &'static CStr, > + key: Pin<&'static LockClassKey>, > + ) -> Result<&PollCondVar, AllocError> { > + let mut ptr = self.active.load(Acquire); > + if ptr::eq(ptr, &self.simple) { > + self.upgrade(lock, name, key)?; > + ptr = self.active.load(Acquire); > + debug_assert_ne!(ptr, ptr::from_ref(&self.simple)); > + } > + // SAFETY: Signature ensures that last use of returned `&PollCondVar` is before drop(), and > + // drop() calls `__wake_up_pollfree` followed by waiting a grace period before the > + // `CondVar` is destroyed. > + Ok(unsafe { PollCondVar::from_non_poll(&*ptr) }) > + } > + > + fn upgrade<T: ?Sized, B: Backend>( > + &self, > + lock: &Lock<T, B>, > + name: &'static CStr, > + key: Pin<&'static LockClassKey>, > + ) -> Result<(), AllocError> { > + let upgraded = KBox::pin_init( > + pin_init!(UpgradePollCondVarInner { > + upgraded <- CondVar::new(name, key), > + rcu: Opaque::uninit(), > + }), > + GFP_KERNEL, > + ) > + .map_err(|_| AllocError)?; > + > + // SAFETY: The value is treated as pinned. > + let upgraded = KBox::into_raw(unsafe { Pin::into_inner_unchecked(upgraded) }); > + > + let res = self.active.cmpxchg( > + ptr::from_ref(&self.simple), > + // SAFETY: This operation stays in-bounds of the above allocation. 
> +            unsafe { &raw mut (*upgraded).upgraded },
> +            Release,
> +        );
> +
> +        if res.is_err() {
> +            // Already upgraded, so still success.
> +            // SAFETY: The cmpxchg failed, so take back ownership of the box.
> +            drop(unsafe { KBox::from_raw(upgraded) });
> +            return Ok(());
> +        }
> +
> +        // If a normal waiter registers in parallel with us, then either:
> +        // * We took the lock first. In that case, the waiter sees the above cmpxchg.
> +        // * They took the lock first. In that case, we wake them up below.
> +        drop(lock.lock());
> +        self.simple.notify_all();

Hmm.. what if the waiter gets its `&CondVar` before `upgrade()` and uses
that directly?

	<waiter>                             <in upgrade()>
	let poll_cv: &UpgradePollCondVar = ...;
	let cv = poll_cv.deref();
	                                     cmpxchg();
	                                     drop(lock.lock());
	                                     self.simple.notify_all();
	let mut guard = lock.lock();
	cv.wait(&mut guard);

we still miss the wake-up, right?

It's creative, but I particularly hate that we use an empty lock
critical section to synchronize ;-)

Do you think the complexity of a dynamic upgrading is worthwhile, or we
should just use the box-allocated PollCondVar unconditionally?

I think if the current users won't benefit from the dynamic upgrading
then we can avoid the complexity. We can always add it back later.
Thoughts?

Regards,
Boqun

> +
> +        Ok(())
> +    }
> +}
> +
> +// Make the `CondVar` methods callable on `UpgradePollCondVar`.
> +impl Deref for UpgradePollCondVar {
> +    type Target = CondVar;
> +
> +    fn deref(&self) -> &CondVar {
> +        // SAFETY: By the type invariants, this is either `&self.simple` or references an
> +        // allocation that lives until `UpgradePollCondVar::drop`.
> +        unsafe { &*self.active.load(Acquire) }
> +    }
> +}
> +
> +#[pinned_drop]
> +impl PinnedDrop for UpgradePollCondVar {
> +    #[inline]
> +    fn drop(self: Pin<&mut Self>) {
> +        // ORDERING: All calls to upgrade happens-before Drop, so no synchronization is required.
> + let ptr = self.active.load(Relaxed); > + if ptr::eq(ptr, &self.simple) { > + return; > + } > + // SAFETY: When the pointer is not &self.active, it is an `UpgradePollCondVarInner`. > + let ptr = unsafe { container_of!(ptr.cast_mut(), UpgradePollCondVarInner, upgraded) }; > + // SAFETY: The pointer points at a valid `wait_queue_head`. > + unsafe { bindings::__wake_up_pollfree((*ptr).upgraded.wait_queue_head.get()) }; > + // This skips drop of `CondVar`, but that's ok because we reimplemented its drop here. > + // > + // SAFETY: `__wake_up_pollfree` ensures that all registered PollTable instances are gone in > + // one grace period, and this is the destructor so no new PollTable instances can be > + // registered. Thus, it's safety to rcu free the `UpgradePollCondVarInner`. > + unsafe { bindings::kvfree_call_rcu((*ptr).rcu.get(), ptr.cast::<c_void>()) }; > + } > +} > > -- > 2.53.0.273.g2a3d683680-goog > ^ permalink raw reply [flat|nested] 8+ messages in thread
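The race Boqun describes is the classic missed-wakeup pattern: the fix is always to re-check the condition while holding the lock that the notifier also takes. As a userspace illustration (an assumed analogy, not the kernel API), std's Condvar::wait_while does exactly that re-check, so a notify landing between the check and the sleep cannot be lost:

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

fn run() -> bool {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let notifier = Arc::clone(&pair);

    let t = thread::spawn(move || {
        let (lock, cv) = &*notifier;
        // Set the condition under the lock, then notify: the waiter
        // either sees `true` before sleeping or is woken afterwards.
        *lock.lock().unwrap() = true;
        cv.notify_all();
    });

    let (lock, cv) = &*pair;
    let guard = lock.lock().unwrap();
    // wait_while re-checks the predicate under the mutex on every
    // wakeup, so no notify can slip through unnoticed.
    let guard = cv.wait_while(guard, |done| !*done).unwrap();
    let woken = *guard;
    drop(guard);
    t.join().unwrap();
    woken
}

fn main() {
    assert!(run());
}
```

In the thread above, the analogous predicate would be "has `active` moved away from `simple`", which is what the later prepare_to_wait sketch checks under the wait-queue lock.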
* Re: [PATCH v2 1/2] rust: poll: make PollCondVar upgradable 2026-03-03 22:08 ` Boqun Feng @ 2026-03-04 7:59 ` Alice Ryhl 2026-03-04 16:29 ` Boqun Feng 0 siblings, 1 reply; 8+ messages in thread From: Alice Ryhl @ 2026-03-04 7:59 UTC (permalink / raw) To: Boqun Feng Cc: Christian Brauner, Boqun Feng, Paul E. McKenney, Gary Guo, Greg Kroah-Hartman, Carlos Llamas, linux-fsdevel, rust-for-linux, linux-kernel On Tue, Mar 03, 2026 at 02:08:08PM -0800, Boqun Feng wrote: > On Fri, Feb 13, 2026 at 11:29:41AM +0000, Alice Ryhl wrote: > > Rust Binder currently uses PollCondVar, but it calls synchronize_rcu() > > in the destructor, which we would like to avoid. Add a variation of > > PollCondVar, which uses kfree_rcu() instead. > > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com> > > --- > > rust/kernel/sync/poll.rs | 160 ++++++++++++++++++++++++++++++++++++++++++++++- > > 1 file changed, 159 insertions(+), 1 deletion(-) > > > > diff --git a/rust/kernel/sync/poll.rs b/rust/kernel/sync/poll.rs > > index 0ec985d560c8d3405c08dbd86e48b14c7c34484d..9555f818a24d777dd908fca849015c3490ce38d3 100644 > > --- a/rust/kernel/sync/poll.rs > > +++ b/rust/kernel/sync/poll.rs > > @@ -5,12 +5,21 @@ > > //! Utilities for working with `struct poll_table`. > > > > use crate::{ > > + alloc::AllocError, > > bindings, > > + container_of, > > fs::File, > > prelude::*, > > + sync::atomic::{Acquire, Atomic, Relaxed, Release}, > > + sync::lock::{Backend, Lock}, > > sync::{CondVar, LockClassKey}, > > + types::Opaque, // > > +}; > > +use core::{ > > + marker::{PhantomData, PhantomPinned}, > > + ops::Deref, > > + ptr, > > }; > > -use core::{marker::PhantomData, ops::Deref}; > > > > /// Creates a [`PollCondVar`] initialiser with the given name and a newly-created lock class. 
> > #[macro_export] > > @@ -66,6 +75,7 @@ pub fn register_wait(&self, file: &File, cv: &PollCondVar) { > > /// > > /// [`CondVar`]: crate::sync::CondVar > > #[pin_data(PinnedDrop)] > > +#[repr(transparent)] > > pub struct PollCondVar { > > #[pin] > > inner: CondVar, > > @@ -78,6 +88,17 @@ pub fn new(name: &'static CStr, key: Pin<&'static LockClassKey>) -> impl PinInit > > inner <- CondVar::new(name, key), > > }) > > } > > + > > + /// Use this `CondVar` as a `PollCondVar`. > > + /// > > + /// # Safety > > + /// > > + /// After the last use of the returned `&PollCondVar`, `__wake_up_pollfree` must be called on > > + /// the `wait_queue_head` at least one grace period before the `CondVar` is destroyed. > > + unsafe fn from_non_poll(c: &CondVar) -> &PollCondVar { > > + // SAFETY: Layout is the same. Caller ensures that PollTables are cleared in time. > > + unsafe { &*ptr::from_ref(c).cast() } > > + } > > } > > > > // Make the `CondVar` methods callable on `PollCondVar`. > > @@ -104,3 +125,140 @@ fn drop(self: Pin<&mut Self>) { > > unsafe { bindings::synchronize_rcu() }; > > } > > } > > + > > +/// Wrapper around [`CondVar`] that can be upgraded to [`PollCondVar`]. > > +/// > > +/// By using this wrapper, you can avoid rcu for cases that don't use [`PollTable`], and in all > > +/// cases you can avoid `synchronize_rcu()`. > > +/// > > +/// # Invariants > > +/// > > +/// `active` either references `simple`, or a `kmalloc` allocation holding an > > +/// `UpgradePollCondVarInner`. In the latter case, the allocation remains valid until > > +/// `Self::drop()` plus one grace period. 
> > +#[pin_data(PinnedDrop)] > > +pub struct UpgradePollCondVar { > > + #[pin] > > + simple: CondVar, > > + active: Atomic<*const CondVar>, > > + #[pin] > > + _pin: PhantomPinned, > > +} > > + > > +#[pin_data] > > +#[repr(C)] > > +struct UpgradePollCondVarInner { > > + #[pin] > > + upgraded: CondVar, > > + #[pin] > > + rcu: Opaque<bindings::callback_head>, > > +} > > + > > +impl UpgradePollCondVar { > > + /// Constructs a new upgradable condvar initialiser. > > + pub fn new(name: &'static CStr, key: Pin<&'static LockClassKey>) -> impl PinInit<Self> { > > + pin_init!(&this in Self { > > + simple <- CondVar::new(name, key), > > + // SAFETY: `this->simple` is in-bounds. Pointer remains valid since this type is > > + // pinned. > > + active: Atomic::new(unsafe { &raw const (*this.as_ptr()).simple }), > > + _pin: PhantomPinned, > > + }) > > + } > > + > > + /// Obtain a [`PollCondVar`], upgrading if necessary. > > + /// > > + /// You should use the same lock as what is passed to the `wait_*` methods. Otherwise wakeups > > + /// may be missed. > > + pub fn poll<T: ?Sized, B: Backend>( > > + &self, > > + lock: &Lock<T, B>, > > + name: &'static CStr, > > + key: Pin<&'static LockClassKey>, > > + ) -> Result<&PollCondVar, AllocError> { > > + let mut ptr = self.active.load(Acquire); > > + if ptr::eq(ptr, &self.simple) { > > + self.upgrade(lock, name, key)?; > > + ptr = self.active.load(Acquire); > > + debug_assert_ne!(ptr, ptr::from_ref(&self.simple)); > > + } > > + // SAFETY: Signature ensures that last use of returned `&PollCondVar` is before drop(), and > > + // drop() calls `__wake_up_pollfree` followed by waiting a grace period before the > > + // `CondVar` is destroyed. 
> > + Ok(unsafe { PollCondVar::from_non_poll(&*ptr) }) > > + } > > + > > + fn upgrade<T: ?Sized, B: Backend>( > > + &self, > > + lock: &Lock<T, B>, > > + name: &'static CStr, > > + key: Pin<&'static LockClassKey>, > > + ) -> Result<(), AllocError> { > > + let upgraded = KBox::pin_init( > > + pin_init!(UpgradePollCondVarInner { > > + upgraded <- CondVar::new(name, key), > > + rcu: Opaque::uninit(), > > + }), > > + GFP_KERNEL, > > + ) > > + .map_err(|_| AllocError)?; > > + > > + // SAFETY: The value is treated as pinned. > > + let upgraded = KBox::into_raw(unsafe { Pin::into_inner_unchecked(upgraded) }); > > + > > + let res = self.active.cmpxchg( > > + ptr::from_ref(&self.simple), > > + // SAFETY: This operation stays in-bounds of the above allocation. > > + unsafe { &raw mut (*upgraded).upgraded }, > > + Release, > > + ); > > + > > + if res.is_err() { > > + // Already upgraded, so still succeess. > > + // SAFETY: The cmpxchg failed, so take back ownership of the box. > > + drop(unsafe { KBox::from_raw(upgraded) }); > > + return Ok(()); > > + } > > + > > + // If a normal waiter registers in parallel with us, then either: > > + // * We took the lock first. In that case, the waiter sees the above cmpxchg. > > + // * They took the lock first. In that case, we wake them up below. > > + drop(lock.lock()); > > + self.simple.notify_all(); > > Hmm.. what if the waiter gets its `&CondVar` before `upgrade()` and use > that directly? > > <waiter> <in upgrade()> > let poll_cv: &UpgradePollCondVar = ...; > let cv = poll_cv.deref(); > cmpxchg(); > drop(lock.lock()); > self.simple.notify_all(); > let mut guard = lock.lock(); > cv.wait(&mut guard); > > we still miss the wake-up, right? > > It's creative, but I particularly hate we use an empty lock critical > section to synchronize ;-) I guess instead of exposing Deref, I can just implement `wait` directly on `UpgradePollCondVar`. Then this API misuse is not possible. 
> Do you think the complexity of a dynamic upgrading is worthwhile, or we
> should just use the box-allocated PollCondVar unconditionally?
> 
> I think if the current users won't benefit from the dynamic upgrading
> then we can avoid the complexity. We can always add it back later.
> Thoughts?

I do actually think it's worthwhile to consider:

I started an Android device running this. It created 3961 instances of
`UpgradePollCondVar` during the hour it ran, but only 5 were upgraded.

Alice

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] rust: poll: make PollCondVar upgradable
  2026-03-04  7:59     ` Alice Ryhl
@ 2026-03-04 16:29       ` Boqun Feng
  2026-03-04 21:37         ` Alice Ryhl
  0 siblings, 1 reply; 8+ messages in thread
From: Boqun Feng @ 2026-03-04 16:29 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Christian Brauner, Boqun Feng, Paul E. McKenney, Gary Guo,
	Greg Kroah-Hartman, Carlos Llamas, linux-fsdevel, rust-for-linux,
	linux-kernel

On Wed, Mar 04, 2026 at 07:59:59AM +0000, Alice Ryhl wrote:
[...]
> > > +        // If a normal waiter registers in parallel with us, then either:
> > > +        // * We took the lock first. In that case, the waiter sees the above cmpxchg.
> > > +        // * They took the lock first. In that case, we wake them up below.
> > > +        drop(lock.lock());
> > > +        self.simple.notify_all();
> > 
> > Hmm.. what if the waiter gets its `&CondVar` before `upgrade()` and uses
> > that directly?
> > 
> > 	<waiter>                             <in upgrade()>
> > 	let poll_cv: &UpgradePollCondVar = ...;
> > 	let cv = poll_cv.deref();
> > 	                                     cmpxchg();
> > 	                                     drop(lock.lock());
> > 	                                     self.simple.notify_all();
> > 	let mut guard = lock.lock();
> > 	cv.wait(&mut guard);
> > 
> > we still miss the wake-up, right?
> > 
> > It's creative, but I particularly hate that we use an empty lock
> > critical section to synchronize ;-)
> 
> I guess instead of exposing Deref, I can just implement `wait` directly
> on `UpgradePollCondVar`. Then this API misuse is not possible.
> 

If we do that, then we can avoid the `drop(lock.lock())` as well, if we
do:

	impl UpgradePollCondVar {
	    pub fn wait(...) {
	        prepare_to_wait_exclusive(); // <- this will take the lock in
	                                     //    simple.wait_queue_head. So
	                                     //    either upgrade() comes
	                                     //    first, or they observe the
	                                     //    wait being queued.
	        let cv_ptr = self.active.load(Relaxed);
	        if !ptr_eq(cv_ptr, &self.simple) { // We have moved away from
	                                           // simple, so we need to
	                                           // wake up and redo the
	                                           // wait.
	            finish_wait();
	        } else {
	            guard.do_unlock(|| { schedule_timeout(); });
	            finish_wait();
	        }
	    }
	}

(CondVar::notify*() will take the wait_queue_head lock as well)

> > Do you think the complexity of a dynamic upgrading is worthwhile, or we
> > should just use the box-allocated PollCondVar unconditionally?
> > 
> > I think if the current users won't benefit from the dynamic upgrading
> > then we can avoid the complexity. We can always add it back later.
> > Thoughts?
> 
> I do actually think it's worthwhile to consider:
> 
> I started an Android device running this. It created 3961 instances of
> `UpgradePollCondVar` during the hour it ran, but only 5 were upgraded.
> 

That makes sense, thank you for providing the data! But still I think we
should be more informative about the performance difference between
dynamic upgrading vs. an unconditionally box-allocated PollCondVar,
because I would assume that when a `UpgradePollCondVar` is created,
other allocations also happen as well (e.g. when creating an
Arc<binder::Thread>), so the extra cost of the allocation may be
unnoticeable.

Regards,
Boqun

> Alice

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] rust: poll: make PollCondVar upgradable 2026-03-04 16:29 ` Boqun Feng @ 2026-03-04 21:37 ` Alice Ryhl 2026-03-04 23:36 ` Boqun Feng 0 siblings, 1 reply; 8+ messages in thread From: Alice Ryhl @ 2026-03-04 21:37 UTC (permalink / raw) To: Boqun Feng Cc: Christian Brauner, Boqun Feng, Paul E. McKenney, Gary Guo, Greg Kroah-Hartman, Carlos Llamas, linux-fsdevel, rust-for-linux, linux-kernel On Wed, Mar 04, 2026 at 08:29:12AM -0800, Boqun Feng wrote: > On Wed, Mar 04, 2026 at 07:59:59AM +0000, Alice Ryhl wrote: > [...] > > > > + // If a normal waiter registers in parallel with us, then either: > > > > + // * We took the lock first. In that case, the waiter sees the above cmpxchg. > > > > + // * They took the lock first. In that case, we wake them up below. > > > > + drop(lock.lock()); > > > > + self.simple.notify_all(); > > > > > > Hmm.. what if the waiter gets its `&CondVar` before `upgrade()` and use > > > that directly? > > > > > > <waiter> <in upgrade()> > > > let poll_cv: &UpgradePollCondVar = ...; > > > let cv = poll_cv.deref(); > > > cmpxchg(); > > > drop(lock.lock()); > > > self.simple.notify_all(); > > > let mut guard = lock.lock(); > > > cv.wait(&mut guard); > > > > > > we still miss the wake-up, right? > > > > > > It's creative, but I particularly hate we use an empty lock critical > > > section to synchronize ;-) > > > > I guess instead of exposing Deref, I can just implement `wait` directly > > on `UpgradePollCondVar`. Then this API misuse is not possible. > > > > If we do that,then we can avoid the `drop(lock.lock())` as well, if we > do: > > impl UpgradePollCondVar { > pub fn wait(...) { > prepare_to_wait_exclusive(); // <- this will take lock in > // simple.wait_queue_head. So > // either upgrade() comes > // first, or they observe the > // wait being queued. > let cv_ptr = self.active.load(Relaxed); > if !ptr_eq(cv_ptr, &self.simple) { // We have moved from > // simple, so need to > // need to wake up and > // redo the wait. 
> finish_wait(); > } else { > guard.do_unlock(|| { schedule_timeout(); }); > finish_wait(); > } > } > } > > (CondVar::notify*() will take the wait_queue_head lock as well) Yeah but then I have to actually re-implement those methods and not just call them. Seems not worth it. > > > Do you think the complexity of a dynamic upgrading is worthwhile, or we > > > should just use the box-allocated PollCondVar unconditionally? > > > > > > I think if the current users won't benefit from the dynamic upgrading > > > then we can avoid the complexity. We can always add it back later. > > > Thoughts? > > > > I do actually think it's worthwhile to consider: > > > > I started an Android device running this. It created 3961 instances of > > `UpgradePollCondVar` during the hour it ran, but only 5 were upgraded. > > > > That makes sense, thank you for providing the data! But still I think we > should be more informative about the performance difference between > dynamic upgrading vs. unconditionally box-allocated PollCondVar, because > I would assume when a `UpgradePollCondVar` is created, other allocations > also happen as well (e.g. when creating a Arc<binder::Thread>), so the > extra cost of the allocation may be unnoticeable. Perf-wise it doesn't matter, but I'm concerned about memory usage. Alice ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 1/2] rust: poll: make PollCondVar upgradable 2026-03-04 21:37 ` Alice Ryhl @ 2026-03-04 23:36 ` Boqun Feng 0 siblings, 0 replies; 8+ messages in thread From: Boqun Feng @ 2026-03-04 23:36 UTC (permalink / raw) To: Alice Ryhl Cc: Christian Brauner, Boqun Feng, Paul E. McKenney, Gary Guo, Greg Kroah-Hartman, Carlos Llamas, linux-fsdevel, rust-for-linux, linux-kernel On Wed, Mar 04, 2026 at 09:37:45PM +0000, Alice Ryhl wrote: > On Wed, Mar 04, 2026 at 08:29:12AM -0800, Boqun Feng wrote: > > On Wed, Mar 04, 2026 at 07:59:59AM +0000, Alice Ryhl wrote: > > [...] > > > > > + // If a normal waiter registers in parallel with us, then either: > > > > > + // * We took the lock first. In that case, the waiter sees the above cmpxchg. > > > > > + // * They took the lock first. In that case, we wake them up below. > > > > > + drop(lock.lock()); > > > > > + self.simple.notify_all(); > > > > > > > > Hmm.. what if the waiter gets its `&CondVar` before `upgrade()` and use > > > > that directly? > > > > > > > > <waiter> <in upgrade()> > > > > let poll_cv: &UpgradePollCondVar = ...; > > > > let cv = poll_cv.deref(); > > > > cmpxchg(); > > > > drop(lock.lock()); > > > > self.simple.notify_all(); > > > > let mut guard = lock.lock(); > > > > cv.wait(&mut guard); > > > > > > > > we still miss the wake-up, right? > > > > > > > > It's creative, but I particularly hate we use an empty lock critical > > > > section to synchronize ;-) > > > > > > I guess instead of exposing Deref, I can just implement `wait` directly > > > on `UpgradePollCondVar`. Then this API misuse is not possible. > > > > > > > If we do that,then we can avoid the `drop(lock.lock())` as well, if we > > do: > > > > impl UpgradePollCondVar { > > pub fn wait(...) { > > prepare_to_wait_exclusive(); // <- this will take lock in > > // simple.wait_queue_head. So > > // either upgrade() comes > > // first, or they observe the > > // wait being queued. 
> >             let cv_ptr = self.active.load(Relaxed);
> >             if !ptr_eq(cv_ptr, &self.simple) { // We have moved from
> >                                                //    simple, so need to
> >                                                //    wake up and redo
> >                                                //    the wait.
> >                 finish_wait();
> >             } else {
> >                 guard.do_unlock(|| { schedule_timeout(); });
> >                 finish_wait();
> >             }
> >         }
> >     }
> >
> > (CondVar::notify*() will take the wait_queue_head lock as well)
>
> Yeah, but then I have to actually re-implement those methods and not
> just call them. Seems not worth it.
>

We can pass a closure to wait_*() as condition:

    fn wait_internal<T: ?Sized, B: Backend, F: FnOnce() -> bool>(
        &self,
        wait_state: c_int,
        guard: &mut Guard<'_, T, B>,
        cond: Option<F>,
        timeout_in_jiffies: c_long,
    ) -> c_long {

I'm not just suggesting this because it helps in this case. In a more
general pattern (see the ___wait_event() macro in include/linux/wait.h),
the condition checking after prepare_to_wait*() is needed to prevent
wake-up misses. So maybe in the long term, we will have a case where we
need to check the condition for `CondVar` as well.

Plus, you don't need to pass a &Lock to poll() if you do this ;-)

> > > > Do you think the complexity of a dynamic upgrading is worthwhile, or we
> > > > should just use the box-allocated PollCondVar unconditionally?
> > > >
> > > > I think if the current users won't benefit from the dynamic upgrading
> > > > then we can avoid the complexity. We can always add it back later.
> > > > Thoughts?
> > >
> > > I do actually think it's worthwhile to consider:
> > >
> > > I started an Android device running this. It created 3961 instances of
> > > `UpgradePollCondVar` during the hour it ran, but only 5 were upgraded.
> > >
> >
> > That makes sense, thank you for providing the data! But still I think we
> > should be more informative about the performance difference between
> > dynamic upgrading vs. unconditionally box-allocated PollCondVar, because
> > I would assume when a `UpgradePollCondVar` is created, other allocations
> > also happen as well (e.g. when creating an Arc<binder::Thread>), so the
> > extra cost of the allocation may be unnoticeable.
>
> Perf-wise it doesn't matter, but I'm concerned about memory usage.
>

Let's see, we are comparing the memory cost between (assuming a 64-bit
system, and LOCKDEP=n):

    struct UpgradePollCondVar {
        simple: CondVar, // <- 24 bytes (1 spinlock + 2 pointers)
        active: Atomic<*const UpgradePollCondVarInner>, // <- 8 bytes,
                         //    but +40 extra bytes on the heap in the
                         //    worst case.
    }

vs

    struct BoxedPollCondVar {
        active: Box<UpgradePollCondVarInner>, // <- 8 bytes, but +40
                                              //    extra bytes on the heap
    }

that's an extra 16 bytes per binder::Thread, but binder::Thread itself
is more than 100 bytes. Of course it's up to binder whether 16 bytes per
thread is a lot or not, but to me, I would choose the simplicity ;-)

Regards,
Boqun

> Alice

^ permalink raw reply	[flat|nested] 8+ messages in thread
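Boqun's arithmetic above can be reproduced with hypothetical userspace stand-ins. This is an editorial sketch: it assumes a 64-bit target, a `CondVarModel` that approximates the 24-byte LOCKDEP=n `CondVar`, and a 40-byte `Inner` in place of `UpgradePollCondVarInner`; none of these model types exist in the kernel tree.

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicPtr;

// Model of the 24-byte CondVar: a 4-byte spinlock (padded to 8 by
// alignment) plus two list pointers.
#[repr(C)]
struct CondVarModel {
    lock: u32,
    waiters: [usize; 2],
}

// Stands in for the 40-byte UpgradePollCondVarInner heap allocation.
#[repr(C)]
struct Inner([u8; 40]);

// Dynamic-upgrade layout: the CondVar is inline, the Inner is only
// allocated on the heap when a process actually upgrades (uses epoll).
#[repr(C)]
struct UpgradeModel {
    simple: CondVarModel,
    active: AtomicPtr<Inner>,
}

// Always-boxed layout: only a pointer inline, but the 40-byte Inner is
// allocated for every thread.
struct BoxedModel {
    active: Box<Inner>,
}

fn main() {
    // Common (never-upgraded) case: inline bytes only.
    let upgrade_common = size_of::<UpgradeModel>();
    // Boxed variant: pointer plus the unconditional heap allocation.
    let boxed_always = size_of::<BoxedModel>() + size_of::<Inner>();
    println!("upgrade (common case): {upgrade_common} bytes");
    println!("boxed (always):        {boxed_always} bytes");
    println!("difference:            {} bytes", boxed_always - upgrade_common);
}
```

On a 64-bit target the model gives 32 bytes against 48 for the always-boxed variant in the never-upgraded common case, i.e. the 16-byte difference cited in the message above.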
* [PATCH v2 2/2] rust_binder: use UpgradePollCondVar
  2026-02-13 11:29 [PATCH v2 0/2] Avoid synchronize_rcu() for every thread drop in Rust Binder Alice Ryhl
  2026-02-13 11:29 ` [PATCH v2 1/2] rust: poll: make PollCondVar upgradable Alice Ryhl
@ 2026-02-13 11:29 ` Alice Ryhl
  1 sibling, 0 replies; 8+ messages in thread
From: Alice Ryhl @ 2026-02-13 11:29 UTC (permalink / raw)
  To: Christian Brauner, Boqun Feng, Paul E. McKenney
  Cc: Gary Guo, Greg Kroah-Hartman, Carlos Llamas, linux-fsdevel,
	rust-for-linux, linux-kernel, Alice Ryhl

Most processes do not use Rust Binder with epoll, so avoid paying the
synchronize_rcu() cost in drop for those that don't need it. For those
that do, we also manage to replace synchronize_rcu() with kfree_rcu(),
though we introduce an extra allocation.

In case the last ref to an Arc<Thread> is dropped outside of
deferred_release(), this also ensures that synchronize_rcu() is not
called in the destructor of Arc<Thread> in other places. Theoretically
that could lead to jank by making other syscalls slow, though I have
not seen it happen in practice.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 drivers/android/binder/process.rs |  2 +-
 drivers/android/binder/thread.rs  | 25 ++++++++++++++++---------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/android/binder/process.rs b/drivers/android/binder/process.rs
index 132055b4790f0ec69a87635b498909df2bf475e2..9374f1a86766c09321b57e565b6317cc290ea32b 100644
--- a/drivers/android/binder/process.rs
+++ b/drivers/android/binder/process.rs
@@ -1684,7 +1684,7 @@ pub(crate) fn poll(
         table: PollTable<'_>,
     ) -> Result<u32> {
         let thread = this.get_current_thread()?;
-        let (from_proc, mut mask) = thread.poll(file, table);
+        let (from_proc, mut mask) = thread.poll(file, table)?;
         if mask == 0 && from_proc && !this.inner.lock().work.is_empty() {
             mask |= bindings::POLLIN;
         }
diff --git a/drivers/android/binder/thread.rs b/drivers/android/binder/thread.rs
index 82264db06507d4641b60cbed96af482a9d36e7b2..a07210405c64e19984f49777a9d2c7b218944755 100644
--- a/drivers/android/binder/thread.rs
+++ b/drivers/android/binder/thread.rs
@@ -16,8 +16,8 @@
     seq_file::SeqFile,
     seq_print,
     sync::atomic::{ordering::Relaxed, Atomic},
-    sync::poll::{PollCondVar, PollTable},
-    sync::{Arc, SpinLock},
+    sync::poll::{PollTable, UpgradePollCondVar},
+    sync::{Arc, LockClassKey, SpinLock},
     task::Task,
     types::ARef,
     uaccess::UserSlice,
@@ -35,7 +35,7 @@
     BinderReturnWriter, DArc, DLArc, DTRWrap, DeliverCode, DeliverToRead,
 };

-use core::mem::size_of;
+use core::{mem::size_of, pin::Pin};

 /// Stores the layout of the scatter-gather entries. This is used during the `translate_objects`
 /// call and is discarded when it returns.
@@ -412,7 +412,7 @@ pub(crate) struct Thread {
     #[pin]
     inner: SpinLock<InnerThread>,
     #[pin]
-    work_condvar: PollCondVar,
+    work_condvar: UpgradePollCondVar,
     /// Used to insert this thread into the process' `ready_threads` list.
     ///
     /// INVARIANT: May never be used for any other list than the `self.process.ready_threads`.
@@ -433,6 +433,11 @@ impl ListItem<0> for Thread {
     }
 }

+const THREAD_CONDVAR_NAME: &CStr = c"Thread::work_condvar";
+fn thread_condvar_class() -> Pin<&'static LockClassKey> {
+    kernel::static_lock_class!()
+}
+
 impl Thread {
     pub(crate) fn new(id: i32, process: Arc<Process>) -> Result<Arc<Self>> {
         let inner = InnerThread::new()?;
@@ -443,7 +448,7 @@ pub(crate) fn new(id: i32, process: Arc<Process>) -> Result<Arc<Self>> {
                 process,
                 task: ARef::from(&**kernel::current!()),
                 inner <- kernel::new_spinlock!(inner, "Thread::inner"),
-                work_condvar <- kernel::new_poll_condvar!("Thread::work_condvar"),
+                work_condvar <- UpgradePollCondVar::new(THREAD_CONDVAR_NAME, thread_condvar_class()),
                 links <- ListLinks::new(),
                 links_track <- AtomicTracker::new(),
             }),
@@ -1484,10 +1489,13 @@ pub(crate) fn write_read(self: &Arc<Self>, data: UserSlice, wait: bool) -> Resul
         ret
     }

-    pub(crate) fn poll(&self, file: &File, table: PollTable<'_>) -> (bool, u32) {
-        table.register_wait(file, &self.work_condvar);
+    pub(crate) fn poll(&self, file: &File, table: PollTable<'_>) -> Result<(bool, u32)> {
+        let condvar =
+            self.work_condvar
+                .poll(&self.inner, THREAD_CONDVAR_NAME, thread_condvar_class())?;
+        table.register_wait(file, condvar);
         let mut inner = self.inner.lock();
-        (inner.should_use_process_work_queue(), inner.poll())
+        Ok((inner.should_use_process_work_queue(), inner.poll()))
     }

     /// Make the call to `get_work` or `get_work_local` return immediately, if any.
@@ -1523,7 +1531,6 @@ pub(crate) fn notify_if_poll_ready(&self, sync: bool) {
     pub(crate) fn release(self: &Arc<Self>) {
         self.inner.lock().is_dead = true;

-        //self.work_condvar.clear();
         self.unwind_transaction_stack();

         // Cancel all pending work items.

-- 
2.53.0.273.g2a3d683680-goog

^ permalink raw reply related	[flat|nested] 8+ messages in thread
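The series' lazy "upgrade" strategy, paying for the separate allocation only when a process actually polls, can be sketched in userspace Rust. This is an editorial analogue: `AtomicPtr` with `compare_exchange` replaces the kernel cmpxchg, the allocation is simply leaked instead of being reclaimed with kfree_rcu(), and every name is invented for illustration.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// Starts cheap (a null pointer); the boxed value is only allocated on
// first upgrade, mirroring how UpgradePollCondVar only allocates its
// inner state when a process uses epoll.
struct Upgradable {
    active: AtomicPtr<String>,
}

impl Upgradable {
    fn new() -> Self {
        Upgradable { active: AtomicPtr::new(ptr::null_mut()) }
    }

    // Returns the upgraded value, allocating it exactly once even if
    // several threads race here. The published allocation is never
    // freed in this sketch (the kernel version would use kfree_rcu()).
    fn upgrade(&self) -> &String {
        let mut cur = self.active.load(Ordering::Acquire);
        if cur.is_null() {
            let boxed = Box::into_raw(Box::new(String::from("upgraded")));
            match self.active.compare_exchange(
                ptr::null_mut(), boxed, Ordering::AcqRel, Ordering::Acquire,
            ) {
                Ok(_) => cur = boxed,
                Err(existing) => {
                    // Lost the race: free our allocation, use the winner's.
                    drop(unsafe { Box::from_raw(boxed) });
                    cur = existing;
                }
            }
        }
        // Sound: once published, the allocation is never freed.
        unsafe { &*cur }
    }
}

fn main() {
    let u = Upgradable::new();
    // No allocation until somebody actually asks for the upgrade.
    assert!(u.active.load(Ordering::Relaxed).is_null());
    assert_eq!(u.upgrade().as_str(), "upgraded"); // first use allocates
    assert_eq!(u.upgrade().as_str(), "upgraded"); // later uses reuse it
}
```

This is only the allocation side of the design; the hard part debated in the thread, waking waiters that registered against the pre-upgrade state, is not modeled here.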
end of thread, other threads: [~2026-03-04 23:36 UTC | newest]

Thread overview: 8+ messages (links below jump to the message on this page):
2026-02-13 11:29 [PATCH v2 0/2] Avoid synchronize_rcu() for every thread drop in Rust Binder Alice Ryhl
2026-02-13 11:29 ` [PATCH v2 1/2] rust: poll: make PollCondVar upgradable Alice Ryhl
2026-03-03 22:08   ` Boqun Feng
2026-03-04  7:59     ` Alice Ryhl
2026-03-04 16:29       ` Boqun Feng
2026-03-04 21:37         ` Alice Ryhl
2026-03-04 23:36           ` Boqun Feng
2026-02-13 11:29 ` [PATCH v2 2/2] rust_binder: use UpgradePollCondVar Alice Ryhl