From: Boqun Feng <boqun@kernel.org>
To: Alice Ryhl <aliceryhl@google.com>
Cc: Christian Brauner <brauner@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Gary Guo <gary@garyguo.net>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Carlos Llamas <cmllamas@google.com>,
linux-fsdevel@vger.kernel.org, rust-for-linux@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/2] rust: poll: make PollCondVar upgradable
Date: Tue, 3 Mar 2026 14:08:08 -0800 [thread overview]
Message-ID: <aadbyBmaV8zCYiog@tardis.local> (raw)
In-Reply-To: <20260213-upgrade-poll-v2-1-984a0fb184fb@google.com>
On Fri, Feb 13, 2026 at 11:29:41AM +0000, Alice Ryhl wrote:
> Rust Binder currently uses PollCondVar, but it calls synchronize_rcu()
> in the destructor, which we would like to avoid. Add a variation of
> PollCondVar, which uses kfree_rcu() instead.
>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
> rust/kernel/sync/poll.rs | 160 ++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 159 insertions(+), 1 deletion(-)
>
> diff --git a/rust/kernel/sync/poll.rs b/rust/kernel/sync/poll.rs
> index 0ec985d560c8d3405c08dbd86e48b14c7c34484d..9555f818a24d777dd908fca849015c3490ce38d3 100644
> --- a/rust/kernel/sync/poll.rs
> +++ b/rust/kernel/sync/poll.rs
> @@ -5,12 +5,21 @@
> //! Utilities for working with `struct poll_table`.
>
> use crate::{
> + alloc::AllocError,
> bindings,
> + container_of,
> fs::File,
> prelude::*,
> + sync::atomic::{Acquire, Atomic, Relaxed, Release},
> + sync::lock::{Backend, Lock},
> sync::{CondVar, LockClassKey},
> + types::Opaque, //
> +};
> +use core::{
> + marker::{PhantomData, PhantomPinned},
> + ops::Deref,
> + ptr,
> };
> -use core::{marker::PhantomData, ops::Deref};
>
> /// Creates a [`PollCondVar`] initialiser with the given name and a newly-created lock class.
> #[macro_export]
> @@ -66,6 +75,7 @@ pub fn register_wait(&self, file: &File, cv: &PollCondVar) {
> ///
> /// [`CondVar`]: crate::sync::CondVar
> #[pin_data(PinnedDrop)]
> +#[repr(transparent)]
> pub struct PollCondVar {
> #[pin]
> inner: CondVar,
> @@ -78,6 +88,17 @@ pub fn new(name: &'static CStr, key: Pin<&'static LockClassKey>) -> impl PinInit
> inner <- CondVar::new(name, key),
> })
> }
> +
> + /// Use this `CondVar` as a `PollCondVar`.
> + ///
> + /// # Safety
> + ///
> + /// After the last use of the returned `&PollCondVar`, `__wake_up_pollfree` must be called on
> + /// the `wait_queue_head` at least one grace period before the `CondVar` is destroyed.
> + unsafe fn from_non_poll(c: &CondVar) -> &PollCondVar {
> + // SAFETY: Layout is the same. Caller ensures that PollTables are cleared in time.
> + unsafe { &*ptr::from_ref(c).cast() }
> + }
> }
>
> // Make the `CondVar` methods callable on `PollCondVar`.
> @@ -104,3 +125,140 @@ fn drop(self: Pin<&mut Self>) {
> unsafe { bindings::synchronize_rcu() };
> }
> }
> +
> +/// Wrapper around [`CondVar`] that can be upgraded to [`PollCondVar`].
> +///
> +/// By using this wrapper, you can avoid rcu for cases that don't use [`PollTable`], and in all
> +/// cases you can avoid `synchronize_rcu()`.
> +///
> +/// # Invariants
> +///
> +/// `active` either references `simple`, or a `kmalloc` allocation holding an
> +/// `UpgradePollCondVarInner`. In the latter case, the allocation remains valid until
> +/// `Self::drop()` plus one grace period.
> +#[pin_data(PinnedDrop)]
> +pub struct UpgradePollCondVar {
> + #[pin]
> + simple: CondVar,
> + active: Atomic<*const CondVar>,
> + #[pin]
> + _pin: PhantomPinned,
> +}
> +
> +#[pin_data]
> +#[repr(C)]
> +struct UpgradePollCondVarInner {
> + #[pin]
> + upgraded: CondVar,
> + #[pin]
> + rcu: Opaque<bindings::callback_head>,
> +}
> +
> +impl UpgradePollCondVar {
> + /// Constructs a new upgradable condvar initialiser.
> + pub fn new(name: &'static CStr, key: Pin<&'static LockClassKey>) -> impl PinInit<Self> {
> + pin_init!(&this in Self {
> + simple <- CondVar::new(name, key),
> + // SAFETY: `this->simple` is in-bounds. Pointer remains valid since this type is
> + // pinned.
> + active: Atomic::new(unsafe { &raw const (*this.as_ptr()).simple }),
> + _pin: PhantomPinned,
> + })
> + }
> +
> + /// Obtain a [`PollCondVar`], upgrading if necessary.
> + ///
> + /// You should use the same lock as what is passed to the `wait_*` methods. Otherwise wakeups
> + /// may be missed.
> + pub fn poll<T: ?Sized, B: Backend>(
> + &self,
> + lock: &Lock<T, B>,
> + name: &'static CStr,
> + key: Pin<&'static LockClassKey>,
> + ) -> Result<&PollCondVar, AllocError> {
> + let mut ptr = self.active.load(Acquire);
> + if ptr::eq(ptr, &self.simple) {
> + self.upgrade(lock, name, key)?;
> + ptr = self.active.load(Acquire);
> + debug_assert_ne!(ptr, ptr::from_ref(&self.simple));
> + }
> + // SAFETY: Signature ensures that last use of returned `&PollCondVar` is before drop(), and
> + // drop() calls `__wake_up_pollfree` followed by waiting a grace period before the
> + // `CondVar` is destroyed.
> + Ok(unsafe { PollCondVar::from_non_poll(&*ptr) })
> + }
> +
> + fn upgrade<T: ?Sized, B: Backend>(
> + &self,
> + lock: &Lock<T, B>,
> + name: &'static CStr,
> + key: Pin<&'static LockClassKey>,
> + ) -> Result<(), AllocError> {
> + let upgraded = KBox::pin_init(
> + pin_init!(UpgradePollCondVarInner {
> + upgraded <- CondVar::new(name, key),
> + rcu: Opaque::uninit(),
> + }),
> + GFP_KERNEL,
> + )
> + .map_err(|_| AllocError)?;
> +
> + // SAFETY: The value is treated as pinned.
> + let upgraded = KBox::into_raw(unsafe { Pin::into_inner_unchecked(upgraded) });
> +
> + let res = self.active.cmpxchg(
> + ptr::from_ref(&self.simple),
> + // SAFETY: This operation stays in-bounds of the above allocation.
> + unsafe { &raw mut (*upgraded).upgraded },
> + Release,
> + );
> +
> + if res.is_err() {
> + // Already upgraded, so still succeess.
> + // SAFETY: The cmpxchg failed, so take back ownership of the box.
> + drop(unsafe { KBox::from_raw(upgraded) });
> + return Ok(());
> + }
> +
> + // If a normal waiter registers in parallel with us, then either:
> + // * We took the lock first. In that case, the waiter sees the above cmpxchg.
> + // * They took the lock first. In that case, we wake them up below.
> + drop(lock.lock());
> + self.simple.notify_all();
Hmm.. what if the waiter gets its `&CondVar` before `upgrade()` and use
that directly?
<waiter> <in upgrade()>
let poll_cv: &UpgradePollCondVar = ...;
let cv = poll_cv.deref();
cmpxchg();
drop(lock.lock());
self.simple.notify_all();
let mut guard = lock.lock();
cv.wait(&mut guard);
we still miss the wake-up, right?
It's creative, but I particularly hate we use an empty lock critical
section to synchronize ;-)
Do you think the complexity of a dynamic upgrading is worthwhile, or we
should just use the box-allocated PollCondVar unconditionally?
I think if the current users won't benefit from the dynamic upgrading
then we can avoid the complexity. We can always add it back later.
Thoughts?
Regards,
Boqun
> +
> + Ok(())
> + }
> +}
> +
> +// Make the `CondVar` methods callable on `UpgradePollCondVar`.
> +impl Deref for UpgradePollCondVar {
> + type Target = CondVar;
> +
> + fn deref(&self) -> &CondVar {
> + // SAFETY: By the type invariants, this is either `&self.simple` or references an
> + // allocation that lives until `UpgradePollCondVar::drop`.
> + unsafe { &*self.active.load(Acquire) }
> + }
> +}
> +
> +#[pinned_drop]
> +impl PinnedDrop for UpgradePollCondVar {
> + #[inline]
> + fn drop(self: Pin<&mut Self>) {
> + // ORDERING: All calls to upgrade happens-before Drop, so no synchronization is required.
> + let ptr = self.active.load(Relaxed);
> + if ptr::eq(ptr, &self.simple) {
> + return;
> + }
> + // SAFETY: When the pointer is not &self.active, it is an `UpgradePollCondVarInner`.
> + let ptr = unsafe { container_of!(ptr.cast_mut(), UpgradePollCondVarInner, upgraded) };
> + // SAFETY: The pointer points at a valid `wait_queue_head`.
> + unsafe { bindings::__wake_up_pollfree((*ptr).upgraded.wait_queue_head.get()) };
> + // This skips drop of `CondVar`, but that's ok because we reimplemented its drop here.
> + //
> + // SAFETY: `__wake_up_pollfree` ensures that all registered PollTable instances are gone in
> + // one grace period, and this is the destructor so no new PollTable instances can be
> + // registered. Thus, it's safety to rcu free the `UpgradePollCondVarInner`.
> + unsafe { bindings::kvfree_call_rcu((*ptr).rcu.get(), ptr.cast::<c_void>()) };
> + }
> +}
>
> --
> 2.53.0.273.g2a3d683680-goog
>
next prev parent reply other threads:[~2026-03-03 22:08 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-13 11:29 [PATCH v2 0/2] Avoid synchronize_rcu() for every thread drop in Rust Binder Alice Ryhl
2026-02-13 11:29 ` [PATCH v2 1/2] rust: poll: make PollCondVar upgradable Alice Ryhl
2026-03-03 22:08 ` Boqun Feng [this message]
2026-03-04 7:59 ` Alice Ryhl
2026-03-04 16:29 ` Boqun Feng
2026-03-04 21:37 ` Alice Ryhl
2026-03-04 23:36 ` Boqun Feng
2026-02-13 11:29 ` [PATCH v2 2/2] rust_binder: use UpgradePollCondVar Alice Ryhl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aadbyBmaV8zCYiog@tardis.local \
--to=boqun@kernel.org \
--cc=aliceryhl@google.com \
--cc=boqun.feng@gmail.com \
--cc=brauner@kernel.org \
--cc=cmllamas@google.com \
--cc=gary@garyguo.net \
--cc=gregkh@linuxfoundation.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@kernel.org \
--cc=rust-for-linux@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox