[PATCH v2 0/5] rust: Add Per-CPU Variable API


* [PATCH v2 0/5] rust: Add Per-CPU Variable API
@ 2025-07-12 21:31 Mitchell Levy
  2025-07-12 21:31 ` [PATCH v2 1/5] rust: percpu: introduce a rust API for per-CPU variables Mitchell Levy
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Mitchell Levy @ 2025-07-12 21:31 UTC
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, Benno Lossin
  Cc: linux-kernel, rust-for-linux, linux-mm, Mitchell Levy

This series adds an API for declaring and using per-CPU variables from
Rust, and it also adds support for Rust access to C per-CPU variables
(subject to some soundness requirements). It also adds a small test
module, lib/percpu_test_rust.rs, in the vein of lib/percpu_test.c.

---
Signed-off-by: Mitchell Levy <levymitchell0@gmail.com>

---
Changes in v2:
- Fix kernel test robot issues
- Fix documentation error
- Require `T: Zeroable` in the dynamic case
- Link to v1: https://lore.kernel.org/r/20250624-rust-percpu-v1-0-9c59b07d2a9c@gmail.com

Changes in v1:
- Use wrapping_add in `PerCpuPtr::get_ref` since overflow is expected.
- Separate the dynamic and static cases, with shared logic in a
  `PerCpuPtr` type.
- Implement pin-hole optimizations for numeric types
- Don't assume `GFP_KERNEL` when allocating the `Arc` in the dynamic
  case.
- Link to RFC v2: https://lore.kernel.org/r/20250414-rust-percpu-v2-0-5ea0d0de13a5@gmail.com
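
As an illustration of the `wrapping_add` item above, here is a minimal
userspace sketch of why an address-plus-offset computation can be expected
to overflow. It assumes the offset behaves like a two's-complement
"negative" value; `apply_offset` and the constants are hypothetical and are
not taken from the patch.

```rust
/// Hypothetical helper: add an offset to a base address modulo 2^N,
/// in the way `wrapping_add` does for pointer-sized integers.
fn apply_offset(base: usize, offset: usize) -> usize {
    base.wrapping_add(offset)
}

fn main() {
    let base: usize = 0x1000;
    // A "negative" offset encoded in two's complement (assumption for
    // illustration only).
    let offset = (-0x10isize) as usize;

    // The sum overflows usize, and the wrapped result is the address we want.
    assert_eq!(apply_offset(base, offset), 0xff0);

    // checked_add reports the overflow that wrapping_add deliberately absorbs.
    assert_eq!(base.checked_add(offset), None);
}
```

A plain `+` on these values would panic in a debug build, which is why the
wrapping form is the natural choice when overflow is part of the design.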

Changes in RFC v2:
- Renamed PerCpuVariable to StaticPerCpuSymbol to be more descriptive
- Support dynamically allocated per-CPU variables via the
  PerCpuAllocation type. Rework statically allocated variables to use
  this new type.
- Make use of a token/closure-based API via the PerCpu and PerCpuToken
  types, rather than an API based on PerCpuRef that automatically
  Deref(Mut)'s into a &(mut) T.
- Rebased
- Link to RFC: https://lore.kernel.org/r/20241219-rust-percpu-v1-0-209117e822b1@gmail.com

---
Mitchell Levy (5):
      rust: percpu: introduce a rust API for per-CPU variables
      rust: rust-analyzer: add lib to dirs searched for crates
      rust: percpu: add a rust per-CPU variable test
      rust: percpu: Add pin-hole optimizations for numerics
      rust: percpu: cache per-CPU pointers in the dynamic case

 lib/Kconfig.debug                 |   9 ++
 lib/Makefile                      |   1 +
 lib/percpu_test_rust.rs           | 156 +++++++++++++++++++
 rust/helpers/helpers.c            |   2 +
 rust/helpers/percpu.c             |  20 +++
 rust/helpers/preempt.c            |  14 ++
 rust/kernel/lib.rs                |   3 +
 rust/kernel/percpu.rs             | 315 ++++++++++++++++++++++++++++++++++++++
 rust/kernel/percpu/cpu_guard.rs   |  35 +++++
 rust/kernel/percpu/numeric.rs     | 117 ++++++++++++++
 scripts/generate_rust_analyzer.py |   2 +-
 11 files changed, 673 insertions(+), 1 deletion(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20240813-rust-percpu-ea2f54b5da33

Best regards,
-- 
Mitchell Levy <levymitchell0@gmail.com>




* [PATCH v2 1/5] rust: percpu: introduce a rust API for per-CPU variables
  2025-07-12 21:31 [PATCH v2 0/5] rust: Add Per-CPU Variable API Mitchell Levy
@ 2025-07-12 21:31 ` Mitchell Levy
  2025-07-12 21:31 ` [PATCH v2 2/5] rust: rust-analyzer: add lib to dirs searched for crates Mitchell Levy
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Mitchell Levy @ 2025-07-12 21:31 UTC
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, Benno Lossin
  Cc: linux-kernel, rust-for-linux, linux-mm, Mitchell Levy

Add a `CpuGuard` type that disables preemption for its lifetime. Add a
`PerCpuAllocation` type used to track dynamic allocations. Add a
`define_per_cpu!` macro to create static per-CPU allocations. Add
`DynamicPerCpu` and `StaticPerCpu` to provide a high-level API. Add a
`PerCpu` trait to unify the dynamic and static cases.
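
The RAII pattern behind `CpuGuard` can be sketched in plain userspace Rust.
This is only a model: a thread-local counter stands in for the kernel's
preemption count, and `ModelGuard`, `PREEMPT_COUNT`, and `preempt_depth` are
hypothetical names that do not appear in the patch.

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the kernel's preemption count; purely illustrative.
    static PREEMPT_COUNT: Cell<u32> = Cell::new(0);
}

/// Hypothetical userspace model of a guard like `CpuGuard`.
pub struct ModelGuard {
    // Private field so the guard can only be built through `new()`.
    _private: (),
}

impl ModelGuard {
    /// Models `preempt_disable()`: bump the nesting count.
    pub fn new() -> Self {
        PREEMPT_COUNT.with(|c| c.set(c.get() + 1));
        ModelGuard { _private: () }
    }
}

impl Drop for ModelGuard {
    /// Models `preempt_enable()`: undo the bump when the guard goes away.
    fn drop(&mut self) {
        PREEMPT_COUNT.with(|c| c.set(c.get() - 1));
    }
}

/// Current nesting depth of live guards on this thread.
pub fn preempt_depth() -> u32 {
    PREEMPT_COUNT.with(|c| c.get())
}

fn main() {
    assert_eq!(preempt_depth(), 0);
    {
        let _outer = ModelGuard::new();
        let _inner = ModelGuard::new();
        assert_eq!(preempt_depth(), 2);
    } // both guards dropped here, in reverse order
    assert_eq!(preempt_depth(), 0);
}
```

Tying the enable call to `Drop` means an early return or `?` inside the
guarded scope cannot leave the count unbalanced.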

Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mitchell Levy <levymitchell0@gmail.com>
---
 rust/helpers/helpers.c          |   2 +
 rust/helpers/percpu.c           |   9 ++
 rust/helpers/preempt.c          |  14 ++
 rust/kernel/lib.rs              |   3 +
 rust/kernel/percpu.rs           | 308 ++++++++++++++++++++++++++++++++++++++++
 rust/kernel/percpu/cpu_guard.rs |  35 +++++
 6 files changed, 371 insertions(+)

diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index 0f1b5d115985..d56bbe6334d3 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -29,7 +29,9 @@
 #include "page.c"
 #include "platform.c"
 #include "pci.c"
+#include "percpu.c"
 #include "pid_namespace.c"
+#include "preempt.c"
 #include "rbtree.c"
 #include "rcu.c"
 #include "refcount.c"
diff --git a/rust/helpers/percpu.c b/rust/helpers/percpu.c
new file mode 100644
index 000000000000..a091389f730f
--- /dev/null
+++ b/rust/helpers/percpu.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/percpu.h>
+
+void __percpu *rust_helper_alloc_percpu(size_t sz, size_t align)
+{
+	return __alloc_percpu(sz, align);
+}
+
diff --git a/rust/helpers/preempt.c b/rust/helpers/preempt.c
new file mode 100644
index 000000000000..2c7529528ddd
--- /dev/null
+++ b/rust/helpers/preempt.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/preempt.h>
+
+void rust_helper_preempt_disable(void)
+{
+	preempt_disable();
+}
+
+void rust_helper_preempt_enable(void)
+{
+	preempt_enable();
+}
+
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 6b4774b2b1c3..733f9ff8b888 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -95,6 +95,9 @@
 pub mod page;
 #[cfg(CONFIG_PCI)]
 pub mod pci;
+// Only x86_64 is supported by percpu for now
+#[cfg(CONFIG_X86_64)]
+pub mod percpu;
 pub mod pid_namespace;
 pub mod platform;
 pub mod prelude;
diff --git a/rust/kernel/percpu.rs b/rust/kernel/percpu.rs
new file mode 100644
index 000000000000..7dfceb6aefd7
--- /dev/null
+++ b/rust/kernel/percpu.rs
@@ -0,0 +1,308 @@
+// SPDX-License-Identifier: GPL-2.0
+//! This module contains abstractions for creating and using per-CPU variables from Rust.
+//! See the define_per_cpu! macro and the DynamicPerCpu<T> type, as well as the PerCpu<T> trait.
+pub mod cpu_guard;
+
+use bindings::{alloc_percpu, free_percpu};
+
+use crate::alloc::Flags;
+use crate::percpu::cpu_guard::CpuGuard;
+use crate::prelude::*;
+use crate::sync::Arc;
+
+use core::arch::asm;
+use core::mem::{align_of, size_of};
+
+use ffi::c_void;
+
+/// A per-CPU pointer; that is, an offset into the per-CPU area. Note that this type is NOT a smart
+/// pointer, it does not manage the allocation.
+pub struct PerCpuPtr<T>(*mut T);
+
+/// Represents a dynamic allocation of a per-CPU variable via alloc_percpu. Calls free_percpu when
+/// dropped.
+pub struct PerCpuAllocation<T>(PerCpuPtr<T>);
+
+/// Holds a dynamically-allocated per-CPU variable.
+pub struct DynamicPerCpu<T> {
+    alloc: Arc<PerCpuAllocation<T>>,
+}
+
+/// Holds a statically-allocated per-CPU variable.
+pub struct StaticPerCpu<T>(PerCpuPtr<T>);
+
+/// Represents exclusive access to the memory location pointed at by a particular PerCpu<T>.
+pub struct PerCpuToken<'a, T> {
+    _guard: CpuGuard,
+    ptr: &'a PerCpuPtr<T>,
+}
+
+/// A wrapper used for declaring static per-CPU variables. These symbols are "virtual" in that the
+/// linker uses them to generate offsets into each CPU's per-CPU area, but shouldn't be read
+/// from/written to directly. The fact that the statics are immutable prevents them being written
+/// to (generally), this struct having _val be non-public prevents reading from them.
+///
+/// The end-user of the per-CPU API should make use of the define_per_cpu! macro instead of
+/// declaring variables of this type directly.
+#[repr(transparent)]
+pub struct StaticPerCpuSymbol<T> {
+    _val: T, // generate a correctly sized type
+}
+
+impl<T> PerCpuPtr<T> {
+    /// Makes a new PerCpuPtr from a raw per-CPU pointer.
+    ///
+    /// # Safety
+    /// `ptr` must be a valid per-CPU pointer.
+    pub unsafe fn new(ptr: *mut T) -> Self {
+        Self(ptr)
+    }
+
+    /// Get a `&mut T` to the per-CPU variable represented by `&self`
+    ///
+    /// # Safety
+    /// The returned `&mut T` must follow Rust's aliasing rules. That is, no other `&(mut) T` may
+    /// exist that points to the same location in memory. In practice, this means that `get_ref`
+    /// must not be called on another `PerCpuPtr<T>` that is a copy/clone of `&self` for as long as
+    /// the returned reference lives.
+    ///
+    /// CPU preemption must be disabled before calling this function and for the lifetime of the
+    /// returned reference. Otherwise, the returned &mut T might end up being a reference to a
+    /// different CPU's per-CPU area, causing the potential for a data race.
+    #[allow(clippy::mut_from_ref)] // Safety requirements prevent aliasing issues
+    pub unsafe fn get_ref(&self) -> &mut T {
+        let this_cpu_off_pcpu = core::ptr::addr_of!(this_cpu_off);
+        let mut this_cpu_area: *mut c_void;
+        // SAFETY: gs + this_cpu_off_pcpu is guaranteed to be a valid pointer because `gs` points
+        // to the per-CPU area and this_cpu_off_pcpu is a valid per-CPU allocation.
+        unsafe {
+            asm!(
+                "mov {out}, gs:[{off_val}]",
+                off_val = in(reg) this_cpu_off_pcpu,
+                out = out(reg) this_cpu_area,
+            )
+        };
+        // SAFETY: this_cpu_area + self.0 is guaranteed to be a valid pointer by the per-CPU
+        // subsystem and the invariant that self.0 is a valid offset into the per-CPU area.
+        //
+        // We know no-one else has a reference to the underlying pcpu variable because of the
+        // safety requirements of this function.
+        unsafe { &mut *((this_cpu_area).wrapping_add(self.0 as usize) as *mut T) }
+    }
+}
+
+impl<T> Clone for PerCpuPtr<T> {
+    fn clone(&self) -> Self {
+        *self
+    }
+}
+
+/// PerCpuPtr is just a pointer, so it's safe to copy.
+impl<T> Copy for PerCpuPtr<T> {}
+
+impl<T: Zeroable> PerCpuAllocation<T> {
+    /// Dynamically allocates a space in the per-CPU area suitably sized and aligned to hold a `T`.
+    ///
+    /// Returns `None` under the same circumstances the C function `alloc_percpu` returns `NULL`.
+    pub fn new() -> Option<PerCpuAllocation<T>> {
+        // SAFETY: No preconditions to call alloc_percpu
+        let ptr: *mut T = unsafe { alloc_percpu(size_of::<T>(), align_of::<T>()) } as *mut T;
+        if ptr.is_null() {
+            return None;
+        }
+
+        Some(Self(PerCpuPtr(ptr)))
+    }
+}
+
+impl<T> Drop for PerCpuAllocation<T> {
+    fn drop(&mut self) {
+        // SAFETY: self.0.0 was returned by alloc_percpu, and so was a valid pointer into
+        // the percpu area, and has remained valid by the invariants of PerCpuAllocation<T>.
+        unsafe { free_percpu(self.0 .0 as *mut c_void) }
+    }
+}
+
+/// A trait representing a per-CPU variable. This is implemented for both `StaticPerCpu<T>` and
+/// `DynamicPerCpu<T>`. The main usage of this trait is to call `get` to get a `PerCpuToken` that
+/// can be used to access the underlying per-CPU variable. See `PerCpuToken::with`.
+///
+/// # Safety
+/// The returned value from `ptr` must be valid for the lifetime of `&mut self`.
+pub unsafe trait PerCpu<T> {
+    /// Gets a `PerCpuPtr<T>` to the per-CPU variable represented by `&mut self`
+    ///
+    /// # Safety
+    /// `self` may be doing all sorts of things to track when the underlying per-CPU variable can
+    /// be deallocated. You almost certainly shouldn't be calling this function directly (it's
+    /// essentially an implementation detail of the trait), and you certainly shouldn't be making
+    /// copies of the returned `PerCpuPtr<T>` that may outlive `&mut self`.
+    ///
+    /// Implementers of this trait must ensure that the returned `PerCpuPtr<T>` is valid for the
+    /// lifetime of `&mut self`.
+    unsafe fn ptr(&mut self) -> &PerCpuPtr<T>;
+
+    /// Produces a token, asserting that the holder has exclusive access to the underlying memory
+    /// pointed to by `self`
+    ///
+    /// # Safety
+    /// `func` (or its callees that execute on the same CPU) may not, for any `x: PerCpu<T>` that
+    /// is a `clone` of `&mut self` (or, for a statically allocated variable, a `StaticPerCpu<T>`
+    /// that came from the same `define_per_cpu!`):
+    /// - call `x.get()`
+    /// - make use of the value returned by `x.ptr()`
+    ///
+    /// `func` and its callees must not access or modify the memory associated with `&mut self`'s
+    /// allocation in the per-CPU area, except via (reborrows of) the reference passed to `func`.
+    ///
+    /// The underlying per-CPU variable cannot ever be mutated from an interrupt context, unless
+    /// irqs are disabled for the lifetime of the returned `PerCpuToken`.
+    unsafe fn get(&mut self, guard: CpuGuard) -> PerCpuToken<'_, T> {
+        PerCpuToken {
+            _guard: guard,
+            // SAFETY: The lifetime of the returned `PerCpuToken<'_, T>` is bounded by the lifetime
+            // of `&mut self`.
+            ptr: unsafe { self.ptr() },
+        }
+    }
+}
+
+impl<T> StaticPerCpu<T> {
+    /// Creates a new PerCpu<T> pointing to the statically allocated variable at `ptr`. End-users
+    /// should probably be using the `unsafe_get_per_cpu!` macro instead of calling this function.
+    ///
+    /// # Safety
+    /// `ptr` must be a valid pointer to a per-CPU variable. This means that it must be a valid
+    /// offset into the per-CPU area, and that the per-CPU area must be suitably sized and aligned
+    /// to hold a `T`.
+    pub unsafe fn new(ptr: *mut T) -> Self {
+        Self(PerCpuPtr(ptr))
+    }
+}
+
+// SAFETY: The `PerCpuPtr<T>` returned by `ptr` is valid for the lifetime of `self` (and in fact,
+// forever).
+unsafe impl<T> PerCpu<T> for StaticPerCpu<T> {
+    unsafe fn ptr(&mut self) -> &PerCpuPtr<T> {
+        &self.0
+    }
+}
+
+impl<T> Clone for StaticPerCpu<T> {
+    fn clone(&self) -> Self {
+        Self(self.0)
+    }
+}
+
+impl<T: Zeroable> DynamicPerCpu<T> {
+    /// Allocates a new per-CPU variable
+    ///
+    /// # Arguments
+    /// * `flags` - Flags used to allocate an `Arc` that keeps track of the underlying
+    ///   `PerCpuAllocation`.
+    pub fn new(flags: Flags) -> Option<Self> {
+        let alloc: PerCpuAllocation<T> = PerCpuAllocation::new()?;
+
+        let arc = Arc::new(alloc, flags).ok()?;
+
+        Some(Self { alloc: arc })
+    }
+}
+
+impl<T> DynamicPerCpu<T> {
+    /// Wraps a `PerCpuAllocation<T>` in a `PerCpu<T>`
+    ///
+    /// # Arguments
+    /// * `alloc` - The allocation to use
+    /// * `flags` - The flags used to allocate an `Arc` that keeps track of the `PerCpuAllocation`.
+    pub fn new_from_allocation(alloc: PerCpuAllocation<T>, flags: Flags) -> Option<Self> {
+        let arc = Arc::new(alloc, flags).ok()?;
+        Some(Self { alloc: arc })
+    }
+}
+
+// SAFETY: The `PerCpuPtr<T>` returned by `ptr` is valid for the lifetime of `self` because we
+// don't deallocate the underlying `PerCpuAllocation` until `self` is dropped.
+unsafe impl<T> PerCpu<T> for DynamicPerCpu<T> {
+    unsafe fn ptr(&mut self) -> &PerCpuPtr<T> {
+        &self.alloc.0
+    }
+}
+
+impl<T> Clone for DynamicPerCpu<T> {
+    fn clone(&self) -> Self {
+        Self {
+            alloc: self.alloc.clone(),
+        }
+    }
+}
+
+impl<T> PerCpuToken<'_, T> {
+    /// Immediately invokes `func` with a `&mut T` that points at the underlying per-CPU variable
+    /// that `&mut self` represents.
+    pub fn with<U>(&mut self, func: U)
+    where
+        U: FnOnce(&mut T),
+    {
+        // SAFETY: The existence of a PerCpuToken means that the requirements for get_ref are
+        // satisfied.
+        func(unsafe { self.ptr.get_ref() });
+    }
+}
+
+/// define_per_cpu! is analogous to the C DEFINE_PER_CPU macro in that it lets you create a
+/// statically allocated per-CPU variable.
+///
+/// # Example
+/// ```
+/// use kernel::define_per_cpu;
+/// use kernel::percpu::StaticPerCpuSymbol;
+///
+/// define_per_cpu!(pub MY_PERCPU: u64 = 0);
+/// ```
+#[macro_export]
+macro_rules! define_per_cpu {
+    ($vis:vis $id:ident: $ty:ty = $expr:expr) => {
+        $crate::macros::paste! {
+            // Expand $expr outside of the unsafe block to avoid silently allowing unsafe code to be
+            // used without a user-facing unsafe block
+            static [<__INIT_ $id>]: $ty = $expr;
+
+            // SAFETY: StaticPerCpuSymbol<T> is #[repr(transparent)], so we can freely convert from T
+            #[link_section = ".data..percpu"]
+            $vis static $id: StaticPerCpuSymbol<$ty> = unsafe {
+                core::mem::transmute::<$ty, StaticPerCpuSymbol<$ty>>([<__INIT_ $id>])
+            };
+        }
+    };
+}
+
+/// Gets a `StaticPerCpu<T>` from a symbol declared with `define_per_cpu!` or
+/// `declare_extern_per_cpu!`.
+///
+/// # Arguments
+/// * `ident` - The identifier declared
+///
+/// # Safety
+/// `$id` must be declared with either `define_per_cpu!` or `declare_extern_per_cpu!`, and the
+/// returned value must be stored in a `StaticPerCpu<T>` where `T` matches the declared type of
+/// `$id`.
+#[macro_export]
+macro_rules! unsafe_get_per_cpu {
+    ($id:ident) => {{
+        $crate::percpu::StaticPerCpu::new((&$id) as *const _ as *mut _)
+    }};
+}
+
+/// Declares a StaticPerCpuSymbol corresponding to a per-CPU variable defined in C. Be sure to read
+/// the safety requirements of `PerCpu::get`.
+#[macro_export]
+macro_rules! declare_extern_per_cpu {
+    ($id:ident: $ty:ty) => {
+        extern "C" {
+            static $id: StaticPerCpuSymbol<$ty>;
+        }
+    };
+}
+
+declare_extern_per_cpu!(this_cpu_off: u64);
diff --git a/rust/kernel/percpu/cpu_guard.rs b/rust/kernel/percpu/cpu_guard.rs
new file mode 100644
index 000000000000..14c04b12e7f0
--- /dev/null
+++ b/rust/kernel/percpu/cpu_guard.rs
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+//! Contains abstractions for disabling CPU preemption. See `CpuGuard`.
+
+/// A RAII guard for bindings::preempt_disable and bindings::preempt_enable. Guarantees preemption
+/// is disabled for as long as this object exists.
+pub struct CpuGuard {
+    // Don't make one without using new()
+    _phantom: (),
+}
+
+impl CpuGuard {
+    /// Create a new CpuGuard. Disables preemption for its lifetime.
+    pub fn new() -> Self {
+        // SAFETY: There are no preconditions required to call preempt_disable
+        unsafe {
+            bindings::preempt_disable();
+        }
+        CpuGuard { _phantom: () }
+    }
+}
+
+impl Default for CpuGuard {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+impl Drop for CpuGuard {
+    fn drop(&mut self) {
+        // SAFETY: There are no preconditions required to call preempt_enable
+        unsafe {
+            bindings::preempt_enable();
+        }
+    }
+}

-- 
2.34.1




* [PATCH v2 2/5] rust: rust-analyzer: add lib to dirs searched for crates
  2025-07-12 21:31 [PATCH v2 0/5] rust: Add Per-CPU Variable API Mitchell Levy
  2025-07-12 21:31 ` [PATCH v2 1/5] rust: percpu: introduce a rust API for per-CPU variables Mitchell Levy
@ 2025-07-12 21:31 ` Mitchell Levy
  2025-07-12 21:31 ` [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test Mitchell Levy
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Mitchell Levy @ 2025-07-12 21:31 UTC
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, Benno Lossin
  Cc: linux-kernel, rust-for-linux, linux-mm, Mitchell Levy

When generating rust-project.json, also include crates in lib/.

Signed-off-by: Mitchell Levy <levymitchell0@gmail.com>
---
 scripts/generate_rust_analyzer.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/generate_rust_analyzer.py b/scripts/generate_rust_analyzer.py
index 7c3ea2b55041..08e14ae9c1a0 100755
--- a/scripts/generate_rust_analyzer.py
+++ b/scripts/generate_rust_analyzer.py
@@ -152,7 +152,7 @@ def generate_crates(srctree, objtree, sysroot_src, external_src, cfgs, core_edit
     # Then, the rest outside of `rust/`.
     #
     # We explicitly mention the top-level folders we want to cover.
-    extra_dirs = map(lambda dir: srctree / dir, ("samples", "drivers"))
+    extra_dirs = map(lambda dir: srctree / dir, ("samples", "drivers", "lib"))
     if external_src is not None:
         extra_dirs = [external_src]
     for folder in extra_dirs:

-- 
2.34.1




* [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-12 21:31 [PATCH v2 0/5] rust: Add Per-CPU Variable API Mitchell Levy
  2025-07-12 21:31 ` [PATCH v2 1/5] rust: percpu: introduce a rust API for per-CPU variables Mitchell Levy
  2025-07-12 21:31 ` [PATCH v2 2/5] rust: rust-analyzer: add lib to dirs searched for crates Mitchell Levy
@ 2025-07-12 21:31 ` Mitchell Levy
  2025-07-13  9:30   ` Benno Lossin
  2025-07-12 21:31 ` [PATCH v2 4/5] rust: percpu: Add pin-hole optimizations for numerics Mitchell Levy
  2025-07-12 21:31 ` [PATCH v2 5/5] rust: percpu: cache per-CPU pointers in the dynamic case Mitchell Levy
  4 siblings, 1 reply; 20+ messages in thread
From: Mitchell Levy @ 2025-07-12 21:31 UTC
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, Benno Lossin
  Cc: linux-kernel, rust-for-linux, linux-mm, Mitchell Levy

Add a short exercise for Rust's per-CPU variable API, modelled after
lib/percpu_test.c.

Signed-off-by: Mitchell Levy <levymitchell0@gmail.com>
---
 lib/Kconfig.debug       |   9 ++++
 lib/Makefile            |   1 +
 lib/percpu_test_rust.rs | 120 ++++++++++++++++++++++++++++++++++++++++++++++++
 rust/helpers/percpu.c   |  11 +++++
 4 files changed, 141 insertions(+)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ebe33181b6e6..959ce156c601 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2418,6 +2418,15 @@ config PERCPU_TEST
 
 	  If unsure, say N.
 
+config PERCPU_TEST_RUST
+	tristate "Rust per cpu operations test"
+	depends on m && DEBUG_KERNEL && RUST
+	help
+	  Enable this option to build a test module which validates Rust per-cpu
+	  operations.
+
+	  If unsure, say N.
+
 config ATOMIC64_SELFTEST
 	tristate "Perform an atomic64_t self-test"
 	help
diff --git a/lib/Makefile b/lib/Makefile
index c38582f187dd..ab19106cc22c 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -288,6 +288,7 @@ obj-$(CONFIG_RBTREE_TEST) += rbtree_test.o
 obj-$(CONFIG_INTERVAL_TREE_TEST) += interval_tree_test.o
 
 obj-$(CONFIG_PERCPU_TEST) += percpu_test.o
+obj-$(CONFIG_PERCPU_TEST_RUST) += percpu_test_rust.o
 
 obj-$(CONFIG_ASN1) += asn1_decoder.o
 obj-$(CONFIG_ASN1_ENCODER) += asn1_encoder.o
diff --git a/lib/percpu_test_rust.rs b/lib/percpu_test_rust.rs
new file mode 100644
index 000000000000..a9652e6ece08
--- /dev/null
+++ b/lib/percpu_test_rust.rs
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+//! A simple self test for the rust per-CPU API.
+
+use core::ffi::c_void;
+
+use kernel::{
+    bindings::{on_each_cpu, smp_processor_id},
+    define_per_cpu,
+    percpu::{cpu_guard::*, *},
+    pr_info,
+    prelude::*,
+    unsafe_get_per_cpu,
+};
+
+module! {
+    type: PerCpuTestModule,
+    name: "percpu_test_rust",
+    author: "Mitchell Levy",
+    description: "Test code to exercise the Rust Per CPU variable API",
+    license: "GPL v2",
+}
+
+struct PerCpuTestModule;
+
+define_per_cpu!(PERCPU: i64 = 0);
+define_per_cpu!(UPERCPU: u64 = 0);
+
+impl kernel::Module for PerCpuTestModule {
+    fn init(_module: &'static ThisModule) -> Result<Self, Error> {
+        pr_info!("rust percpu test start\n");
+
+        let mut native: i64 = 0;
+        // SAFETY: PERCPU is properly defined
+        let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
+        // SAFETY: We only have one PerCpu that points at PERCPU
+        unsafe { pcpu.get(CpuGuard::new()) }.with(|val: &mut i64| {
+            pr_info!("The contents of pcpu are {}\n", val);
+
+            native += -1;
+            *val += -1;
+            pr_info!("Native: {}, *pcpu: {}\n", native, val);
+            assert!(native == *val && native == -1);
+
+            native += 1;
+            *val += 1;
+            pr_info!("Native: {}, *pcpu: {}\n", native, val);
+            assert!(native == *val && native == 0);
+        });
+
+        let mut unative: u64 = 0;
+        // SAFETY: UPERCPU is properly defined
+        let mut upcpu: StaticPerCpu<u64> = unsafe { unsafe_get_per_cpu!(UPERCPU) };
+
+        // SAFETY: We only have one PerCpu pointing at UPERCPU
+        unsafe { upcpu.get(CpuGuard::new()) }.with(|val: &mut u64| {
+            unative += 1;
+            *val += 1;
+            pr_info!("Unative: {}, *upcpu: {}\n", unative, val);
+            assert!(unative == *val && unative == 1);
+
+            unative = unative.wrapping_add((-1i64) as u64);
+            *val = val.wrapping_add((-1i64) as u64);
+            pr_info!("Unative: {}, *upcpu: {}\n", unative, val);
+            assert!(unative == *val && unative == 0);
+
+            unative = unative.wrapping_add((-1i64) as u64);
+            *val = val.wrapping_add((-1i64) as u64);
+            pr_info!("Unative: {}, *upcpu: {}\n", unative, val);
+            assert!(unative == *val && unative == (-1i64) as u64);
+
+            unative = 0;
+            *val = 0;
+
+            unative = unative.wrapping_sub(1);
+            *val = val.wrapping_sub(1);
+            pr_info!("Unative: {}, *upcpu: {}\n", unative, val);
+            assert!(unative == *val && unative == (-1i64) as u64);
+            assert!(unative == *val && unative == u64::MAX);
+        });
+
+        pr_info!("rust static percpu test done\n");
+
+        pr_info!("rust dynamic percpu test start\n");
+        let mut test: DynamicPerCpu<u64> = DynamicPerCpu::new(GFP_KERNEL).unwrap();
+
+        // SAFETY: No prerequisites for on_each_cpu.
+        unsafe {
+            on_each_cpu(Some(inc_percpu), (&raw mut test) as *mut c_void, 0);
+            on_each_cpu(Some(inc_percpu), (&raw mut test) as *mut c_void, 0);
+            on_each_cpu(Some(inc_percpu), (&raw mut test) as *mut c_void, 0);
+            on_each_cpu(Some(inc_percpu), (&raw mut test) as *mut c_void, 1);
+            on_each_cpu(Some(check_percpu), (&raw mut test) as *mut c_void, 1);
+        }
+
+        pr_info!("rust dynamic percpu test done\n");
+
+        // Return Err to unload the module
+        Result::Err(EINVAL)
+    }
+}
+
+extern "C" fn inc_percpu(info: *mut c_void) {
+    // SAFETY: We know that info is a void *const DynamicPerCpu<u64> and DynamicPerCpu<u64> is Send.
+    let mut pcpu = unsafe { (*(info as *const DynamicPerCpu<u64>)).clone() };
+    // SAFETY: smp_processor_id has no preconditions
+    pr_info!("Incrementing on {}\n", unsafe { smp_processor_id() });
+
+    // SAFETY: We don't have multiple clones of pcpu in scope
+    unsafe { pcpu.get(CpuGuard::new()) }.with(|val: &mut u64| *val += 1);
+}
+
+extern "C" fn check_percpu(info: *mut c_void) {
+    // SAFETY: We know that info is a void *const DynamicPerCpu<u64> and DynamicPerCpu<u64> is Send.
+    let mut pcpu = unsafe { (*(info as *const DynamicPerCpu<u64>)).clone() };
+    // SAFETY: smp_processor_id has no preconditions
+    pr_info!("Asserting on {}\n", unsafe { smp_processor_id() });
+
+    // SAFETY: We don't have multiple clones of pcpu in scope
+    unsafe { pcpu.get(CpuGuard::new()) }.with(|val: &mut u64| assert!(*val == 4));
+}
diff --git a/rust/helpers/percpu.c b/rust/helpers/percpu.c
index a091389f730f..0e9b2fed3ebd 100644
--- a/rust/helpers/percpu.c
+++ b/rust/helpers/percpu.c
@@ -1,9 +1,20 @@
 // SPDX-License-Identifier: GPL-2.0
 
 #include <linux/percpu.h>
+#include <linux/smp.h>
 
 void __percpu *rust_helper_alloc_percpu(size_t sz, size_t align)
 {
 	return __alloc_percpu(sz, align);
 }
 
+void rust_helper_on_each_cpu(smp_call_func_t func, void *info, int wait)
+{
+	on_each_cpu(func, info, wait);
+}
+
+int rust_helper_smp_processor_id(void)
+{
+	return smp_processor_id();
+}
+

-- 
2.34.1




* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-12 21:31 ` [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test Mitchell Levy
@ 2025-07-13  9:30   ` Benno Lossin
  2025-07-15 10:31     ` Mitchell Levy
  0 siblings, 1 reply; 20+ messages in thread
From: Benno Lossin @ 2025-07-13  9:30 UTC
  To: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich
  Cc: linux-kernel, rust-for-linux, linux-mm

On Sat Jul 12, 2025 at 11:31 PM CEST, Mitchell Levy wrote:
> Add a short exercise for Rust's per-CPU variable API, modelled after
> lib/percpu_test.c
>
> Signed-off-by: Mitchell Levy <levymitchell0@gmail.com>
> ---
>  lib/Kconfig.debug       |   9 ++++
>  lib/Makefile            |   1 +
>  lib/percpu_test_rust.rs | 120 ++++++++++++++++++++++++++++++++++++++++++++++++

I'm not sure this is the right place. The code looks much more like a
sample, so why not put it with the samples instead?

>  rust/helpers/percpu.c   |  11 +++++
>  4 files changed, 141 insertions(+)
> diff --git a/lib/percpu_test_rust.rs b/lib/percpu_test_rust.rs
> new file mode 100644
> index 000000000000..a9652e6ece08
> --- /dev/null
> +++ b/lib/percpu_test_rust.rs
> @@ -0,0 +1,120 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//! A simple self test for the rust per-CPU API.
> +
> +use core::ffi::c_void;
> +
> +use kernel::{
> +    bindings::{on_each_cpu, smp_processor_id},
> +    define_per_cpu,
> +    percpu::{cpu_guard::*, *},
> +    pr_info,
> +    prelude::*,
> +    unsafe_get_per_cpu,
> +};
> +
> +module! {
> +    type: PerCpuTestModule,
> +    name: "percpu_test_rust",
> +    author: "Mitchell Levy",
> +    description: "Test code to exercise the Rust Per CPU variable API",
> +    license: "GPL v2",
> +}
> +
> +struct PerCpuTestModule;
> +
> +define_per_cpu!(PERCPU: i64 = 0);
> +define_per_cpu!(UPERCPU: u64 = 0);
> +
> +impl kernel::Module for PerCpuTestModule {
> +    fn init(_module: &'static ThisModule) -> Result<Self, Error> {
> +        pr_info!("rust percpu test start\n");
> +
> +        let mut native: i64 = 0;
> +        // SAFETY: PERCPU is properly defined
> +        let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };

I don't understand why we need unsafe here, can't we just create
something specially in the `define_per_cpu` macro that is then confirmed
by the `get_per_cpu!` macro and thus it can be safe?

> +        // SAFETY: We only have one PerCpu that points at PERCPU
> +        unsafe { pcpu.get(CpuGuard::new()) }.with(|val: &mut i64| {

Hmm I also don't like the unsafe part here...

Can't we use the same API that `thread_local!` in the standard library
has:

    https://doc.rust-lang.org/std/macro.thread_local.html

So in this example you would store a `Cell<i64>` instead.

I'm not familiar with per CPU variables, but if you're usually storing
`Copy` types, then this is much better wrt not having unsafe code
everywhere.

If one also often stores `!Copy` types, then we might be able to get
away with `RefCell`, but that's a small runtime overhead -- which is
probably bad given that per cpu variables are most likely used for
performance reasons? In that case the user might just need to store
`UnsafeCell` and use unsafe regardless. (or we invent something
specifically for that case, eg tokens that are statically known to be
unique etc)

---
Cheers,
Benno

> +            pr_info!("The contents of pcpu are {}\n", val);
> +
> +            native += -1;
> +            *val += -1;
> +            pr_info!("Native: {}, *pcpu: {}\n", native, val);
> +            assert!(native == *val && native == -1);
> +
> +            native += 1;
> +            *val += 1;
> +            pr_info!("Native: {}, *pcpu: {}\n", native, val);
> +            assert!(native == *val && native == 0);
> +        });



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-13  9:30   ` Benno Lossin
@ 2025-07-15 10:31     ` Mitchell Levy
  2025-07-15 11:31       ` Benno Lossin
  0 siblings, 1 reply; 20+ messages in thread
From: Mitchell Levy @ 2025-07-15 10:31 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Sun, Jul 13, 2025 at 11:30:31AM +0200, Benno Lossin wrote:
> On Sat Jul 12, 2025 at 11:31 PM CEST, Mitchell Levy wrote:
> > Add a short exercise for Rust's per-CPU variable API, modelled after
> > lib/percpu_test.c
> >
> > Signed-off-by: Mitchell Levy <levymitchell0@gmail.com>
> > ---
> >  lib/Kconfig.debug       |   9 ++++
> >  lib/Makefile            |   1 +
> >  lib/percpu_test_rust.rs | 120 ++++++++++++++++++++++++++++++++++++++++++++++++
> 
> I don't know if this is the correct place, the code looks much more like
> a sample, so why not place it there instead?

I don't feel particularly strongly either way --- I defaulted to `lib/`
since that's where the `percpu_test.c` I was working off of is located.
Happy to change for v3

> >  rust/helpers/percpu.c   |  11 +++++
> >  4 files changed, 141 insertions(+)
> > diff --git a/lib/percpu_test_rust.rs b/lib/percpu_test_rust.rs
> > new file mode 100644
> > index 000000000000..a9652e6ece08
> > --- /dev/null
> > +++ b/lib/percpu_test_rust.rs
> > @@ -0,0 +1,120 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +//! A simple self test for the rust per-CPU API.
> > +
> > +use core::ffi::c_void;
> > +
> > +use kernel::{
> > +    bindings::{on_each_cpu, smp_processor_id},
> > +    define_per_cpu,
> > +    percpu::{cpu_guard::*, *},
> > +    pr_info,
> > +    prelude::*,
> > +    unsafe_get_per_cpu,
> > +};
> > +
> > +module! {
> > +    type: PerCpuTestModule,
> > +    name: "percpu_test_rust",
> > +    author: "Mitchell Levy",
> > +    description: "Test code to exercise the Rust Per CPU variable API",
> > +    license: "GPL v2",
> > +}
> > +
> > +struct PerCpuTestModule;
> > +
> > +define_per_cpu!(PERCPU: i64 = 0);
> > +define_per_cpu!(UPERCPU: u64 = 0);
> > +
> > +impl kernel::Module for PerCpuTestModule {
> > +    fn init(_module: &'static ThisModule) -> Result<Self, Error> {
> > +        pr_info!("rust percpu test start\n");
> > +
> > +        let mut native: i64 = 0;
> > +        // SAFETY: PERCPU is properly defined
> > +        let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
> 
> I don't understand why we need unsafe here, can't we just create
> something specially in the `define_per_cpu` macro that is then confirmed
> by the `get_per_cpu!` macro and thus it can be safe?

As is, something like
    define_per_cpu!(PERCPU: i32 = 0);

    fn func() {
        let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
    }
will compile, but any usage of `pcpu` will be UB. This is because
`unsafe_get_per_cpu!` is just blindly casting pointers and, as far as I
know, the compiler does not do any checking of pointer casts. If you
have thoughts/ideas on how to get around this problem, I'd certainly
*like* to provide a safe API here :)
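
To make the shape of the problem concrete, here is a minimal userspace
sketch, not the code from this series. `TypedHandle`, `define_typed!`
and `get_typed!` are hypothetical names, and an ordinary `static` stands
in for a per-CPU symbol. It assumes the getter can name the symbol
itself, so the element type is inferred from the declaration instead of
being chosen at the use site. Whether that carries over to real per-CPU
symbols (for example ones declared on the C side) is not something this
sketch establishes.

```rust
/// Hypothetical typed handle; a stand-in, not the series' `StaticPerCpu`.
pub struct TypedHandle<T> {
    ptr: *const T,
}

impl<T> TypedHandle<T> {
    /// Wraps a raw pointer without dereferencing it.
    pub fn from_raw(ptr: *const T) -> Self {
        TypedHandle { ptr }
    }

    /// Returns the wrapped pointer; dereferencing it is up to the caller.
    pub fn as_ptr(&self) -> *const T {
        self.ptr
    }
}

/// Hypothetical definition macro: declares a plain static of the given type.
macro_rules! define_typed {
    ($name:ident : $ty:ty = $init:expr) => {
        static $name: $ty = $init;
    };
}

/// Hypothetical getter macro: no cast is involved, so the handle's type
/// parameter is whatever type the static was declared with.
macro_rules! get_typed {
    ($name:ident) => {
        TypedHandle::from_raw(::std::ptr::addr_of!($name))
    };
}

define_typed!(DEMO: i32 = 0);

/// Annotating the result as `TypedHandle<i64>` would be a type error here,
/// because `addr_of!(DEMO)` has type `*const i32`.
pub fn demo_handle() -> TypedHandle<i32> {
    get_typed!(DEMO)
}
```

The point of the sketch is only that the mismatch in the example above
becomes a compile error once no pointer cast sits between the
declaration and the handle.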

> > +        // SAFETY: We only have one PerCpu that points at PERCPU
> > +        unsafe { pcpu.get(CpuGuard::new()) }.with(|val: &mut i64| {
> 
> Hmm I also don't like the unsafe part here...
> 
> Can't we use the same API that `thread_local!` in the standard library
> has:
> 
>     https://doc.rust-lang.org/std/macro.thread_local.html
> 
> So in this example you would store a `Cell<i64>` instead.
> 
> I'm not familiar with per CPU variables, but if you're usually storing
> `Copy` types, then this is much better wrt not having unsafe code
> everywhere.
> 
> If one also often stores `!Copy` types, then we might be able to get
> away with `RefCell`, but that's a small runtime overhead -- which is
> probably bad given that per cpu variables are most likely used for
> performance reasons? In that case the user might just need to store
> `UnsafeCell` and use unsafe regardless. (or we invent something
> specifically for that case, eg tokens that are statically known to be
> unique etc)

I'm open to including a specialization for `T: Copy` in a similar vein
to what I have here for numeric types. Off the top of my head, that
shouldn't require any user-facing `unsafe`. But yes, I believe there is
a significant amount of interest in having `!Copy` per-CPU variables.
(At least, I'm interested in having them around for experimenting with
using Rust for HV drivers.)

I would definitely like to avoid *requiring* the use of `RefCell` since,
as you mention, it does have a runtime overhead. Per-CPU variables can
be used for "logical" reasons rather than just as a performance
optimization, so there might be some cases where paying the runtime
overhead is ok. But that's certainly not true in all cases. That said,
perhaps there could be a safely obtainable token type that only passes a
`&T` (rather than a `&mut T`) to its closure, and then if a user doesn't
mind the runtime overhead, they can choose `T` to be a `RefCell`.
Thoughts?
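
A rough userspace sketch of that idea, under the assumption that the
accessor only ever hands out `&T`. `SharedOnly` and `push_and_len` are
hypothetical names, and nothing here models CPU pinning or preemption;
it only shows how a user could opt into `RefCell` for mutation without
any `unsafe` at the call site.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a per-CPU slot that only exposes `&T`.
pub struct SharedOnly<T> {
    value: T,
}

impl<T> SharedOnly<T> {
    pub const fn new(value: T) -> Self {
        SharedOnly { value }
    }

    /// Runs `f` with a shared reference; a `&mut T` is never handed out.
    pub fn with<R>(&self, f: impl FnOnce(&T) -> R) -> R {
        f(&self.value)
    }
}

/// A user who accepts the runtime borrow check picks `T = RefCell<_>`.
pub fn push_and_len(slot: &SharedOnly<RefCell<Vec<u32>>>, v: u32) -> usize {
    slot.with(|cell| {
        // Panics on a conflicting borrow instead of causing UB.
        let mut items = cell.borrow_mut();
        items.push(v);
        items.len()
    })
}
```

Users who only need to read would store a plain `T` and pay nothing
extra.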

For `UnsafeCell`, if a user of the API were to have something like a
`PerCpu<UnsafeCell<T>>` that safely spits out a `&UnsafeCell<T>`, my
understanding is that mutating the underlying `T` would require the
exact same safety guarantees as what's here, except now it'd need a much
bigger unsafe block and would have to do all of its manipulations via
pointers. That seems like a pretty big ergonomics burden without a clear
(to me) benefit.

> ---
> Cheers,
> Benno
> 
> > +            pr_info!("The contents of pcpu are {}\n", val);
> > +
> > +            native += -1;
> > +            *val += -1;
> > +            pr_info!("Native: {}, *pcpu: {}\n", native, val);
> > +            assert!(native == *val && native == -1);
> > +
> > +            native += 1;
> > +            *val += 1;
> > +            pr_info!("Native: {}, *pcpu: {}\n", native, val);
> > +            assert!(native == *val && native == 0);
> > +        });



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-15 10:31     ` Mitchell Levy
@ 2025-07-15 11:31       ` Benno Lossin
  2025-07-15 14:10         ` Boqun Feng
  0 siblings, 1 reply; 20+ messages in thread
From: Benno Lossin @ 2025-07-15 11:31 UTC (permalink / raw)
  To: Mitchell Levy
  Cc: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Tue Jul 15, 2025 at 12:31 PM CEST, Mitchell Levy wrote:
> On Sun, Jul 13, 2025 at 11:30:31AM +0200, Benno Lossin wrote:
>> On Sat Jul 12, 2025 at 11:31 PM CEST, Mitchell Levy wrote:
>> > Add a short exercise for Rust's per-CPU variable API, modelled after
>> > lib/percpu_test.c
>> >
>> > Signed-off-by: Mitchell Levy <levymitchell0@gmail.com>
>> > ---
>> >  lib/Kconfig.debug       |   9 ++++
>> >  lib/Makefile            |   1 +
>> >  lib/percpu_test_rust.rs | 120 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 
>> I don't know if this is the correct place, the code looks much more like
>> a sample, so why not place it there instead?
>
> I don't feel particularly strongly either way --- I defaulted to `lib/`
> since that's where the `percpu_test.c` I was working off of is located.
> Happy to change for v3

Since we don't have Rust stuff in lib/ yet (and since the code looks much
more like the samples we already have), I think putting it in
samples/rust is better.

>> >  rust/helpers/percpu.c   |  11 +++++
>> >  4 files changed, 141 insertions(+)
>> > diff --git a/lib/percpu_test_rust.rs b/lib/percpu_test_rust.rs
>> > new file mode 100644
>> > index 000000000000..a9652e6ece08
>> > --- /dev/null
>> > +++ b/lib/percpu_test_rust.rs
>> > @@ -0,0 +1,120 @@
>> > +// SPDX-License-Identifier: GPL-2.0
>> > +//! A simple self test for the rust per-CPU API.
>> > +
>> > +use core::ffi::c_void;
>> > +
>> > +use kernel::{
>> > +    bindings::{on_each_cpu, smp_processor_id},
>> > +    define_per_cpu,
>> > +    percpu::{cpu_guard::*, *},
>> > +    pr_info,
>> > +    prelude::*,
>> > +    unsafe_get_per_cpu,
>> > +};
>> > +
>> > +module! {
>> > +    type: PerCpuTestModule,
>> > +    name: "percpu_test_rust",
>> > +    author: "Mitchell Levy",
>> > +    description: "Test code to exercise the Rust Per CPU variable API",
>> > +    license: "GPL v2",
>> > +}
>> > +
>> > +struct PerCpuTestModule;
>> > +
>> > +define_per_cpu!(PERCPU: i64 = 0);
>> > +define_per_cpu!(UPERCPU: u64 = 0);
>> > +
>> > +impl kernel::Module for PerCpuTestModule {
>> > +    fn init(_module: &'static ThisModule) -> Result<Self, Error> {
>> > +        pr_info!("rust percpu test start\n");
>> > +
>> > +        let mut native: i64 = 0;
>> > +        // SAFETY: PERCPU is properly defined
>> > +        let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
>> 
>> I don't understand why we need unsafe here, can't we just create
>> something specially in the `define_per_cpu` macro that is then confirmed
>> by the `get_per_cpu!` macro and thus it can be safe?
>
> As is, something like
>     define_per_cpu!(PERCPU: i32 = 0);
>
>     fn func() {
>         let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
>     }
> will compile, but any usage of `pcpu` will be UB. This is because
> `unsafe_get_per_cpu!` is just blindly casting pointers and, as far as I
> know, the compiler does not do any checking of pointer casts. If you
> have thoughts/ideas on how to get around this problem, I'd certainly
> *like* to provide a safe API here :)

I haven't taken a look at your implementation, but you do have the type
declared in `define_per_cpu!`, so it's a bit of a mystery to me why you
can't get that out in `unsafe_get_per_cpu!`...

Maybe in a few weeks I'll be able to take a closer look.

>> > +        // SAFETY: We only have one PerCpu that points at PERCPU
>> > +        unsafe { pcpu.get(CpuGuard::new()) }.with(|val: &mut i64| {
>> 
>> Hmm I also don't like the unsafe part here...
>> 
>> Can't we use the same API that `thread_local!` in the standard library
>> has:
>> 
>>     https://doc.rust-lang.org/std/macro.thread_local.html
>> 
>> So in this example you would store a `Cell<i64>` instead.
>> 
>> I'm not familiar with per CPU variables, but if you're usually storing
>> `Copy` types, then this is much better wrt not having unsafe code
>> everywhere.
>> 
>> If one also often stores `!Copy` types, then we might be able to get
>> away with `RefCell`, but that's a small runtime overhead -- which is
>> probably bad given that per cpu variables are most likely used for
>> performance reasons? In that case the user might just need to store
>> `UnsafeCell` and use unsafe regardless. (or we invent something
>> specifically for that case, eg tokens that are statically known to be
>> unique etc)
>
> I'm open to including a specialization for `T: Copy` in a similar vein
> to what I have here for numeric types. Off the top of my head, that
> shouldn't require any user-facing `unsafe`. But yes, I believe there is
> a significant amount of interest in having `!Copy` per-CPU variables.
> (At least, I'm interested in having them around for experimenting with
> using Rust for HV drivers.)

What kinds of types would you like to store? Allocations? Just integers
in bigger structs? Mutexes?

> I would definitely like to avoid *requiring* the use of `RefCell` since,
> as you mention, it does have a runtime overhead. Per-CPU variables can
> be used for "logical" reasons rather than just as a performance
> optimization, so there might be some cases where paying the runtime
> overhead is ok. But that's certainly not true in all cases. That said,
> perhaps there could be a safely obtainable token type that only passes a
> `&T` (rather than a `&mut T`) to its closure, and then if a user doesn't
> mind the runtime overhead, they can choose `T` to be a `RefCell`.
> Thoughts?

So I think using an API similar to `thread_local!` will allow us to have
multiple other APIs that slot into that. `Cell<T>` for `T: Copy`,
`RefCell<T>` for cases where you don't care about the runtime overhead,
plain `T` for cases where you only need `&T`. For the case where you
need `&mut T`, we could have something like a `TokenCell<T>` that gives
out a token that you need to mutably borrow in order to get `&mut T`.
Finally for anything else that is too restricted by this, users can also
use `UnsafeCell<T>` although that requires `unsafe`.

I think the advantage of this is that the common cases are all safe and
very idiomatic. In the current design, you *always* have to use unsafe.
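
For reference, this is what the `Cell` and `RefCell` cases look like
with the standard library's `thread_local!`, which is the model being
proposed. It is a plain userspace sketch: `COUNTER`, `LOG`, `bump` and
`record` are made-up names, and it says nothing about how a kernel
per-CPU version would be implemented.

```rust
use std::cell::{Cell, RefCell};

thread_local! {
    // `Copy` payload: `Cell` gives get/set with no runtime check.
    static COUNTER: Cell<i64> = Cell::new(0);
    // `!Copy` payload: `RefCell` adds a runtime borrow check.
    static LOG: RefCell<Vec<i64>> = RefCell::new(Vec::new());
}

/// Adds `delta` to this thread's counter and returns the new value.
pub fn bump(delta: i64) -> i64 {
    COUNTER.with(|c| {
        let new = c.get() + delta;
        c.set(new);
        new
    })
}

/// Appends `v` to this thread's log and returns the log length.
pub fn record(v: i64) -> usize {
    LOG.with(|log| {
        let mut log = log.borrow_mut();
        log.push(v);
        log.len()
    })
}
```

Neither function needs `unsafe`, which is the property being asked for
in the common cases.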

> For `UnsafeCell`, if a user of the API were to have something like a
> `PerCpu<UnsafeCell<T>>` that safely spits out a `&UnsafeCell<T>`, my
> understanding is that mutating the underlying `T` would require the
> exact same safety guarantees as what's here, except now it'd need a much
> bigger unsafe block and would have to do all of its manipulations via
> pointers. That seems like a pretty big ergonomics burden without a clear
> (to me) benefit.

It would require the same amount of unsafe and safety comments, but
neither would need to be bigger, since you can just as well create a
`&mut T` to the value.

---
Cheers,
Benno



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-15 11:31       ` Benno Lossin
@ 2025-07-15 14:10         ` Boqun Feng
  2025-07-15 15:55           ` Benno Lossin
  0 siblings, 1 reply; 20+ messages in thread
From: Boqun Feng @ 2025-07-15 14:10 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Tue, Jul 15, 2025 at 01:31:06PM +0200, Benno Lossin wrote:
[...]
> >> > +impl kernel::Module for PerCpuTestModule {
> >> > +    fn init(_module: &'static ThisModule) -> Result<Self, Error> {
> >> > +        pr_info!("rust percpu test start\n");
> >> > +
> >> > +        let mut native: i64 = 0;
> >> > +        // SAFETY: PERCPU is properly defined
> >> > +        let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
> >> 
> >> I don't understand why we need unsafe here, can't we just create
> >> something specially in the `define_per_cpu` macro that is then confirmed
> >> by the `get_per_cpu!` macro and thus it can be safe?
> >
> > As is, something like
> >     define_per_cpu!(PERCPU: i32 = 0);
> >
> >     fn func() {
> >         let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
> >     }
> > will compile, but any usage of `pcpu` will be UB. This is because
> > `unsafe_get_per_cpu!` is just blindly casting pointers and, as far as I
> > know, the compiler does not do any checking of pointer casts. If you
> > have thoughts/ideas on how to get around this problem, I'd certainly
> > *like* to provide a safe API here :)
> 
> I haven't taken a look at your implementation, but you do have the type
> declared in `define_per_cpu!`, so it's a bit of a mystery to me why you
> can't get that out in `unsafe_get_per_cpu!`...
> 
> Maybe in a few weeks I'll be able to take a closer look.
> 
> >> > +        // SAFETY: We only have one PerCpu that points at PERCPU
> >> > +        unsafe { pcpu.get(CpuGuard::new()) }.with(|val: &mut i64| {
> >> 
> >> Hmm I also don't like the unsafe part here...
> >> 
> >> Can't we use the same API that `thread_local!` in the standard library

First of all, `thread_local!` has to be implemented by some sys-specific
unsafe mechanism, right? For example on unix, I think it's using
pthread_key_t:

	https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html

What we are implementing (or wrapping) here is the very basic unsafe
mechanism for percpu. Surely we can explore the design for a safe
API, but the unsafe mechanism probably needs to be looked into
first.

> >> has:
> >> 
> >>     https://doc.rust-lang.org/std/macro.thread_local.html
> >> 
> >> So in this example you would store a `Cell<i64>` instead.
> >> 
> >> I'm not familiar with per CPU variables, but if you're usually storing
> >> `Copy` types, then this is much better wrt not having unsafe code
> >> everywhere.
> >> 
> >> If one also often stores `!Copy` types, then we might be able to get
> >> away with `RefCell`, but that's a small runtime overhead -- which is
> >> probably bad given that per cpu variables are most likely used for
> >> performance reasons? In that case the user might just need to store
> >> `UnsafeCell` and use unsafe regardless. (or we invent something

This sounds reasonable to me.

> >> specifically for that case, eg tokens that are statically known to be
> >> unique etc)
> >
> > I'm open to including a specialization for `T: Copy` in a similar vein
> > to what I have here for numeric types. Off the top of my head, that
> > shouldn't require any user-facing `unsafe`. But yes, I believe there is
> > a significant amount of interest in having `!Copy` per-CPU variables.
> > (At least, I'm interested in having them around for experimenting with
> > using Rust for HV drivers.)
> 
> What kinds of types would you like to store? Allocations? Just integers
> in bigger structs? Mutexes?
> 

In the VMBus driver, there is a percpu work_struct.

> > I would definitely like to avoid *requiring* the use of `RefCell` since,
> > as you mention, it does have a runtime overhead. Per-CPU variables can
> > be used for "logical" reasons rather than just as a performance
> > optimization, so there might be some cases where paying the runtime
> > overhead is ok. But that's certainly not true in all cases. That said,
> > perhaps there could be a safely obtainable token type that only passes a
> > `&T` (rather than a `&mut T`) to its closure, and then if a user doesn't
> > mind the runtime overhead, they can choose `T` to be a `RefCell`.
> > Thoughts?
> 
> So I think using an API similar to `thread_local!` will allow us to have
> multiple other APIs that slot into that. `Cell<T>` for `T: Copy`,
> `RefCell<T>` for cases where you don't care about the runtime overhead,
> plain `T` for cases where you only need `&T`. For the case where you
> need `&mut T`, we could have something like a `TokenCell<T>` that gives
> out a token that you need to mutably borrow in order to get `&mut T`.
> Finally for anything else that is too restricted by this, users can also
> use `UnsafeCell<T>` although that requires `unsafe`.
> 
> I think the advantage of this is that the common cases are all safe and
> very idiomatic. In the current design, you *always* have to use unsafe.
> 

I agree, but like I said, we need to figure out the unsafe interface
that C already uses and build API upon it. I think focusing on the
unsafe mechanism may be the way to start: you cannot implement something
that cannot be implemented, and we don't have the magic pthread_key here
;-)

Regards,
Boqun

> > For `UnsafeCell`, if a user of the API were to have something like a
> > `PerCpu<UnsafeCell<T>>` that safely spits out a `&UnsafeCell<T>`, my
> > understanding is that mutating the underlying `T` would require the
> > exact same safety guarantees as what's here, except now it'd need a much
> > bigger unsafe block and would have to do all of its manipulations via
> > pointers. That seems like a pretty big ergonomics burden without a clear
> > (to me) benefit.
> 
> It would require the same amount of unsafe and safety comments, but
> neither would need to be bigger, since you can just as well create a
> `&mut T` to the value.
> 
> ---
> Cheers,
> Benno



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-15 14:10         ` Boqun Feng
@ 2025-07-15 15:55           ` Benno Lossin
  2025-07-15 16:31             ` Boqun Feng
  0 siblings, 1 reply; 20+ messages in thread
From: Benno Lossin @ 2025-07-15 15:55 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Tue Jul 15, 2025 at 4:10 PM CEST, Boqun Feng wrote:
> On Tue, Jul 15, 2025 at 01:31:06PM +0200, Benno Lossin wrote:
> [...]
>> >> > +impl kernel::Module for PerCpuTestModule {
>> >> > +    fn init(_module: &'static ThisModule) -> Result<Self, Error> {
>> >> > +        pr_info!("rust percpu test start\n");
>> >> > +
>> >> > +        let mut native: i64 = 0;
>> >> > +        // SAFETY: PERCPU is properly defined
>> >> > +        let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
>> >> 
>> >> I don't understand why we need unsafe here, can't we just create
>> >> something specially in the `define_per_cpu` macro that is then confirmed
>> >> by the `get_per_cpu!` macro and thus it can be safe?
>> >
>> > As is, something like
>> >     define_per_cpu!(PERCPU: i32 = 0);
>> >
>> >     fn func() {
>> >         let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
>> >     }
>> > will compile, but any usage of `pcpu` will be UB. This is because
>> > `unsafe_get_per_cpu!` is just blindly casting pointers and, as far as I
>> > know, the compiler does not do any checking of pointer casts. If you
>> > have thoughts/ideas on how to get around this problem, I'd certainly
>> > *like* to provide a safe API here :)
>> 
>> I haven't taken a look at your implementation, but you do have the type
>> declared in `define_per_cpu!`, so it's a bit of a mystery to me why you
>> can't get that out in `unsafe_get_per_cpu!`...
>> 
>> Maybe in a few weeks I'll be able to take a closer look.
>> 
>> >> > +        // SAFETY: We only have one PerCpu that points at PERCPU
>> >> > +        unsafe { pcpu.get(CpuGuard::new()) }.with(|val: &mut i64| {
>> >> 
>> >> Hmm I also don't like the unsafe part here...
>> >> 
>> >> Can't we use the same API that `thread_local!` in the standard library
>
> First of all, `thread_local!` has to be implemented by some sys-specific
> unsafe mechanism, right? For example on unix, I think it's using
> pthread_key_t:
>
> 	https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
>
> What we are implementing (or wrapping) here is the very basic unsafe
> mechanism for percpu. Surely we can explore the design for a safe
> API, but the unsafe mechanism probably needs to be looked into
> first.

But this is intended to be used by drivers, right? If so, then we should
do our usual due diligence and work out a safe abstraction. Only fall
back to unsafe if it isn't possible.

I'm not familiar with percpu, but from the name I assumed that it's
"just a variable for each cpu" so similar to `thread_local!`, but it's
bound to the specific cpu instead of the thread.

That in my mind should be rather easy to support in Rust at least with
the thread_local-style API. You just need to ensure that no reference
can escape the cpu, so we can make it `!Send` & `!Sync` + rely on klint
to detect context switches.

>> >> has:
>> >> 
>> >>     https://doc.rust-lang.org/std/macro.thread_local.html
>> >> 
>> >> So in this example you would store a `Cell<i64>` instead.
>> >> 
>> >> I'm not familiar with per CPU variables, but if you're usually storing
>> >> `Copy` types, then this is much better wrt not having unsafe code
>> >> everywhere.
>> >> 
>> >> If one also often stores `!Copy` types, then we might be able to get
>> >> away with `RefCell`, but that's a small runtime overhead -- which is
>> >> probably bad given that per cpu variables are most likely used for
>> >> performance reasons? In that case the user might just need to store
>> >> `UnsafeCell` and use unsafe regardless. (or we invent something
>
> This sounds reasonable to me.
>
>> >> specifically for that case, eg tokens that are statically known to be
>> >> unique etc)
>> >
>> > I'm open to including a specialization for `T: Copy` in a similar vein
>> > to what I have here for numeric types. Off the top of my head, that
>> > shouldn't require any user-facing `unsafe`. But yes, I believe there is
>> > a significant amount of interest in having `!Copy` per-CPU variables.
>> > (At least, I'm interested in having them around for experimenting with
>> > using Rust for HV drivers.)
>> 
>> What kinds of types would you like to store? Allocations? Just integers
>> in bigger structs? Mutexes?
>> 
>
> In the VMBus driver, there is a percpu work_struct.

Do you have a link? Or better yet a Rust struct description of what you
think it will look like :)

>> > I would definitely like to avoid *requiring* the use of `RefCell` since,
>> > as you mention, it does have a runtime overhead. Per-CPU variables can
>> > be used for "logical" reasons rather than just as a performance
>> > optimization, so there might be some cases where paying the runtime
>> > overhead is ok. But that's certainly not true in all cases. That said,
>> > perhaps there could be a safely obtainable token type that only passes a
>> > `&T` (rather than a `&mut T`) to its closure, and then if a user doesn't
>> > mind the runtime overhead, they can choose `T` to be a `RefCell`.
>> > Thoughts?
>> 
>> So I think using an API similar to `thread_local!` will allow us to have
>> multiple other APIs that slot into that. `Cell<T>` for `T: Copy`,
>> `RefCell<T>` for cases where you don't care about the runtime overhead,
>> plain `T` for cases where you only need `&T`. For the case where you
>> need `&mut T`, we could have something like a `TokenCell<T>` that gives
>> out a token that you need to mutably borrow in order to get `&mut T`.
>> Finally for anything else that is too restricted by this, users can also
>> use `UnsafeCell<T>` although that requires `unsafe`.
>> 
>> I think the advantage of this is that the common cases are all safe and
>> very idiomatic. In the current design, you *always* have to use unsafe.
>> 
>
> I agree, but like I said, we need to figure out the unsafe interface
> that C already uses and build API upon it. I think focusing on the
> unsafe mechanism may be the way to start: you cannot implement something
> that cannot be implemented, and we don't have the magic pthread_key here
> ;-)

Sure we can do some experimentation, but I don't think we should put
unsafe abstractions upstream that we intend to replace with a safe
abstraction later. Otherwise people are going to depend on it and it's
going to be a mess. Do the experimenting out of tree and learn there.

---
Cheers,
Benno



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-15 15:55           ` Benno Lossin
@ 2025-07-15 16:31             ` Boqun Feng
  2025-07-15 17:44               ` Benno Lossin
  0 siblings, 1 reply; 20+ messages in thread
From: Boqun Feng @ 2025-07-15 16:31 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Tue, Jul 15, 2025 at 05:55:13PM +0200, Benno Lossin wrote:
> On Tue Jul 15, 2025 at 4:10 PM CEST, Boqun Feng wrote:
> > On Tue, Jul 15, 2025 at 01:31:06PM +0200, Benno Lossin wrote:
> > [...]
> >> >> > +impl kernel::Module for PerCpuTestModule {
> >> >> > +    fn init(_module: &'static ThisModule) -> Result<Self, Error> {
> >> >> > +        pr_info!("rust percpu test start\n");
> >> >> > +
> >> >> > +        let mut native: i64 = 0;
> >> >> > +        // SAFETY: PERCPU is properly defined
> >> >> > +        let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
> >> >> 
> >> >> I don't understand why we need unsafe here, can't we just create
> >> >> something specially in the `define_per_cpu` macro that is then confirmed
> >> >> by the `get_per_cpu!` macro and thus it can be safe?
> >> >
> >> > As is, something like
> >> >     define_per_cpu!(PERCPU: i32 = 0);
> >> >
> >> >     fn func() {
> >> >         let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
> >> >     }
> >> > will compile, but any usage of `pcpu` will be UB. This is because
> >> > `unsafe_get_per_cpu!` is just blindly casting pointers and, as far as I
> >> > know, the compiler does not do any checking of pointer casts. If you
> >> > have thoughts/ideas on how to get around this problem, I'd certainly
> >> > *like* to provide a safe API here :)
> >> 
> >> I haven't taken a look at your implementation, but you do have the type
> >> declared in `define_per_cpu!`, so it's a bit of a mystery to me why you
> >> can't get that out in `unsafe_get_per_cpu!`...
> >> 
> >> Maybe in a few weeks I'll be able to take a closer look.
> >> 
> >> >> > +        // SAFETY: We only have one PerCpu that points at PERCPU
> >> >> > +        unsafe { pcpu.get(CpuGuard::new()) }.with(|val: &mut i64| {
> >> >> 
> >> >> Hmm I also don't like the unsafe part here...
> >> >> 
> >> >> Can't we use the same API that `thread_local!` in the standard library
> >
> > First of all, `thread_local!` has to be implemented by some sys-specific
> > unsafe mechanism, right? For example on unix, I think it's using
> > pthread_key_t:
> >
> > 	https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
> >
> > What we are implementing (or wrapping) here is the very basic unsafe
> > mechanism for percpu. Surely we can explore the design for a safe
> > API, but the unsafe mechanism probably needs to be looked into
> > first.
> 
> But this is intended to be used by drivers, right? If so, then we should

Not necessarily only for drivers; we can also use it for implementing
other safe abstractions (e.g. hazard pointers, percpu counters, etc.).

> do our usual due diligence and work out a safe abstraction. Only fall
> back to unsafe if it isn't possible.
> 

All I'm saying is that instead of figuring out a safe abstraction first,
we should probably focus on identifying how to implement it, which
part is really unsafe, and the safety requirements for that part.

> I'm not familiar with percpu, but from the name I assumed that it's
> "just a variable for each cpu" so similar to `thread_local!`, but it's
> bound to the specific cpu instead of the thread.
> 
> That in my mind should be rather easy to support in Rust at least with
> the thread_local-style API. You just need to ensure that no reference
> can escape the cpu, so we can make it `!Send` & `!Sync` + rely on klint

Not really. In the kernel, we have plenty of use cases where we read
other CPUs' percpu variables. For example, each CPU keeps its own
counter and we sum them up on another CPU.

If we would like to model it conceptually, it's more like an array
that's indexed by CpuId to me.
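
The "array indexed by CpuId" model can be sketched in plain user-space
Rust. This is only an illustration under stated assumptions, not the
API from this series: `NR_CPUS`, `PerCpuCounter`, and `demo_counter`
are hypothetical names, and atomics stand in for whatever
synchronization a real user would pick.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the number of possible CPUs.
const NR_CPUS: usize = 4;

/// Illustrative model only: one counter slot per CPU, indexed by CPU id.
struct PerCpuCounter {
    slots: [AtomicU64; NR_CPUS],
}

impl PerCpuCounter {
    fn new() -> Self {
        Self {
            slots: std::array::from_fn(|_| AtomicU64::new(0)),
        }
    }

    /// "Local" update: a CPU only ever increments its own slot.
    fn inc(&self, cpu: usize) {
        self.slots[cpu].fetch_add(1, Ordering::Relaxed);
    }

    /// "Remote" read: any CPU may walk all slots and sum them.
    fn sum(&self) -> u64 {
        self.slots.iter().map(|s| s.load(Ordering::Relaxed)).sum()
    }
}

/// Two increments on CPU 0 and one on CPU 3, then a cross-CPU sum.
fn demo_counter() -> u64 {
    let counter = PerCpuCounter::new();
    counter.inc(0);
    counter.inc(0);
    counter.inc(3);
    counter.sum()
}
```

Note that nothing here ties a slot to the CPU that is actually running;
in the kernel that link is what needs the preemption/CPU guard.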

> to detect context switches.
> 
> >> >> has:
> >> >> 
> >> >>     https://doc.rust-lang.org/std/macro.thread_local.html
> >> >> 
> >> >> So in this example you would store a `Cell<i64>` instead.
> >> >> 
> >> >> I'm not familiar with per CPU variables, but if you're usually storing
> >> >> `Copy` types, then this is much better wrt not having unsafe code
> >> >> everywhere.
> >> >> 
> >> >> If one also often stores `!Copy` types, then we might be able to get
> >> >> away with `RefCell`, but that's a small runtime overhead -- which is
> >> >> probably bad given that per cpu variables are most likely used for
> >> >> performance reasons? In that case the user might just need to store
> >> >> `UnsafeCell` and use unsafe regardless. (or we invent something
> >
> > This sounds reasonable to me.
> >
> >> >> specifically for that case, eg tokens that are statically known to be
> >> >> unique etc)
> >> >
> >> > I'm open to including a specialization for `T: Copy` in a similar vein
> >> > to what I have here for numeric types. Off the top of my head, that
> >> > shouldn't require any user-facing `unsafe`. But yes, I believe there is
> >> > a significant amount of interest in having `!Copy` per-CPU variables.
> >> > (At least, I'm interested in having them around for experimenting with
> >> > using Rust for HV drivers.)
> >> 
> >> What kinds of types would you like to store? Allocations? Just integers
> >> in bigger structs? Mutexes?
> >> 
> >
> > In the VMBus driver, there is a percpu work_struct.
> 
> Do you have a link? Or better yet a Rust struct description of what you
> think it will look like :)
> 

Not Rust code yet, but here is the corresponding C code:

	https://github.com/Rust-for-Linux/linux/blob/rust-next/drivers/hv/vmbus_drv.c#L1396

But please note that we are not developing the abstraction solely for
this usage, but more to understand in general how to wrap percpu
functionality in a way similar to its usage in C.

> >> > I would definitely like to avoid *requiring* the use of `RefCell` since,
> >> > as you mention, it does have a runtime overhead. Per-CPU variables can
> >> > be used for "logical" reasons rather than just as a performance
> >> > optimization, so there might be some cases where paying the runtime
> >> > overhead is ok. But that's certainly not true in all cases. That said,
> >> > perhaps there could be a safely obtainable token type that only passes a
> >> > `&T` (rather than a `&mut T`) to its closure, and then if a user doesn't
> >> > mind the runtime overhead, they can choose `T` to be a `RefCell`.
> >> > Thoughts?
> >> 
> >> So I think using an API similar to `thread_local!` will allow us to have
> >> multiple other APIs that slot into that. `Cell<T>` for `T: Copy`,
> >> `RefCell<T>` for cases where you don't care about the runtime overhead,
> >> plain `T` for cases where you only need `&T`. For the case where you
> >> need `&mut T`, we could have something like a `TokenCell<T>` that gives
> >> out a token that you need to mutably borrow in order to get `&mut T`.
> >> Finally for anything else that is too restricted by this, users can also
> >> use `UnsafeCell<T>` although that requires `unsafe`.
> >> 
> >> I think the advantage of this is that the common cases are all safe and
> >> very idiomatic. In the current design, you *always* have to use unsafe.
> >> 
> >
> > I agree, but like I said, we need to figure out the unsafe interface
> > that C already uses and build API upon it. I think focusing on the
> > unsafe mechanism may be the way to start: you cannot implement something
> > that cannot be implemented, and we don't have the magic pthread_key here
> > ;-)
> 
> Sure we can do some experimentation, but I don't think we should put
> unsafe abstractions upstream that we intend to replace with a safe
> abstraction later. Otherwise people are going to depend on it and it's

I doubt we can replace the unsafe abstraction with a safe one; if users
really care about performance, then they will need to use some
unsafe API to build their safe abstraction.

> going to be a mess. Do the experimenting out of tree and learn there.

I disagree. Rust as a language on its own should be able to do what C
does, including implementing the percpu functionality the same way C
does. There is nothing wrong with a set of Rust primitives in the kernel
that provide fundamental percpu functionality that other core facilities
can rely on. The better part is that it will have all the safety
requirements documented well.

Regards,
Boqun

> 
> ---
> Cheers,
> Benno



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-15 16:31             ` Boqun Feng
@ 2025-07-15 17:44               ` Benno Lossin
  2025-07-15 21:34                 ` Boqun Feng
  2025-07-16 15:35                 ` Boqun Feng
  0 siblings, 2 replies; 20+ messages in thread
From: Benno Lossin @ 2025-07-15 17:44 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Tue Jul 15, 2025 at 6:31 PM CEST, Boqun Feng wrote:
> On Tue, Jul 15, 2025 at 05:55:13PM +0200, Benno Lossin wrote:
>> On Tue Jul 15, 2025 at 4:10 PM CEST, Boqun Feng wrote:
>> > On Tue, Jul 15, 2025 at 01:31:06PM +0200, Benno Lossin wrote:
>> > [...]
>> >> >> > +impl kernel::Module for PerCpuTestModule {
>> >> >> > +    fn init(_module: &'static ThisModule) -> Result<Self, Error> {
>> >> >> > +        pr_info!("rust percpu test start\n");
>> >> >> > +
>> >> >> > +        let mut native: i64 = 0;
>> >> >> > +        // SAFETY: PERCPU is properly defined
>> >> >> > +        let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
>> >> >> 
>> >> >> I don't understand why we need unsafe here, can't we just create
>> >> >> something specially in the `define_per_cpu` macro that is then confirmed
>> >> >> by the `get_per_cpu!` macro and thus it can be safe?
>> >> >
>> >> > As is, something like
>> >> >     define_per_cpu!(PERCPU: i32 = 0);
>> >> >
>> >> >     fn func() {
>> >> >         let mut pcpu: StaticPerCpu<i64> = unsafe { unsafe_get_per_cpu!(PERCPU) };
>> >> >     }
>> >> > will compile, but any usage of `pcpu` will be UB. This is because
>> >> > `unsafe_get_per_cpu!` is just blindly casting pointers and, as far as I
>> >> > know, the compiler does not do any checking of pointer casts. If you
>> >> > have thoughts/ideas on how to get around this problem, I'd certainly
>> >> > *like* to provide a safe API here :)
>> >> 
>> >> I haven't taken a look at your implementation, but you do have the type
>> >> declared in `define_per_cpu!`, so it's a bit of a mystery to me why you
>> >> can't get that out in `unsafe_get_per_cpu!`...
>> >> 
>> >> Maybe in a few weeks I'll be able to take a closer look.
>> >> 
>> >> >> > +        // SAFETY: We only have one PerCpu that points at PERCPU
>> >> >> > +        unsafe { pcpu.get(CpuGuard::new()) }.with(|val: &mut i64| {
>> >> >> 
>> >> >> Hmm I also don't like the unsafe part here...
>> >> >> 
>> >> >> Can't we use the same API that `thread_local!` in the standard library
>> >
>> > First of all, `thread_local!` has to be implemented by some sys-specific
>> > unsafe mechanism, right? For example on unix, I think it's using
>> > pthread_key_t:
>> >
>> > 	https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
>> >
>> > what we are implementing (or wrapping) is the very basic unsafe
>> > mechanism for percpu here. Surely we can explore the design for a safe
>> > API, but the unsafe mechanism is probably necessary to look into at
>> > first.
>> 
>> But this is intended to be used by drivers, right? If so, then we should
>
> Not necessarily only for drivers, we can also use it for implementing
> other safe abstraction (e.g. hazard pointers, percpu counters etc)

That's fair, but then it should be `pub(crate)`.

>> do our usual due diligence and work out a safe abstraction. Only fall
>> back to unsafe if it isn't possible.
>> 
>
> All I'm saying is instead of figuring out a safe abstraction at first,
> we should probably focus on identifying how to implement it and which
> part is really unsafe and the safety requirement for that.

Yeah. But then we should do that before merging :)

>> I'm not familiar with percpu, but from the name I assumed that it's
>> "just a variable for each cpu" so similar to `thread_local!`, but it's
>> bound to the specific cpu instead of the thread.
>> 
>> That in my mind should be rather easy to support in Rust at least with
>> the thread_local-style API. You just need to ensure that no reference
>> can escape the cpu, so we can make it `!Send` & `!Sync` + rely on klint
>
> Not really, in kernel, we have plenty of use cases that we read the
> other CPU's percpu variables. For example, each CPU keeps it's own
> counter and we sum them other in another CPU.

But then you need some sort of synchronization?

> If we would like to model it conceptually, it's more like an array
> that's index by CpuId to me.

Gotcha, but this model is missing the access control/synchronization. So
I'm not so sure how useful it is.

(I think I asked this somewhere else, but the number of CPUs doesn't
change, right?)

>> to detect context switches.
>> 
>> >> >> has:
>> >> >> 
>> >> >>     https://doc.rust-lang.org/std/macro.thread_local.html
>> >> >> 
>> >> >> So in this example you would store a `Cell<i64>` instead.
>> >> >> 
>> >> >> I'm not familiar with per CPU variables, but if you're usually storing
>> >> >> `Copy` types, then this is much better wrt not having unsafe code
>> >> >> everywhere.
>> >> >> 
>> >> >> If one also often stores `!Copy` types, then we might be able to get
>> >> >> away with `RefCell`, but that's a small runtime overhead -- which is
>> >> >> probably bad given that per cpu variables are most likely used for
>> >> >> performance reasons? In that case the user might just need to store
>> >> >> `UnsafeCell` and use unsafe regardless. (or we invent something
>> >
>> > This sounds reasonable to me.
>> >
>> >> >> specifically for that case, eg tokens that are statically known to be
>> >> >> unique etc)
>> >> >
>> >> > I'm open to including a specialization for `T: Copy` in a similar vein
>> >> > to what I have here for numeric types. Off the top of my head, that
>> >> > shouldn't require any user-facing `unsafe`. But yes, I believe there is
>> >> > a significant amount of interest in having `!Copy` per-CPU variables.
>> >> > (At least, I'm interested in having them around for experimenting with
>> >> > using Rust for HV drivers.)
>> >> 
>> >> What kinds of types would you like to store? Allocations? Just integers
>> >> in bigger structs? Mutexes?
>> >> 
>> >
>> > In the VMBus driver, there is a percpu work_struct.
>> 
>> Do you have a link? Or better yet a Rust struct description of what you
>> think it will look like :)
>> 
>
> Not Rust code yet, but here is the corresponding C code:
>
> 	https://github.com/Rust-for-Linux/linux/blob/rust-next/drivers/hv/vmbus_drv.c#L1396

Thanks!

> But please note that we are not solely developing the abstraction for
> this usage, but more for generally understand how to wrap percpu
> functionality similar to the usage in C.

Well, I have to start somewhere for looking at the use-cases :)

If you have more, just let me know. (I probably won't have enough time
to look at them now, but maybe in a couple of weeks.)

>> >> > I would definitely like to avoid *requiring* the use of `RefCell` since,
>> >> > as you mention, it does have a runtime overhead. Per-CPU variables can
>> >> > be used for "logical" reasons rather than just as a performance
>> >> > optimization, so there might be some cases where paying the runtime
>> >> > overhead is ok. But that's certainly not true in all cases. That said,
>> >> > perhaps there could be a safely obtainable token type that only passes a
>> >> > `&T` (rather than a `&mut T`) to its closure, and then if a user doesn't
>> >> > mind the runtime overhead, they can choose `T` to be a `RefCell`.
>> >> > Thoughts?
>> >> 
>> >> So I think using an API similar to `thread_local!` will allow us to have
>> >> multiple other APIs that slot into that. `Cell<T>` for `T: Copy`,
>> >> `RefCell<T>` for cases where you don't care about the runtime overhead,
>> >> plain `T` for cases where you only need `&T`. For the case where you
>> >> need `&mut T`, we could have something like a `TokenCell<T>` that gives
>> >> out a token that you need to mutably borrow in order to get `&mut T`.
>> >> Finally for anything else that is too restricted by this, users can also
>> >> use `UnsafeCell<T>` although that requires `unsafe`.
>> >> 
>> >> I think the advantage of this is that the common cases are all safe and
>> >> very idiomatic. In the current design, you *always* have to use unsafe.
>> >> 
>> >
>> > I agree, but like I said, we need to figure out the unsafe interface
>> > that C already uses and build API upon it. I think focusing on the
>> > unsafe mechanism may be the way to start: you cannot implement something
>> > that cannot be implemented, and we don't have the magic pthread_key here
>> > ;-)
>> 
>> Sure we can do some experimentation, but I don't think we should put
>> unsafe abstractions upstream that we intend to replace with a safe
>> abstraction later. Otherwise people are going to depend on it and it's
>
> I doubt we can replace the unsafe abstraction with a safe one, if users
> really care the performance then they would really need to use some
> unsafe API to build their safe abstraction.

That sounds pretty pessimistic, why do you think that?

>> going to be a mess. Do the experimenting out of tree and learn there.
>
> I disagree, Rust as a language its own should be able to do what C does
> including being able to implement the percpu functionality same as C,
> there is nothing wrong with a set of Rust primitives in the kernel that
> provides fundamental percpu functionality the other core facilities can
> rely on. The better part is that it will have all the safety requirement
> documented well.

Sure, but we haven't even tried to make it safe, so I don't think we
should add them now in this state.
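
For reference, the `thread_local!` plus `Cell<i64>` pattern mentioned
above looks roughly like this in user-space Rust. It only shows the
shape of the std API being proposed as a model; `COUNTER`, `bump`, and
`demo_thread_local` are illustrative names, and nothing here claims a
per-CPU equivalent exists yet.

```rust
use std::cell::Cell;

thread_local! {
    // One independent counter per thread, starting at 0.
    static COUNTER: Cell<i64> = Cell::new(0);
}

/// Increment the calling thread's counter and return the new value.
/// No `unsafe` is needed: `with` only hands out `&Cell<i64>`, and the
/// value cannot be reached from any other thread.
fn bump() -> i64 {
    COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

/// Bump twice on the current thread, then once on a fresh thread.
fn demo_thread_local() -> (i64, i64) {
    bump();
    let here = bump();
    // The spawned thread gets its own copy of COUNTER.
    let there = std::thread::spawn(bump).join().unwrap();
    (here, there)
}
```

The closure only ever sees a shared reference, which is why `Cell` (or
`RefCell`, or a token scheme) is needed for mutation.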

---
Cheers,
Benno



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-15 17:44               ` Benno Lossin
@ 2025-07-15 21:34                 ` Boqun Feng
  2025-07-16 10:32                   ` Benno Lossin
  2025-07-16 15:35                 ` Boqun Feng
  1 sibling, 1 reply; 20+ messages in thread
From: Boqun Feng @ 2025-07-15 21:34 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Tue, Jul 15, 2025 at 07:44:01PM +0200, Benno Lossin wrote:
[...]
> >> >
> >> > First of all, `thread_local!` has to be implemented by some sys-specific
> >> > unsafe mechanism, right? For example on unix, I think it's using
> >> > pthread_key_t:
> >> >
> >> > 	https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
> >> >
> >> > what we are implementing (or wrapping) is the very basic unsafe
> >> > mechanism for percpu here. Surely we can explore the design for a safe
> >> > API, but the unsafe mechanism is probably necessary to look into at
> >> > first.
> >> 
> >> But this is intended to be used by drivers, right? If so, then we should
> >
> > Not necessarily only for drivers, we can also use it for implementing
> > other safe abstraction (e.g. hazard pointers, percpu counters etc)
> 
> That's fair, but then it should be `pub(crate)`.
> 

Fine by me, but please see below.

> >> do our usual due diligence and work out a safe abstraction. Only fall
> >> back to unsafe if it isn't possible.
> >> 
> >
> > All I'm saying is instead of figuring out a safe abstraction at first,
> > we should probably focus on identifying how to implement it and which
> > part is really unsafe and the safety requirement for that.
> 
> Yeah. But then we should do that before merging :)
> 

Well, who's talking about merging? ;-) I thought we just began reviewing
here ;-)

> >> I'm not familiar with percpu, but from the name I assumed that it's
> >> "just a variable for each cpu" so similar to `thread_local!`, but it's
> >> bound to the specific cpu instead of the thread.
> >> 
> >> That in my mind should be rather easy to support in Rust at least with
> >> the thread_local-style API. You just need to ensure that no reference
> >> can escape the cpu, so we can make it `!Send` & `!Sync` + rely on klint
> >
> > Not really, in kernel, we have plenty of use cases that we read the
> > other CPU's percpu variables. For example, each CPU keeps it's own
> > counter and we sum them other in another CPU.
> 
> But then you need some sort of synchronization?
> 

Right, but the synchronization can exist either in the percpu operations
themselves or outside the percpu operations. In some cases, the data
types are small enough to fit in atomic data types and the operations
are just load/store/cmpxchg etc.; then operations on the current CPU and
remote reads will be naturally synchronized. Sometimes extra
synchronization is needed.
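
As a rough user-space illustration of the "naturally synchronized"
case: each slot is an atomic, the owning CPU updates it with a cmpxchg
loop, and a remote CPU just does an atomic load. All names
(`PerCpuMax`, `record`, `peek`, `NR_CPUS`) are made up for this sketch
and are not part of the proposed API.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical number of CPUs for the sketch.
const NR_CPUS: usize = 2;

/// Illustrative per-CPU high-water mark, one atomic slot per CPU.
struct PerCpuMax {
    slots: [AtomicU32; NR_CPUS],
}

impl PerCpuMax {
    fn new() -> Self {
        Self {
            slots: [AtomicU32::new(0), AtomicU32::new(0)],
        }
    }

    /// Local update via a cmpxchg loop: raise this CPU's slot to `val`
    /// if `val` is larger than what is currently stored.
    fn record(&self, cpu: usize, val: u32) {
        let slot = &self.slots[cpu];
        let mut cur = slot.load(Ordering::Relaxed);
        while val > cur {
            match slot.compare_exchange(cur, val, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => break,
                // Someone else changed the slot; retry with the value we saw.
                Err(seen) => cur = seen,
            }
        }
    }

    /// Remote read: a plain atomic load, no lock required.
    fn peek(&self, cpu: usize) -> u32 {
        self.slots[cpu].load(Ordering::Relaxed)
    }
}

/// Record 5 then 3 on CPU 0 and 7 on CPU 1, then read both slots.
fn demo_max() -> (u32, u32) {
    let m = PerCpuMax::new();
    m.record(0, 5);
    m.record(0, 3);
    m.record(1, 7);
    (m.peek(0), m.peek(1))
}
```

Anything larger than an atomic word falls into the "extra
synchronization" bucket instead.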

The keyword for finding all these cases is `per_cpu_ptr()`:

	https://elixir.bootlin.com/linux/v6.15.6/A/ident/per_cpu_ptr

> > If we would like to model it conceptually, it's more like an array
> > that's index by CpuId to me.
> 
> Gotcha, but this model is missing the access control/synchronization. So
> I'm not so sure how useful it is.
> 
> (I think I asked this somewhere else, but the number of CPUs doesn't
> change, right?)
> 

In terms of percpu variables, yes. A percpu variable is available even
for an offline CPU.

> >> to detect context switches.
> >> 
> >> >> >> has:
> >> >> >> 
> >> >> >>     https://doc.rust-lang.org/std/macro.thread_local.html
> >> >> >> 
> >> >> >> So in this example you would store a `Cell<i64>` instead.
> >> >> >> 
> >> >> >> I'm not familiar with per CPU variables, but if you're usually storing
> >> >> >> `Copy` types, then this is much better wrt not having unsafe code
> >> >> >> everywhere.
> >> >> >> 
> >> >> >> If one also often stores `!Copy` types, then we might be able to get
> >> >> >> away with `RefCell`, but that's a small runtime overhead -- which is
> >> >> >> probably bad given that per cpu variables are most likely used for
> >> >> >> performance reasons? In that case the user might just need to store
> >> >> >> `UnsafeCell` and use unsafe regardless. (or we invent something
> >> >
> >> > This sounds reasonable to me.
> >> >
> >> >> >> specifically for that case, eg tokens that are statically known to be
> >> >> >> unique etc)
> >> >> >
> >> >> > I'm open to including a specialization for `T: Copy` in a similar vein
> >> >> > to what I have here for numeric types. Off the top of my head, that
> >> >> > shouldn't require any user-facing `unsafe`. But yes, I believe there is
> >> >> > a significant amount of interest in having `!Copy` per-CPU variables.
> >> >> > (At least, I'm interested in having them around for experimenting with
> >> >> > using Rust for HV drivers.)
> >> >> 
> >> >> What kinds of types would you like to store? Allocations? Just integers
> >> >> in bigger structs? Mutexes?
> >> >> 
> >> >
> >> > In the VMBus driver, there is a percpu work_struct.
> >> 
> >> Do you have a link? Or better yet a Rust struct description of what you
> >> think it will look like :)
> >> 
> >
> > Not Rust code yet, but here is the corresponding C code:
> >
> > 	https://github.com/Rust-for-Linux/linux/blob/rust-next/drivers/hv/vmbus_drv.c#L1396
> 
> Thanks!
> 
> > But please note that we are not solely developing the abstraction for
> > this usage, but more for generally understand how to wrap percpu
> > functionality similar to the usage in C.
> 
> Well, I have to start somewhere for looking at the use-cases :)
> 
> If you have more, just let me see. (probably won't have enough time to
> look at them now, but maybe in a couple weeks)
> 

If you have time, feel free to take a look at hazard pointers:

https://lore.kernel.org/lkml/20240917143402.930114-1-boqun.feng@gmail.com/
https://lore.kernel.org/lkml/20250625031101.12555-1-boqun.feng@gmail.com/

You can also take a look at existing usage of percpu, e.g. SRCU uses it
to track how many readers are active:

	https://elixir.bootlin.com/linux/v6.15.6/source/kernel/rcu/srcutree.c#L577

[...]

Regards,
Boqun



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-15 21:34                 ` Boqun Feng
@ 2025-07-16 10:32                   ` Benno Lossin
  2025-07-16 15:33                     ` Boqun Feng
  0 siblings, 1 reply; 20+ messages in thread
From: Benno Lossin @ 2025-07-16 10:32 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Tue Jul 15, 2025 at 11:34 PM CEST, Boqun Feng wrote:
> On Tue, Jul 15, 2025 at 07:44:01PM +0200, Benno Lossin wrote:
> [...]
>> >> >
>> >> > First of all, `thread_local!` has to be implemented by some sys-specific
>> >> > unsafe mechanism, right? For example on unix, I think it's using
>> >> > pthread_key_t:
>> >> >
>> >> > 	https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
>> >> >
>> >> > what we are implementing (or wrapping) is the very basic unsafe
>> >> > mechanism for percpu here. Surely we can explore the design for a safe
>> >> > API, but the unsafe mechanism is probably necessary to look into at
>> >> > first.
>> >> 
>> >> But this is intended to be used by drivers, right? If so, then we should
>> >
>> > Not necessarily only for drivers, we can also use it for implementing
>> > other safe abstraction (e.g. hazard pointers, percpu counters etc)
>> 
>> That's fair, but then it should be `pub(crate)`.
>> 
>
> Fine by me, but please see below.
>
>> >> do our usual due diligence and work out a safe abstraction. Only fall
>> >> back to unsafe if it isn't possible.
>> >> 
>> >
>> > All I'm saying is instead of figuring out a safe abstraction at first,
>> > we should probably focus on identifying how to implement it and which
>> > part is really unsafe and the safety requirement for that.
>> 
>> Yeah. But then we should do that before merging :)
>> 
>
> Well, who's talknig about merging? ;-) I thought we just began reviewing
> here ;-)

I understand [PATCH] emails as "I want to merge this" and [RFC PATCH] as
"I want to talk about merging this". It might be that I haven't seen the
RFC patch series, because I often mute those.

>> >> I'm not familiar with percpu, but from the name I assumed that it's
>> >> "just a variable for each cpu" so similar to `thread_local!`, but it's
>> >> bound to the specific cpu instead of the thread.
>> >> 
>> >> That in my mind should be rather easy to support in Rust at least with
>> >> the thread_local-style API. You just need to ensure that no reference
>> >> can escape the cpu, so we can make it `!Send` & `!Sync` + rely on klint
>> >
>> > Not really, in kernel, we have plenty of use cases that we read the
>> > other CPU's percpu variables. For example, each CPU keeps it's own
>> > counter and we sum them other in another CPU.
>> 
>> But then you need some sort of synchronization?
>> 
>
> Right, but the synchronization can exist either in the percpu operations
> themselves or outside the percpu operations. Some cases, the data types
> are small enough to fit in atomic data types, and operations are just
> load/store/cmpxchg etc, then operations on the current cpu and remote
> read will be naturally synchronized. Sometimes extra synchronization is
> needed.

Sure, so we probably want direct atomics support. What about "extra
synchronization"? Is that using locks or RCU or what else?

> Keyword find all these cases are `per_cpu_ptr()`:
>
> 	https://elixir.bootlin.com/linux/v6.15.6/A/ident/per_cpu_ptr

Could you explain to me how to find them? I can either click on one of
the files with horrible C preprocessor macros or the auto-completion in
the search bar. But that one only shows 3 suggestions `_hyp_sym`,
`_nvhe_sym` and `_to_phys` which doesn't really mean much to me.

---
Cheers,
Benno



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-16 10:32                   ` Benno Lossin
@ 2025-07-16 15:33                     ` Boqun Feng
  2025-07-16 17:21                       ` Benno Lossin
  0 siblings, 1 reply; 20+ messages in thread
From: Boqun Feng @ 2025-07-16 15:33 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Wed, Jul 16, 2025 at 12:32:04PM +0200, Benno Lossin wrote:
> On Tue Jul 15, 2025 at 11:34 PM CEST, Boqun Feng wrote:
> > On Tue, Jul 15, 2025 at 07:44:01PM +0200, Benno Lossin wrote:
> > [...]
> >> >> >
> >> >> > First of all, `thread_local!` has to be implemented by some sys-specific
> >> >> > unsafe mechanism, right? For example on unix, I think it's using
> >> >> > pthread_key_t:
> >> >> >
> >> >> > 	https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
> >> >> >
> >> >> > what we are implementing (or wrapping) is the very basic unsafe
> >> >> > mechanism for percpu here. Surely we can explore the design for a safe
> >> >> > API, but the unsafe mechanism is probably necessary to look into at
> >> >> > first.
> >> >> 
> >> >> But this is intended to be used by drivers, right? If so, then we should
> >> >
> >> > Not necessarily only for drivers, we can also use it for implementing
> >> > other safe abstraction (e.g. hazard pointers, percpu counters etc)
> >> 
> >> That's fair, but then it should be `pub(crate)`.
> >> 
> >
> > Fine by me, but please see below.
> >
> >> >> do our usual due diligence and work out a safe abstraction. Only fall
> >> >> back to unsafe if it isn't possible.
> >> >> 
> >> >
> >> > All I'm saying is instead of figuring out a safe abstraction at first,
> >> > we should probably focus on identifying how to implement it and which
> >> > part is really unsafe and the safety requirement for that.
> >> 
> >> Yeah. But then we should do that before merging :)
> >> 
> >
> > Well, who's talknig about merging? ;-) I thought we just began reviewing
> > here ;-)
> 
> I understand [PATCH] emails as "I want to merge this" and [RFC PATCH] as

But it doesn't mean "merge as it is", right? I don't think either I or
Mitchell implied that, and I'm surprised that you had to mention it.
Also, based on "I often mute those" below, making it "[PATCH]" seems to
be a practical way to get more attention if one wants to get some
reviews.

> "I want to talk about merging this". It might be that I haven't seen the
> RFC patch series, because I often mute those.
> 

Well, then you cannot blame people for moving from the "RFC PATCH" to
the "PATCH" stage for more reviews, right? And you cannot make rules
about the difference between [PATCH] and [RFC PATCH] if you ignore one
of them ;-)

> >> >> I'm not familiar with percpu, but from the name I assumed that it's
> >> >> "just a variable for each cpu" so similar to `thread_local!`, but it's
> >> >> bound to the specific cpu instead of the thread.
> >> >> 
> >> >> That in my mind should be rather easy to support in Rust at least with
> >> >> the thread_local-style API. You just need to ensure that no reference
> >> >> can escape the cpu, so we can make it `!Send` & `!Sync` + rely on klint
> >> >
> >> > Not really, in kernel, we have plenty of use cases that we read the
> >> > other CPU's percpu variables. For example, each CPU keeps it's own
> >> > counter and we sum them other in another CPU.
> >> 
> >> But then you need some sort of synchronization?
> >> 
> >
> > Right, but the synchronization can exist either in the percpu operations
> > themselves or outside the percpu operations. Some cases, the data types
> > are small enough to fit in atomic data types, and operations are just
> > load/store/cmpxchg etc, then operations on the current cpu and remote
> > read will be naturally synchronized. Sometimes extra synchronization is
> > needed.
> 
> Sure, so we probably want direct atomics support. What about "extra
> synchronization"? Is that using locks or RCU or what else?
> 

It's up to the users, obviously. It could be some sort of locking or
RCU; it's case by case.

> > Keyword find all these cases are `per_cpu_ptr()`:
> >
> > 	https://elixir.bootlin.com/linux/v6.15.6/A/ident/per_cpu_ptr
> 
> Could you explain to me how to find them? I can either click on one of
> the files with horrible C preprocessor macros or the auto-completion in
> the search bar. But that one only shows 3 suggestions `_hyp_sym`,
> `_nvhe_sym` and `_to_phys` which doesn't really mean much to me.
> 

You need to find the usages of `per_cpu_ptr()`, which is a function that
gives you a pointer to a percpu variable on another CPU; that's usually
where a "remote" read of a percpu variable happens.
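
Conceptually, a remote access is "base address of the variable plus the
target CPU's offset". Here is a user-space sketch of just that
arithmetic, under stated assumptions: `PER_CPU_OFFSET` and its values
are invented for the example (the generic C code keeps the real table in
`__per_cpu_offset[]`), and `per_cpu_addr` is a hypothetical helper, not
a kernel or Rust-for-Linux function.

```rust
/// Hypothetical per-CPU offsets; the values are made up for illustration.
const PER_CPU_OFFSET: [usize; 3] = [0x1000, 0x2000, 0x3000];

/// Compute the address of `cpu`'s copy of a percpu variable whose
/// percpu "base" is `base`. The base is not a normal pointer, so the
/// addition is allowed to wrap around.
fn per_cpu_addr(base: usize, cpu: usize) -> usize {
    base.wrapping_add(PER_CPU_OFFSET[cpu])
}

/// An ordinary case and a case where the addition wraps.
fn demo_addr() -> (usize, usize) {
    let plain = per_cpu_addr(0x10, 1);
    // usize::MAX - 0xfff + 0x1000 overflows by exactly one and wraps to 0.
    let wrapped = per_cpu_addr(usize::MAX - 0xfff, 0);
    (plain, wrapped)
}
```

The wrapping add mirrors the cover letter's note that overflow is
expected in `PerCpuPtr::get_ref`.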

Regards,
Boqun

> ---
> Cheers,
> Benno



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-16 15:33                     ` Boqun Feng
@ 2025-07-16 17:21                       ` Benno Lossin
  2025-07-16 17:52                         ` Boqun Feng
  0 siblings, 1 reply; 20+ messages in thread
From: Benno Lossin @ 2025-07-16 17:21 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Wed Jul 16, 2025 at 5:33 PM CEST, Boqun Feng wrote:
> On Wed, Jul 16, 2025 at 12:32:04PM +0200, Benno Lossin wrote:
>> On Tue Jul 15, 2025 at 11:34 PM CEST, Boqun Feng wrote:
>> > On Tue, Jul 15, 2025 at 07:44:01PM +0200, Benno Lossin wrote:
>> > [...]
>> >> >> >
>> >> >> > First of all, `thread_local!` has to be implemented by some sys-specific
>> >> >> > unsafe mechanism, right? For example on unix, I think it's using
>> >> >> > pthread_key_t:
>> >> >> >
>> >> >> > 	https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
>> >> >> >
>> >> >> > what we are implementing (or wrapping) is the very basic unsafe
>> >> >> > mechanism for percpu here. Surely we can explore the design for a safe
>> >> >> > API, but the unsafe mechanism is probably necessary to look into at
>> >> >> > first.
>> >> >> 
>> >> >> But this is intended to be used by drivers, right? If so, then we should
>> >> >
>> >> > Not necessarily only for drivers, we can also use it for implementing
>> >> > other safe abstraction (e.g. hazard pointers, percpu counters etc)
>> >> 
>> >> That's fair, but then it should be `pub(crate)`.
>> >> 
>> >
>> > Fine by me, but please see below.
>> >
>> >> >> do our usual due diligence and work out a safe abstraction. Only fall
>> >> >> back to unsafe if it isn't possible.
>> >> >> 
>> >> >
>> >> > All I'm saying is instead of figuring out a safe abstraction at first,
>> >> > we should probably focus on identifying how to implement it and which
>> >> > part is really unsafe and the safety requirement for that.
>> >> 
>> >> Yeah. But then we should do that before merging :)
>> >> 
>> >
>> > Well, who's talking about merging? ;-) I thought we just began reviewing
>> > here ;-)
>> 
>> I understand [PATCH] emails as "I want to merge this" and [RFC PATCH] as
>
> But it doesn't mean "merge as it is", right? I don't think either I or
> Mitchell implied that, I'm surprised that you had to mention that,

Yeah that is true, but it at least shows the intention :)

> also based on "I often mute those" below, making it "[PATCH]" seems to
> be a practical way to get more attention if one wants to get some
> reviews.

That is true, I do usually read the titles of RFC patches though and
sometimes take a look, e.g. at your atomics series.

>> "I want to talk about merging this". It might be that I haven't seen the
>> RFC patch series, because I often mute those.
>> 
>
> Well, then you cannot blame people for moving from "RFC PATCH" to "PATCH"
> stage for more reviews, right? And you cannot make rules about what the
> difference between [PATCH] and [RFC PATCH] is if you ignore one of them ;-)

I'm not trying to blame anyone. I saw a lot of unsafe in the example and
thought "we can do better" and since I haven't heard any sufficient
arguments showing that it's impossible to improve, we should do some
design work.

>> >> >> I'm not familiar with percpu, but from the name I assumed that it's
>> >> >> "just a variable for each cpu" so similar to `thread_local!`, but it's
>> >> >> bound to the specific cpu instead of the thread.
>> >> >> 
>> >> >> That in my mind should be rather easy to support in Rust at least with
>> >> >> the thread_local-style API. You just need to ensure that no reference
>> >> >> can escape the cpu, so we can make it `!Send` & `!Sync` + rely on klint
>> >> >
>> >> > Not really, in kernel, we have plenty of use cases that we read the
>> >> > other CPU's percpu variables. For example, each CPU keeps its own
>> >> > counter and we sum them up on another CPU.
>> >> 
>> >> But then you need some sort of synchronization?
>> >> 
>> >
>> > Right, but the synchronization can exist either in the percpu operations
>> > themselves or outside the percpu operations. Some cases, the data types
>> > are small enough to fit in atomic data types, and operations are just
>> > load/store/cmpxchg etc, then operations on the current cpu and remote
>> > read will be naturally synchronized. Sometimes extra synchronization is
>> > needed.
>> 
>> Sure, so we probably want direct atomics support. What about "extra
>> synchronization"? Is that using locks or RCU or what else?
>> 
>
> It's up to the users obviously, It could be some sort of locking or RCU,
> it's case by case.

Makes sense, what do you need in the VMBus driver?

>> > The keyword to find all these cases is `per_cpu_ptr()`:
>> >
>> > 	https://elixir.bootlin.com/linux/v6.15.6/A/ident/per_cpu_ptr
>> 
>> Could you explain to me how to find them? I can either click on one of
>> the files with horrible C preprocessor macros or the auto-completion in
>> the search bar. But that one only shows 3 suggestions `_hyp_sym`,
>> `_nvhe_sym` and `_to_phys` which doesn't really mean much to me.
>> 
>
> You need to find the usage of `per_cpu_ptr()`, which is a function that
> gives you a pointer to a percpu variable on the other CPU, and then
> that's usually the case where a "remote" read of percpu variable
> happens.

Ahh gotcha, I thought you pointed me to some definitions of operations
on percpu pointers.
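
For reference, the remote-read pattern described above (each CPU updates
its own counter, and another CPU sums them) can be modeled in userspace
Rust. This is only a sketch under stated assumptions: threads stand in for
CPUs, and `NR_CPUS`, `PerCpuCounter` and `run_demo` are hypothetical names,
not kernel APIs. Relaxed atomics are enough for the final sum here because
`thread::scope` joins every thread before `sum` runs.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical stand-in for the number of CPUs.
const NR_CPUS: usize = 4;

/// One counter slot per simulated CPU.
struct PerCpuCounter {
    slots: [AtomicU64; NR_CPUS],
}

impl PerCpuCounter {
    fn new() -> Self {
        Self {
            slots: std::array::from_fn(|_| AtomicU64::new(0)),
        }
    }

    /// Local update: by convention only the owning "CPU" writes its slot.
    fn inc(&self, cpu: usize) {
        self.slots[cpu].fetch_add(1, Ordering::Relaxed);
    }

    /// Remote read: any "CPU" may load every slot and sum the values.
    fn sum(&self) -> u64 {
        self.slots.iter().map(|s| s.load(Ordering::Relaxed)).sum()
    }
}

/// Runs one thread per simulated CPU, each incrementing its own slot 1000
/// times, then sums all slots from the calling thread.
fn run_demo() -> u64 {
    let counter = PerCpuCounter::new();
    thread::scope(|s| {
        for cpu in 0..NR_CPUS {
            let counter = &counter;
            s.spawn(move || {
                for _ in 0..1000 {
                    counter.inc(cpu);
                }
            });
        }
    });
    counter.sum()
}
```

Calling `run_demo()` returns 4000 (four simulated CPUs, 1000 increments
each). In the kernel the remote read would go through `per_cpu_ptr()`
rather than an array index, and larger types would need the extra
synchronization mentioned above.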

---
Cheers,
Benno



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-16 17:21                       ` Benno Lossin
@ 2025-07-16 17:52                         ` Boqun Feng
  2025-07-16 18:22                           ` Benno Lossin
  0 siblings, 1 reply; 20+ messages in thread
From: Boqun Feng @ 2025-07-16 17:52 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Wed, Jul 16, 2025 at 07:21:32PM +0200, Benno Lossin wrote:
> On Wed Jul 16, 2025 at 5:33 PM CEST, Boqun Feng wrote:
> > On Wed, Jul 16, 2025 at 12:32:04PM +0200, Benno Lossin wrote:
> >> On Tue Jul 15, 2025 at 11:34 PM CEST, Boqun Feng wrote:
> >> > On Tue, Jul 15, 2025 at 07:44:01PM +0200, Benno Lossin wrote:
> >> > [...]
> >> >> >> >
> >> >> >> > First of all, `thread_local!` has to be implemented by some sys-specific
> >> >> >> > unsafe mechanism, right? For example on unix, I think it's using
> >> >> >> > pthread_key_t:
> >> >> >> >
> >> >> >> > 	https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
> >> >> >> >
> >> >> >> > what we are implementing (or wrapping) is the very basic unsafe
> >> >> >> > mechanism for percpu here. Surely we can explore the design for a safe
> >> >> >> > API, but the unsafe mechanism is probably necessary to look into at
> >> >> >> > first.
> >> >> >> 
> >> >> >> But this is intended to be used by drivers, right? If so, then we should
> >> >> >
> >> >> > Not necessarily only for drivers, we can also use it for implementing
> >> >> > other safe abstraction (e.g. hazard pointers, percpu counters etc)
> >> >> 
> >> >> That's fair, but then it should be `pub(crate)`.
> >> >> 
> >> >
> >> > Fine by me, but please see below.
> >> >
> >> >> >> do our usual due diligence and work out a safe abstraction. Only fall
> >> >> >> back to unsafe if it isn't possible.
> >> >> >> 
> >> >> >
> >> >> > All I'm saying is instead of figuring out a safe abstraction at first,
> >> >> > we should probably focus on identifying how to implement it and which
> >> >> > part is really unsafe and the safety requirement for that.
> >> >> 
> >> >> Yeah. But then we should do that before merging :)
> >> >> 
> >> >
> >> > Well, who's talking about merging? ;-) I thought we just began reviewing
> >> > here ;-)
> >> 
> >> I understand [PATCH] emails as "I want to merge this" and [RFC PATCH] as
> >
> > But it doesn't mean "merge as it is", right? I don't think either I or
> > Mitchell implied that, I'm surprised that you had to mention that,
> 
> Yeah that is true, but it at least shows the intention :)
> 
> > also based on "I often mute those" below, making it "[PATCH]" seems to
> > be a practical way to get more attention if one wants to get some
> > reviews.
> 
> That is true, I do usually read the titles of RFC patches though and
> sometimes take a look, e.g. at your atomics series.
> 
> >> "I want to talk about merging this". It might be that I haven't seen the
> >> RFC patch series, because I often mute those.
> >> 
> >
> > Well, then you cannot blame people for moving from "RFC PATCH" to "PATCH"
> > stage for more reviews, right? And you cannot make rules about what the
> > difference between [PATCH] and [RFC PATCH] is if you ignore one of them ;-)
> 
> I'm not trying to blame anyone. I saw a lot of unsafe in the example and
> thought "we can do better" and since I haven't heard any sufficient
> arguments showing that it's impossible to improve, we should do some
> design work.
> 

I agree with you, and I like what you're proposing, but I think design
work can be done at "PATCH" stage, right? And sometimes, it's also OK to
do some design work even at some version like "v12" ;-)

Also, I want to see more concrete forward progress on improving the
design. For example, we can examine every case that makes
unsafe_get_per_cpu!() unsafe, and see if we can improve that by typing
or something else. We can always "do better", but the important part is
how to get there ;-)

> >> >> >> I'm not familiar with percpu, but from the name I assumed that it's
> >> >> >> "just a variable for each cpu" so similar to `thread_local!`, but it's
> >> >> >> bound to the specific cpu instead of the thread.
> >> >> >> 
> >> >> >> That in my mind should be rather easy to support in Rust at least with
> >> >> >> the thread_local-style API. You just need to ensure that no reference
> >> >> >> can escape the cpu, so we can make it `!Send` & `!Sync` + rely on klint
> >> >> >
> >> >> > Not really, in kernel, we have plenty of use cases that we read the
> >> >> > other CPU's percpu variables. For example, each CPU keeps its own
> >> >> > counter and we sum them up on another CPU.
> >> >> 
> >> >> But then you need some sort of synchronization?
> >> >> 
> >> >
> >> > Right, but the synchronization can exist either in the percpu operations
> >> > themselves or outside the percpu operations. Some cases, the data types
> >> > are small enough to fit in atomic data types, and operations are just
> >> > load/store/cmpxchg etc, then operations on the current cpu and remote
> >> > read will be naturally synchronized. Sometimes extra synchronization is
> >> > needed.
> >> 
> >> Sure, so we probably want direct atomics support. What about "extra
> >> synchronization"? Is that using locks or RCU or what else?
> >> 
> >
> > It's up to the users obviously, It could be some sort of locking or RCU,
> > it's case by case.
> 
> Makes sense, what do you need in the VMBus driver?
> 

In the VMBus driver, it's actually isolated, i.e. each CPU only accesses
its own work_struct, so synchronization between CPUs is not needed.

Regards,
Boqun

> >> > The keyword to find all these cases is `per_cpu_ptr()`:
> >> >
> >> > 	https://elixir.bootlin.com/linux/v6.15.6/A/ident/per_cpu_ptr
> >> 
> >> Could you explain to me how to find them? I can either click on one of
> >> the files with horrible C preprocessor macros or the auto-completion in
> >> the search bar. But that one only shows 3 suggestions `_hyp_sym`,
> >> `_nvhe_sym` and `_to_phys` which doesn't really mean much to me.
> >> 
> >
> > You need to find the usage of `per_cpu_ptr()`, which is a function that
> > gives you a pointer to a percpu variable on the other CPU, and then
> > that's usually the case where a "remote" read of percpu variable
> > happens.
> 
> Ahh gotcha, I thought you pointed me to some definitions of operations
> on percpu pointers.
> 
> ---
> Cheers,
> Benno



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-16 17:52                         ` Boqun Feng
@ 2025-07-16 18:22                           ` Benno Lossin
  0 siblings, 0 replies; 20+ messages in thread
From: Benno Lossin @ 2025-07-16 18:22 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Wed Jul 16, 2025 at 7:52 PM CEST, Boqun Feng wrote:
> On Wed, Jul 16, 2025 at 07:21:32PM +0200, Benno Lossin wrote:
>> On Wed Jul 16, 2025 at 5:33 PM CEST, Boqun Feng wrote:
>> > On Wed, Jul 16, 2025 at 12:32:04PM +0200, Benno Lossin wrote:
>> >> On Tue Jul 15, 2025 at 11:34 PM CEST, Boqun Feng wrote:
>> >> > On Tue, Jul 15, 2025 at 07:44:01PM +0200, Benno Lossin wrote:
>> >> > [...]
>> >> >> >> >
>> >> >> >> > First of all, `thread_local!` has to be implemented by some sys-specific
>> >> >> >> > unsafe mechanism, right? For example on unix, I think it's using
>> >> >> >> > pthread_key_t:
>> >> >> >> >
>> >> >> >> > 	https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
>> >> >> >> >
>> >> >> >> > what we are implementing (or wrapping) is the very basic unsafe
>> >> >> >> > mechanism for percpu here. Surely we can explore the design for a safe
>> >> >> >> > API, but the unsafe mechanism is probably necessary to look into at
>> >> >> >> > first.
>> >> >> >> 
>> >> >> >> But this is intended to be used by drivers, right? If so, then we should
>> >> >> >
>> >> >> > Not necessarily only for drivers, we can also use it for implementing
>> >> >> > other safe abstraction (e.g. hazard pointers, percpu counters etc)
>> >> >> 
>> >> >> That's fair, but then it should be `pub(crate)`.
>> >> >> 
>> >> >
>> >> > Fine by me, but please see below.
>> >> >
>> >> >> >> do our usual due diligence and work out a safe abstraction. Only fall
>> >> >> >> back to unsafe if it isn't possible.
>> >> >> >> 
>> >> >> >
>> >> >> > All I'm saying is instead of figuring out a safe abstraction at first,
>> >> >> > we should probably focus on identifying how to implement it and which
>> >> >> > part is really unsafe and the safety requirement for that.
>> >> >> 
>> >> >> Yeah. But then we should do that before merging :)
>> >> >> 
>> >> >
>> >> > Well, who's talking about merging? ;-) I thought we just began reviewing
>> >> > here ;-)
>> >> 
>> >> I understand [PATCH] emails as "I want to merge this" and [RFC PATCH] as
>> >
>> > But it doesn't mean "merge as it is", right? I don't think either I or
>> > Mitchell implied that, I'm surprised that you had to mention that,
>> 
>> Yeah that is true, but it at least shows the intention :)
>> 
>> > also based on "I often mute those" below, making it "[PATCH]" seems to
>> > be a practical way to get more attention if one wants to get some
>> > reviews.
>> 
>> That is true, I do usually read the titles of RFC patches though and
>> sometimes take a look, e.g. at your atomics series.
>> 
>> >> "I want to talk about merging this". It might be that I haven't seen the
>> >> RFC patch series, because I often mute those.
>> >> 
>> >
>> > Well, then you cannot blame people for moving from "RFC PATCH" to "PATCH"
>> > stage for more reviews, right? And you cannot make rules about what the
>> > difference between [PATCH] and [RFC PATCH] is if you ignore one of them ;-)
>> 
>> I'm not trying to blame anyone. I saw a lot of unsafe in the example and
>> thought "we can do better" and since I haven't heard any sufficient
>> arguments showing that it's impossible to improve, we should do some
>> design work.
>> 
>
> I agree with you, and I like what you're proposing, but I think design
> work can be done at "PATCH" stage, right? And sometimes, it's also OK to
> do some design work even at some version like "v12" ;-)

Yeah of course. The thing is just that nobody asked why there was unsafe
and thus I got the impression that people thought this would be a good
abstraction for percpu. (don't take from this that it's bad :)

> Also, I want to see more concrete forward progress on improving the
> design. For example, we can examine every case that makes
> unsafe_get_per_cpu!() unsafe, and see if we can improve that by typing
> or something else. We can always "do better", but the important part is
> how to get there ;-)

Yeah that would be a starting point :)

>> >> >> >> I'm not familiar with percpu, but from the name I assumed that it's
>> >> >> >> "just a variable for each cpu" so similar to `thread_local!`, but it's
>> >> >> >> bound to the specific cpu instead of the thread.
>> >> >> >> 
>> >> >> >> That in my mind should be rather easy to support in Rust at least with
>> >> >> >> the thread_local-style API. You just need to ensure that no reference
>> >> >> >> can escape the cpu, so we can make it `!Send` & `!Sync` + rely on klint
>> >> >> >
>> >> >> > Not really, in kernel, we have plenty of use cases that we read the
>> >> >> > other CPU's percpu variables. For example, each CPU keeps its own
>> >> >> > counter and we sum them up on another CPU.
>> >> >> 
>> >> >> But then you need some sort of synchronization?
>> >> >> 
>> >> >
>> >> > Right, but the synchronization can exist either in the percpu operations
>> >> > themselves or outside the percpu operations. Some cases, the data types
>> >> > are small enough to fit in atomic data types, and operations are just
>> >> > load/store/cmpxchg etc, then operations on the current cpu and remote
>> >> > read will be naturally synchronized. Sometimes extra synchronization is
>> >> > needed.
>> >> 
>> >> Sure, so we probably want direct atomics support. What about "extra
>> >> synchronization"? Is that using locks or RCU or what else?
>> >> 
>> >
>> > It's up to the users obviously, It could be some sort of locking or RCU,
>> > it's case by case.
>> 
>> Makes sense, what do you need in the VMBus driver?
>> 
>
> In the VMBus driver, it's actually isolated, i.e. each CPU only accesses
> its own work_struct, so synchronization between CPUs is not needed.

I see, so we could either just start out with no sync support or --
which I would prefer -- get a list of the most common use-cases and
implement those too (or at least design the first part compatibly with
further extensions).

---
Cheers,
Benno



* Re: [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test
  2025-07-15 17:44               ` Benno Lossin
  2025-07-15 21:34                 ` Boqun Feng
@ 2025-07-16 15:35                 ` Boqun Feng
  1 sibling, 0 replies; 20+ messages in thread
From: Boqun Feng @ 2025-07-16 15:35 UTC (permalink / raw)
  To: Benno Lossin
  Cc: Mitchell Levy, Miguel Ojeda, Alex Gaynor, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, linux-kernel, rust-for-linux, linux-mm

On Tue, Jul 15, 2025 at 07:44:01PM +0200, Benno Lossin wrote:
[...]
> >> 
> >> Sure we can do some experimentation, but I don't think we should put
> >> unsafe abstractions upstream that we intend to replace with a safe
> >> abstraction later. Otherwise people are going to depend on it and it's
> >
> > I doubt we can replace the unsafe abstraction with a safe one, if users
> > really care about performance then they would really need to use some
> > unsafe API to build their safe abstraction.
> 
> That sounds pretty pessimistic, why do you think that?
> 

I could ask you something similar: you barely know the implementation and
usage of percpu, so why do you think it's possible? ;-)

Regards,
Boqun



* [PATCH v2 4/5] rust: percpu: Add pin-hole optimizations for numerics
  2025-07-12 21:31 [PATCH v2 0/5] rust: Add Per-CPU Variable API Mitchell Levy
                   ` (2 preceding siblings ...)
  2025-07-12 21:31 ` [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test Mitchell Levy
@ 2025-07-12 21:31 ` Mitchell Levy
  2025-07-12 21:31 ` [PATCH v2 5/5] rust: percpu: cache per-CPU pointers in the dynamic case Mitchell Levy
  4 siblings, 0 replies; 20+ messages in thread
From: Mitchell Levy @ 2025-07-12 21:31 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, Benno Lossin
  Cc: linux-kernel, rust-for-linux, linux-mm, Mitchell Levy

The C implementations of `this_cpu_add`, `this_cpu_sub`, etc., are
optimized to save an instruction by avoiding having to compute
`this_cpu_ptr(&x)` for some per-CPU variable `x`. For example, rather
than

    u64 *x_ptr = this_cpu_ptr(&x);
    *x_ptr += 5;

the implementation of `this_cpu_add` is clever enough to make use of the
fact that per-CPU variables are implemented on x86 via segment
registers, and so we can use only a single instruction (where we assume
`&x` is already in `rax`)

    add gs:[rax], 5

Add this optimization via a `PerCpuNumeric` type to enable code-reuse
between `DynamicPerCpu` and `StaticPerCpu`.
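
As a rough userspace illustration of the single-instruction form, the
sketch below performs the read-modify-write directly on memory with one
`add`. It is not the code from this patch: the `gs:` prefix is dropped
because userspace has no per-CPU segment base, `add_in_place_u32` is a
hypothetical name, and a plain-Rust fallback is assumed for targets other
than x86-64. The `{val:e}` template modifier selects the 32-bit register
name, in the same way `impl_ops!` passes `"e"` for `u32`.

```rust
/// Adds `rhs` to `*slot` using a single memory-destination `add`.
#[cfg(target_arch = "x86_64")]
fn add_in_place_u32(slot: &mut u32, rhs: u32) {
    // SAFETY: `slot` is a valid, exclusive pointer to a `u32`. The
    // instruction reads and writes only those four bytes and clobbers the
    // flags, both of which `asm!` assumes by default.
    unsafe {
        core::arch::asm!(
            "add [{ptr}], {val:e}",
            ptr = in(reg) slot as *mut u32,
            val = in(reg) rhs,
        );
    }
}

/// Assumed fallback for other architectures: same wrapping semantics,
/// without inline assembly.
#[cfg(not(target_arch = "x86_64"))]
fn add_in_place_u32(slot: &mut u32, rhs: u32) {
    *slot = slot.wrapping_add(rhs);
}
```

Like the hardware instruction, the addition wraps on overflow, so adding 1
to `u32::MAX` leaves 0 in the slot.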

Signed-off-by: Mitchell Levy <levymitchell0@gmail.com>
---
 lib/percpu_test_rust.rs       |  36 +++++++++++++
 rust/kernel/percpu.rs         |   1 +
 rust/kernel/percpu/numeric.rs | 117 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 154 insertions(+)

diff --git a/lib/percpu_test_rust.rs b/lib/percpu_test_rust.rs
index a9652e6ece08..114015435a85 100644
--- a/lib/percpu_test_rust.rs
+++ b/lib/percpu_test_rust.rs
@@ -25,6 +25,26 @@
 define_per_cpu!(PERCPU: i64 = 0);
 define_per_cpu!(UPERCPU: u64 = 0);
 
+macro_rules! make_optimization_test {
+    ($ty:ty) => {
+        let mut test: DynamicPerCpu<$ty> = DynamicPerCpu::new(GFP_KERNEL).unwrap();
+        {
+            let _guard = CpuGuard::new();
+            // SAFETY: No other usage of `test`
+            unsafe { test.get(CpuGuard::new()) }.with(|val: &mut $ty| *val = 10);
+            test.num().add(1);
+            // SAFETY: No other usage of `test`
+            unsafe { test.get(CpuGuard::new()) }.with(|val: &mut $ty| assert_eq!(*val, 11));
+            test.num().add(10);
+            // SAFETY: No other usage of `test`
+            unsafe { test.get(CpuGuard::new()) }.with(|val: &mut $ty| assert_eq!(*val, 21));
+            test.num().sub(5);
+            // SAFETY: No other usage of `test`
+            unsafe { test.get(CpuGuard::new()) }.with(|val: &mut $ty| assert_eq!(*val, 16));
+        }
+    };
+}
+
 impl kernel::Module for PerCpuTestModule {
     fn init(_module: &'static ThisModule) -> Result<Self, Error> {
         pr_info!("rust percpu test start\n");
@@ -94,6 +114,22 @@ fn init(_module: &'static ThisModule) -> Result<Self, Error> {
 
         pr_info!("rust dynamic percpu test done\n");
 
+        pr_info!("rust numeric optimizations test start\n");
+
+        make_optimization_test!(u8);
+        make_optimization_test!(u16);
+        make_optimization_test!(u32);
+        make_optimization_test!(u64);
+        make_optimization_test!(usize);
+
+        make_optimization_test!(i8);
+        make_optimization_test!(i16);
+        make_optimization_test!(i32);
+        make_optimization_test!(i64);
+        make_optimization_test!(isize);
+
+        pr_info!("rust numeric optimizations test done\n");
+
         // Return Err to unload the module
         Result::Err(EINVAL)
     }
diff --git a/rust/kernel/percpu.rs b/rust/kernel/percpu.rs
index 7dfceb6aefd7..b97d1d07a614 100644
--- a/rust/kernel/percpu.rs
+++ b/rust/kernel/percpu.rs
@@ -2,6 +2,7 @@
 //! This module contains abstractions for creating and using per-CPU variables from Rust.
 //! See the define_per_cpu! macro and the DynamicPerCpu<T> type, as well as the PerCpu<T> trait.
 pub mod cpu_guard;
+pub mod numeric;
 
 use bindings::{alloc_percpu, free_percpu};
 
diff --git a/rust/kernel/percpu/numeric.rs b/rust/kernel/percpu/numeric.rs
new file mode 100644
index 000000000000..e4008f872af1
--- /dev/null
+++ b/rust/kernel/percpu/numeric.rs
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+//! Pin-hole optimizations for PerCpu<T> where T is a numeric type.
+
+use crate::percpu::*;
+use core::arch::asm;
+
+/// Represents a per-CPU variable that can be manipulated with machine-intrinsic numeric
+/// operations.
+pub struct PerCpuNumeric<'a, T> {
+    ptr: &'a PerCpuPtr<T>,
+}
+
+macro_rules! impl_ops {
+    ($ty:ty, $reg:tt) => {
+        impl DynamicPerCpu<$ty> {
+            /// Returns a `PerCpuNumeric` that can be used to manipulate the underlying per-CPU variable.
+            pub fn num(&self) -> PerCpuNumeric<'_, $ty> {
+                PerCpuNumeric { ptr: &self.alloc.0 }
+            }
+        }
+        impl StaticPerCpu<$ty> {
+            /// Returns a `PerCpuNumeric` that can be used to manipulate the underlying per-CPU variable.
+            pub fn num(&self) -> PerCpuNumeric<'_, $ty> {
+                PerCpuNumeric { ptr: &self.0 }
+            }
+        }
+
+        impl PerCpuNumeric<'_, $ty> {
+            /// Adds `rhs` to the per-CPU variable.
+            pub fn add(&mut self, rhs: $ty) {
+                // SAFETY: `self.ptr.0` is a valid offset into the per-CPU area (i.e., valid as a
+                // pointer relative to the `gs` segment register) by the invariants of PerCpu.
+                unsafe {
+                    asm!(
+                        concat!("add gs:[{off}], {val:", $reg, "}"),
+                        off = in(reg) self.ptr.0 as *mut $ty,
+                        val = in(reg) rhs,
+                    );
+                }
+            }
+        }
+        impl PerCpuNumeric<'_, $ty> {
+            /// Subtracts `rhs` from the per-CPU variable.
+            pub fn sub(&mut self, rhs: $ty) {
+                // SAFETY: `self.ptr.0` is a valid offset into the per-CPU area (i.e., valid as a
+                // pointer relative to the `gs` segment register) by the invariants of PerCpu.
+                unsafe {
+                    asm!(
+                        concat!("sub gs:[{off}], {val:", $reg, "}"),
+                        off = in(reg) self.ptr.0 as *mut $ty,
+                        val = in(reg) rhs,
+                    );
+                }
+            }
+        }
+    };
+}
+
+macro_rules! impl_ops_byte {
+    ($ty:ty) => {
+        impl DynamicPerCpu<$ty> {
+            /// Returns a `PerCpuNumeric` that can be used to manipulate the underlying per-CPU
+            /// variable.
+            pub fn num(&self) -> PerCpuNumeric<'_, $ty> {
+                PerCpuNumeric { ptr: &self.alloc.0 }
+            }
+        }
+        impl StaticPerCpu<$ty> {
+            /// Returns a `PerCpuNumeric` that can be used to manipulate the underlying per-CPU
+            /// variable.
+            pub fn num(&self) -> PerCpuNumeric<'_, $ty> {
+                PerCpuNumeric { ptr: &self.0 }
+            }
+        }
+
+        impl PerCpuNumeric<'_, $ty> {
+            /// Adds `rhs` to the per-CPU variable.
+            pub fn add(&mut self, rhs: $ty) {
+                // SAFETY: `self.ptr.0` is a valid offset into the per-CPU area (i.e., valid as a
+                // pointer relative to the `gs` segment register) by the invariants of PerCpu.
+                unsafe {
+                    asm!(
+                        concat!("add gs:[{off}], {val}"),
+                        off = in(reg) self.ptr.0 as *mut $ty,
+                        val = in(reg_byte) rhs,
+                    );
+                }
+            }
+        }
+        impl PerCpuNumeric<'_, $ty> {
+            /// Subtracts `rhs` from the per-CPU variable.
+            pub fn sub(&mut self, rhs: $ty) {
+                // SAFETY: `self.ptr.0` is a valid offset into the per-CPU area (i.e., valid as a
+                // pointer relative to the `gs` segment register) by the invariants of PerCpu.
+                unsafe {
+                    asm!(
+                        concat!("sub gs:[{off}], {val}"),
+                        off = in(reg) self.ptr.0 as *mut $ty,
+                        val = in(reg_byte) rhs,
+                    );
+                }
+            }
+        }
+    };
+}
+
+impl_ops_byte!(i8);
+impl_ops!(i16, "x");
+impl_ops!(i32, "e");
+impl_ops!(i64, "r");
+impl_ops!(isize, "r");
+
+impl_ops_byte!(u8);
+impl_ops!(u16, "x");
+impl_ops!(u32, "e");
+impl_ops!(u64, "r");
+impl_ops!(usize, "r");

-- 
2.34.1




* [PATCH v2 5/5] rust: percpu: cache per-CPU pointers in the dynamic case
  2025-07-12 21:31 [PATCH v2 0/5] rust: Add Per-CPU Variable API Mitchell Levy
                   ` (3 preceding siblings ...)
  2025-07-12 21:31 ` [PATCH v2 4/5] rust: percpu: Add pin-hole optimizations for numerics Mitchell Levy
@ 2025-07-12 21:31 ` Mitchell Levy
  4 siblings, 0 replies; 20+ messages in thread
From: Mitchell Levy @ 2025-07-12 21:31 UTC (permalink / raw)
  To: Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Danilo Krummrich, Benno Lossin
  Cc: linux-kernel, rust-for-linux, linux-mm, Mitchell Levy

Currently, the creation of a `PerCpuNumeric` requires a memory read via
the `Arc` managing the dynamic allocation. While the compiler might be
clever enough to consolidate these reads in some cases, the read must
happen *somewhere*, which, when we're concerning ourselves with
individual instructions, is a very high burden.

Instead, cache the `PerCpuPtr` inside the `DynamicPerCpu` structure;
then, the `Arc` is used solely to manage the allocation.
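
The layout change can be sketched in userspace Rust as follows. This is a
simplified model, not the kernel code: `Allocation` and `Dynamic` are
hypothetical stand-ins for `PerCpuAllocation<T>` and `DynamicPerCpu<T>`,
and a `usize` handle stands in for the per-CPU pointer.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `PerCpuAllocation<T>`: owns the allocation and
/// records its pointer-like handle.
struct Allocation {
    handle: usize,
}

/// Hypothetical stand-in for `DynamicPerCpu<T>` after this change.
#[derive(Clone)]
struct Dynamic {
    /// Keeps the allocation alive; not dereferenced on the hot path.
    alloc: Arc<Allocation>,
    /// Copy of `alloc.handle`; never changes for the lifetime of `self`.
    handle: usize,
}

impl Dynamic {
    fn new(handle: usize) -> Self {
        let alloc = Allocation { handle };
        // Copy the handle out before the allocation moves into the `Arc`.
        let cached = alloc.handle;
        Self {
            alloc: Arc::new(alloc),
            handle: cached,
        }
    }

    /// Hot path: reads the field stored inline in `self`.
    fn handle(&self) -> usize {
        self.handle
    }

    /// Shown for comparison: needs an extra load through the `Arc` pointer.
    fn handle_via_arc(&self) -> usize {
        self.alloc.handle
    }
}
```

Cloning copies the cached handle and bumps the reference count, so every
clone sees the same handle without touching the shared allocation.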

Signed-off-by: Mitchell Levy <levymitchell0@gmail.com>
---
 rust/kernel/percpu.rs         | 12 +++++++++---
 rust/kernel/percpu/numeric.rs |  2 +-
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/rust/kernel/percpu.rs b/rust/kernel/percpu.rs
index b97d1d07a614..7458fe413f25 100644
--- a/rust/kernel/percpu.rs
+++ b/rust/kernel/percpu.rs
@@ -26,7 +26,10 @@
 
 /// Holds a dynamically-allocated per-CPU variable.
 pub struct DynamicPerCpu<T> {
+    // INVARIANT: `ptr` is managed by `alloc` and the value of `ptr` does not change for the
+    // lifetime of `self`.
     alloc: Arc<PerCpuAllocation<T>>,
+    ptr: PerCpuPtr<T>,
 }
 
 /// Holds a statically-allocated per-CPU variable.
@@ -204,9 +207,10 @@ impl<T: Zeroable> DynamicPerCpu<T> {
     pub fn new(flags: Flags) -> Option<Self> {
         let alloc: PerCpuAllocation<T> = PerCpuAllocation::new()?;
 
+        let ptr = alloc.0;
         let arc = Arc::new(alloc, flags).ok()?;
 
-        Some(Self { alloc: arc })
+        Some(Self { alloc: arc, ptr })
     }
 }
 
@@ -217,8 +221,9 @@ impl<T> DynamicPerCpu<T> {
     /// * `alloc` - The allocation to use
     /// * `flags` - The flags used to allocate an `Arc` that keeps track of the `PerCpuAllocation`.
     pub fn new_from_allocation(alloc: PerCpuAllocation<T>, flags: Flags) -> Option<Self> {
+        let ptr = alloc.0;
         let arc = Arc::new(alloc, flags).ok()?;
-        Some(Self { alloc: arc })
+        Some(Self { alloc: arc, ptr })
     }
 }
 
@@ -226,7 +231,7 @@ pub fn new_from_allocation(alloc: PerCpuAllocation<T>, flags: Flags) -> Option<S
 // don't deallocate the underlying `PerCpuAllocation` until `self` is dropped.
 unsafe impl<T> PerCpu<T> for DynamicPerCpu<T> {
     unsafe fn ptr(&mut self) -> &PerCpuPtr<T> {
-        &self.alloc.0
+        &self.ptr
     }
 }
 
@@ -234,6 +239,7 @@ impl<T> Clone for DynamicPerCpu<T> {
     fn clone(&self) -> Self {
         Self {
             alloc: self.alloc.clone(),
+            ptr: self.ptr,
         }
     }
 }
diff --git a/rust/kernel/percpu/numeric.rs b/rust/kernel/percpu/numeric.rs
index e4008f872af1..1b37cc7e5c19 100644
--- a/rust/kernel/percpu/numeric.rs
+++ b/rust/kernel/percpu/numeric.rs
@@ -62,7 +62,7 @@ impl DynamicPerCpu<$ty> {
             /// Returns a `PerCpuNumeric` that can be used to manipulate the underlying per-CPU
             /// variable.
             pub fn num(&self) -> PerCpuNumeric<'_, $ty> {
-                PerCpuNumeric { ptr: &self.alloc.0 }
+                PerCpuNumeric { ptr: &self.ptr }
             }
         }
         impl StaticPerCpu<$ty> {

-- 
2.34.1




Thread overview: 20+ messages
2025-07-12 21:31 [PATCH v2 0/5] rust: Add Per-CPU Variable API Mitchell Levy
2025-07-12 21:31 ` [PATCH v2 1/5] rust: percpu: introduce a rust API for per-CPU variables Mitchell Levy
2025-07-12 21:31 ` [PATCH v2 2/5] rust: rust-analyzer: add lib to dirs searched for crates Mitchell Levy
2025-07-12 21:31 ` [PATCH v2 3/5] rust: percpu: add a rust per-CPU variable test Mitchell Levy
2025-07-13  9:30   ` Benno Lossin
2025-07-15 10:31     ` Mitchell Levy
2025-07-15 11:31       ` Benno Lossin
2025-07-15 14:10         ` Boqun Feng
2025-07-15 15:55           ` Benno Lossin
2025-07-15 16:31             ` Boqun Feng
2025-07-15 17:44               ` Benno Lossin
2025-07-15 21:34                 ` Boqun Feng
2025-07-16 10:32                   ` Benno Lossin
2025-07-16 15:33                     ` Boqun Feng
2025-07-16 17:21                       ` Benno Lossin
2025-07-16 17:52                         ` Boqun Feng
2025-07-16 18:22                           ` Benno Lossin
2025-07-16 15:35                 ` Boqun Feng
2025-07-12 21:31 ` [PATCH v2 4/5] rust: percpu: Add pin-hole optimizations for numerics Mitchell Levy
2025-07-12 21:31 ` [PATCH v2 5/5] rust: percpu: cache per-CPU pointers in the dynamic case Mitchell Levy
