From: Vitaly Wool <vitaly.wool@konsulko.se>
To: Danilo Krummrich <dakr@kernel.org>
Cc: rust-for-linux@vger.kernel.org, linux-kernel@vger.kernel.org,
Uladzislau Rezki <urezki@gmail.com>,
Alice Ryhl <aliceryhl@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Miguel Ojeda <ojeda@kernel.org>,
Alex Gaynor <alex.gaynor@gmail.com>,
Boqun Feng <boqun.feng@gmail.com>, Gary Guo <gary@garyguo.net>,
Bjorn Roy Baron <bjorn3_gh@protonmail.com>,
Benno Lossin <lossin@kernel.org>,
Andreas Hindborg <a.hindborg@kernel.org>,
Trevor Gross <tmgross@umich.edu>,
Johannes Weiner <hannes@cmpxchg.org>,
Yosry Ahmed <yosry.ahmed@linux.dev>,
Nhat Pham <nphamcs@gmail.com>,
linux-mm@kvack.org
Subject: Re: [PATCH v4 2/2] rust: zpool: add abstraction for zpool drivers
Date: Wed, 27 Aug 2025 22:43:09 +0200 [thread overview]
Message-ID: <b1509e7f-4817-4466-bd2b-c083f024c0d4@konsulko.se> (raw)
In-Reply-To: <DCCCRYEUVJWZ.2AUDA0DXK0XSF@kernel.org>
On 8/26/25 14:20, Danilo Krummrich wrote:
> On Sat Aug 23, 2025 at 3:05 PM CEST, Vitaly Wool wrote:
>> +/// Zpool API.
>> +///
>> +/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
>> +/// Such drivers implement memory storage pools in accordance with the zpool API.
>> +///
>> +/// # Example
>> +///
>> +/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
>> +/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
>> +/// denominator) is saved in the least significant bits of the handle, which is guaranteed
>> +/// to be at least 2**6 aligned by kmalloc.
>> +///
>> +/// ```
>> +/// use core::ptr::{NonNull, copy_nonoverlapping};
>> +/// use core::sync::atomic::{AtomicU64, Ordering};
>> +/// use kernel::alloc::{Flags, KBox, KVec, NumaNode};
>> +/// use kernel::page::PAGE_SHIFT;
>> +/// use kernel::prelude::EINVAL;
>> +/// use kernel::zpool::*;
>> +///
>> +/// struct MyZpool {
>> +/// name: &'static CStr,
>> +/// bytes_used: AtomicU64,
>> +/// }
>> +///
>> +/// struct MyZpoolDriver;
>> +///
>> +/// impl ZpoolDriver for MyZpoolDriver {
>> +/// type Pool = KBox<MyZpool>;
>> +///
>> +/// fn create(name: &'static CStr, gfp: Flags) -> Result<KBox<MyZpool>> {
>> +/// let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
>> +/// let pool = KBox::new(my_pool, gfp)?;
>> +///
>> +/// Ok(pool)
>> +/// }
>> +///
>> +/// fn destroy(p: KBox<MyZpool>) {
>> +/// drop(p);
>> +/// }
>> +///
>> +/// fn malloc(pool: &mut MyZpool, size: usize, gfp: Flags, _nid: NumaNode) -> Result<usize> {
>> +/// let mut pow: usize = 0;
>> +/// for n in 6..=PAGE_SHIFT {
>> +/// if size <= 1 << n {
>> +/// pow = n;
>> +/// break;
>> +/// }
>> +/// }
>
> Why not just use next_power_of_two()? I think the same logic could also be
> achieved with
>
> size.next_power_of_two().trailing_zeros().max(6).min(PAGE_SHIFT)
It indeed can, thanks :)
>> +/// match pow {
>> +/// 0 => Err(EINVAL),
>> +/// _ => {
>> +/// let vec = KVec::<u64>::with_capacity(1 << (pow - 3), gfp)?;
>
> Why use u64 and 1 << (pow - 3), rather than simply u8 and 1 << pow?
>
> (Btw. you could also just use VBox<u8; PAGE_SIZE>::new_uninit() for all
> allocations to keep the example simple.)
Right, that fixation on u64 doesn't help at all here.
>> +/// let (ptr, _len, _cap) = vec.into_raw_parts();
>> +/// pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
>> +/// Ok(ptr as usize | (pow - 6))
>> +/// }
>> +/// }
>> +/// }
>> +///
>> +/// unsafe fn free(pool: &MyZpool, handle: usize) {
>> +/// let n = (handle & 0x3F) + 3;
>> +/// let uptr = handle & !0x3F;
>> +///
>> +/// // SAFETY:
>> +/// // - uptr comes from handle which points to the KVec allocation from `alloc`.
>
> That's not true, you modified the pointer you got from KVec. Please explain why
> it is always safe to use lower 6 bits for something else.
>
> What does "`alloc`" refer to?
It refers to the alloc function we implement in this toy backend for
ZpoolDriver trait.
> NIT: `uptr`, `KVec`
>
>> +/// // - size == capacity and is coming from the first 6 bits of handle.
>> +/// let vec = unsafe { KVec::<u64>::from_raw_parts(uptr as *mut u64, 1 << n, 1 << n) };
>
> Why do you set the length (not the capacity) of the Vector to 1 << n? I think
> technically it doesn't matter, but you should explain that in the safety
> comment.
Would the following work: "we know the capacity of this vector and it is
1 << n, we set the length to the same value to be deterministic, but the
actual length of vec doesn't matter because we're dropping it right here
anyway"?
>> +/// drop(vec);
>> +/// pool.bytes_used.fetch_sub(1 << (n + 3), Ordering::Relaxed);
>> +/// }
>> +///
>> +/// unsafe fn read_begin(_pool: &MyZpool, handle: usize) -> NonNull<u8> {
>> +/// let uptr = handle & !0x3F;
>> +/// // SAFETY: uptr points to a memory area allocated by KVec
>
> Please use markdown and end sentences with a period. (Applies to the entire
> file.)
>
>> +/// unsafe { NonNull::new_unchecked(uptr as *mut u8) }
>> +/// }
>> +///
>> +/// unsafe fn read_end(_pool: &MyZpool, _handle: usize, _handle_mem: NonNull<u8>) {}
>> +///
>> +/// unsafe fn write(_p: &MyZpool, handle: usize, handle_mem: NonNull<u8>, mem_len: usize) {
>> +/// let uptr = handle & !0x3F;
>> +/// // SAFETY: handle_mem is a valid non-null pointer provided by zpool, uptr points to
>> +/// // a KVec allocated in `malloc` and is therefore also valid.
>> +/// unsafe {
>> +/// copy_nonoverlapping(handle_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
>> +/// };
>> +/// }
>> +///
>> +/// fn total_pages(pool: &MyZpool) -> u64 {
>> +/// pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT
>
> I'm not sure what the semantic of this function is; the documentation says
> "Get the number of pages used by the `pool`".
>
> However, given that you give out allocations from a kmalloc() bucket in
> malloc(), this pool might be backed by more pages than what you calculate here.
Well, maybe I need to add an explicit comment about it here, but the
idea is that with the SLUB allocator, you have kmalloc-64, kmalloc-128,
kmalloc-256 etc. caches which will be used to manage requests for up to
64, 128, 256, ... bytes respectively, and in that case the calculations
are correct. FWIW I smoke tested this allocator and the actual numbers
seem to be consistent with these calculations.
> So, what is done here is calculating the number of pages you could fill with
> the memory that is kept around, but not the number of backing pages you consume
> memory from.
>
> Using VBox<u8; PAGE_SIZE>::new_uninit() for all allocations might simplify this.
>
>> +/// }
>> +/// }
>> +/// ```
>> +///
>> +pub trait ZpoolDriver {
>> + /// Opaque Rust representation of `struct zpool`.
>> + type Pool: ForeignOwnable;
>> +
>> + /// Create a pool.
>> + fn create(name: &'static CStr, gfp: Flags) -> Result<Self::Pool>;
>> +
>> + /// Destroy the pool.
>> + fn destroy(pool: Self::Pool);
>> +
>> + /// Allocate an object of size `size` bytes from `pool`, with the allocation flags `gfp` and
>
> "of `size` bytes"
>
>> + /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
>> + fn malloc(
>> + pool: <Self::Pool as ForeignOwnable>::BorrowedMut<'_>,
>> + size: usize,
>> + gfp: Flags,
>> + nid: NumaNode,
>> + ) -> Result<usize>;
>> +
>> + /// Free a previously allocated from the `pool` object, represented by `handle`.
>> + ///
>> + /// # Safety
>> + ///
>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>> + /// - `handle` must not be used any more after the call to `free`.
>> + unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: usize);
>> +
>> + /// Make all the necessary preparations for the caller to be able to read from the object
>> + /// represented by `handle` and return a valid pointer to the `handle` memory to be read.
>> + ///
>> + /// # Safety
>> + ///
>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>> + /// - `read_end` with the same `handle` must be called for each `read_begin`.
>
> What can potentially happen if we don't? I.e. how is this different from
> malloc()?
Here the idea is that if read_begin() has some sort of extra mapping
involved (like, doing kmap_atomic() on some weird memory address) for
the caller to be able to read directly from the returned pointer,
read_end() must clean that mapping up.
>> + unsafe fn read_begin(
>> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> + handle: usize,
>> + ) -> NonNull<u8>;
>> +
>> + /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
>> + /// previously returned by `read_begin`.
>> + ///
>> + /// # Safety
>> + ///
>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>> + /// - `handle_mem` must be the pointer previously returned by `read_begin`.
>> + unsafe fn read_end(
>> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> + handle: usize,
>> + handle_mem: NonNull<u8>,
>> + );
>> +
>> + /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
>> + /// to the memory to copy data from, and `mem_len` defines the length of the data block to
>> + /// be copied.
>> + ///
>> + /// # Safety
>> + ///
>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>> + /// - `handle_mem` must be a valid pointer to an allocated memory area.
>
> "must be a valid pointer into the allocated memory aread represented by
> `handle`"
>
>> + /// - `handle_mem` + `mem_len` must not point outside the allocated memory area.
>> + unsafe fn write(
>> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> + handle: usize,
>> + handle_mem: NonNull<u8>,
>> + mem_len: usize,
>> + );
>> +
>> + /// Get the number of pages used by the `pool`.
>> + fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
>> +}
>> +
>> +/// An "adapter" for the registration of zpool drivers.
>> +pub struct Adapter<T: ZpoolDriver>(T);
>> +
>> +impl<T: ZpoolDriver> Adapter<T> {
>> + extern "C" fn create_(name: *const c_uchar, gfp: u32) -> *mut c_void {
>> + // SAFETY: the memory pointed to by name is guaranteed by zpool to be a valid string
>
> What about the lifetime of the string? In the abstraction you assume 'static,
> how is this guaranteed?
Actually it isn't, thanks for finding this.
>> + let pool = unsafe { T::create(CStr::from_char_ptr(name), Flags::from_raw(gfp)) };
>> + match pool {
>> + Err(_) => null_mut(),
>> + Ok(p) => T::Pool::into_foreign(p),
>> + }
>> + }
>
> Please add an empty line in between function definitions.
>
>> + extern "C" fn destroy_(pool: *mut c_void) {
>> + // SAFETY: The pointer originates from an `into_foreign` call.
>> + T::destroy(unsafe { T::Pool::from_foreign(pool) })
>> + }
>> + extern "C" fn malloc_(
>> + pool: *mut c_void,
>> + size: usize,
>> + gfp: u32,
>> + handle: *mut usize,
>> + nid: c_int,
>> + ) -> c_int {
>> + // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
>> + // `from_foreign`, then that happens in `_destroy` which will not be called during this
>> + // method.
>> + let pool = unsafe { T::Pool::borrow_mut(pool) };
>
> Wait, can't this happen concurrently to all the other functions that borrow the
> pool? This would be undefined behavior, no?
Theoretically, yes, but since pool is actually Box<T>, it's only the
inner T that is mutable.
Anyway, the only reason for malloc() to require a mutable reference is
that the backend implementation *may* use RBTree::cursor_lower_bound()
which requires a mutable reference of the tree.
Would it be okay if I
* change the Zpool API so that malloc takes an immutable reference
* extend the RBTree API with a cursor_lower_bound analog which doesn't
require a mutable tree?
>> + from_result(|| {
>> + let real_nid = match nid {
>> + bindings::NUMA_NO_NODE => NumaNode::NO_NODE,
>> + _ => NumaNode::new(nid)?,
>> + };
>> + let h = T::malloc(pool, size, Flags::from_raw(gfp), real_nid)?;
>> + // SAFETY: handle is guaranteed to be a valid pointer by zpool.
>> + unsafe { *handle = h };
>> + Ok(0)
>> + })
>> + }
>> + extern "C" fn free_(pool: *mut c_void, handle: usize) {
>> + // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
>> + // `from_foreign`, then that happens in `_destroy` which will not be called during this
>> + // method.
>> + let pool = unsafe { T::Pool::borrow(pool) };
>> +
>> + // SAFETY: the caller (zswap) guarantees that `handle` is a valid handle previously
>
> Why does this mention zwap here and in the other functions below?
Should have been "(e. g. zswap)".
>> + // allocated by `malloc`.
>> + unsafe { T::free(pool, handle) }
>> + }
>
next prev parent reply other threads:[~2025-08-27 20:43 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-23 13:04 [PATCH v4 0/2] rust: zpool: add abstraction for zpool drivers Vitaly Wool
2025-08-23 13:05 ` [PATCH v4 1/2] rust: alloc: add from_raw method to Flags Vitaly Wool
2025-08-26 10:57 ` Danilo Krummrich
2025-08-23 13:05 ` [PATCH v4 2/2] rust: zpool: add abstraction for zpool drivers Vitaly Wool
2025-08-26 12:20 ` Danilo Krummrich
2025-08-27 20:43 ` Vitaly Wool [this message]
2025-08-28 10:01 ` Vitaly Wool
2025-08-26 17:02 ` Benno Lossin
2025-08-27 14:24 ` Vitaly Wool
2025-08-27 15:59 ` Benno Lossin
2025-08-28 7:22 ` Vitaly Wool
2025-08-26 10:43 ` [PATCH v4 0/2] " Miguel Ojeda
2025-08-26 11:37 ` Danilo Krummrich
2025-08-27 12:37 ` Alice Ryhl
2025-08-26 12:44 ` Johannes Weiner
2025-08-26 14:56 ` Vitaly Wool
2025-08-27 13:07 ` Johannes Weiner
2025-09-02 15:16 ` Vitaly Wool
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b1509e7f-4817-4466-bd2b-c083f024c0d4@konsulko.se \
--to=vitaly.wool@konsulko.se \
--cc=Liam.Howlett@oracle.com \
--cc=a.hindborg@kernel.org \
--cc=alex.gaynor@gmail.com \
--cc=aliceryhl@google.com \
--cc=bjorn3_gh@protonmail.com \
--cc=boqun.feng@gmail.com \
--cc=dakr@kernel.org \
--cc=gary@garyguo.net \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=lossin@kernel.org \
--cc=nphamcs@gmail.com \
--cc=ojeda@kernel.org \
--cc=rust-for-linux@vger.kernel.org \
--cc=tmgross@umich.edu \
--cc=urezki@gmail.com \
--cc=vbabka@suse.cz \
--cc=yosry.ahmed@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).