linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vitaly Wool <vitaly.wool@konsulko.se>
To: Danilo Krummrich <dakr@kernel.org>
Cc: rust-for-linux@vger.kernel.org, linux-kernel@vger.kernel.org,
	Uladzislau Rezki <urezki@gmail.com>,
	Alice Ryhl <aliceryhl@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Miguel Ojeda <ojeda@kernel.org>,
	Alex Gaynor <alex.gaynor@gmail.com>,
	Boqun Feng <boqun.feng@gmail.com>, Gary Guo <gary@garyguo.net>,
	Bjorn Roy Baron <bjorn3_gh@protonmail.com>,
	Benno Lossin <lossin@kernel.org>,
	Andreas Hindborg <a.hindborg@kernel.org>,
	Trevor Gross <tmgross@umich.edu>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Yosry Ahmed <yosry.ahmed@linux.dev>,
	Nhat Pham <nphamcs@gmail.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH v4 2/2] rust: zpool: add abstraction for zpool drivers
Date: Wed, 27 Aug 2025 22:43:09 +0200	[thread overview]
Message-ID: <b1509e7f-4817-4466-bd2b-c083f024c0d4@konsulko.se> (raw)
In-Reply-To: <DCCCRYEUVJWZ.2AUDA0DXK0XSF@kernel.org>



On 8/26/25 14:20, Danilo Krummrich wrote:
> On Sat Aug 23, 2025 at 3:05 PM CEST, Vitaly Wool wrote:
>> +/// Zpool API.
>> +///
>> +/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
>> +/// Such drivers implement memory storage pools in accordance with the zpool API.
>> +///
>> +/// # Example
>> +///
>> +/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
>> +/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
>> +/// denominator) is saved in the least significant bits of the handle, which is guaranteed
>> +/// to be at least 2**6 aligned by kmalloc.
>> +///
>> +/// ```
>> +/// use core::ptr::{NonNull, copy_nonoverlapping};
>> +/// use core::sync::atomic::{AtomicU64, Ordering};
>> +/// use kernel::alloc::{Flags, KBox, KVec, NumaNode};
>> +/// use kernel::page::PAGE_SHIFT;
>> +/// use kernel::prelude::EINVAL;
>> +/// use kernel::zpool::*;
>> +///
>> +/// struct MyZpool {
>> +///     name: &'static CStr,
>> +///     bytes_used: AtomicU64,
>> +/// }
>> +///
>> +/// struct MyZpoolDriver;
>> +///
>> +/// impl ZpoolDriver for MyZpoolDriver {
>> +///     type Pool = KBox<MyZpool>;
>> +///
>> +///     fn create(name: &'static CStr, gfp: Flags) -> Result<KBox<MyZpool>> {
>> +///         let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
>> +///         let pool = KBox::new(my_pool, gfp)?;
>> +///
>> +///         Ok(pool)
>> +///     }
>> +///
>> +///     fn destroy(p: KBox<MyZpool>) {
>> +///         drop(p);
>> +///     }
>> +///
>> +///     fn malloc(pool: &mut MyZpool, size: usize, gfp: Flags, _nid: NumaNode) -> Result<usize> {
>> +///         let mut pow: usize = 0;
>> +///         for n in 6..=PAGE_SHIFT {
>> +///             if size <= 1 << n {
>> +///                 pow = n;
>> +///                 break;
>> +///             }
>> +///         }
> 
> Why not just use next_power_of_two()? I think the same logic could also be
> achieved with
> 
> 	size.next_power_of_two().trailing_zeros().max(6).min(PAGE_SHIFT)

It indeed can, thanks :)

>> +///         match pow {
>> +///             0 => Err(EINVAL),
>> +///             _ => {
>> +///                 let vec = KVec::<u64>::with_capacity(1 << (pow - 3), gfp)?;
> 
> Why use u64 and 1 << (pow - 3), rather than simply u8 and 1 << pow?
> 
> (Btw. you could also just use VBox<u8; PAGE_SIZE>::new_uninit() for all
> allocations to keep the example simple.)

Right, that fixation on u64 doesn't help at all here.

>> +///                 let (ptr, _len, _cap) = vec.into_raw_parts();
>> +///                 pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
>> +///                 Ok(ptr as usize | (pow - 6))
>> +///             }
>> +///         }
>> +///     }
>> +///
>> +///     unsafe fn free(pool: &MyZpool, handle: usize) {
>> +///         let n = (handle & 0x3F) + 3;
>> +///         let uptr = handle & !0x3F;
>> +///
>> +///         // SAFETY:
>> +///         // - uptr comes from handle which points to the KVec allocation from `alloc`.
> 
> That's not true, you modified the pointer you got from KVec. Please explain why
> it is always safe to use lower 6 bits for something else.
> 
> What does "`alloc`" refer to?

It refers to the alloc function we implement in this toy backend for 
ZpoolDriver trait.

> NIT: `uptr`, `KVec`
> 
>> +///         // - size == capacity and is coming from the first 6 bits of handle.
>> +///         let vec = unsafe { KVec::<u64>::from_raw_parts(uptr as *mut u64, 1 << n, 1 << n) };
> 
> Why do you set the length (not the capacity) of the Vector to 1 << n? I think
> technically it doesn't matter, but you should explain that in the safety
> comment.

Would the following work: "we know the capacity of this vector and it is 
1 << n, we set the length to the same value to be deterministic, but the 
actual length of vec doesn't matter because we're dropping it right here 
anyway"?

>> +///         drop(vec);
>> +///         pool.bytes_used.fetch_sub(1 << (n + 3), Ordering::Relaxed);
>> +///     }
>> +///
>> +///     unsafe fn read_begin(_pool: &MyZpool, handle: usize) -> NonNull<u8> {
>> +///         let uptr = handle & !0x3F;
>> +///         // SAFETY: uptr points to a memory area allocated by KVec
> 
> Please use markdown and end sentences with a period. (Applies to the entire
> file.)
> 
>> +///         unsafe { NonNull::new_unchecked(uptr as *mut u8) }
>> +///     }
>> +///
>> +///     unsafe fn read_end(_pool: &MyZpool, _handle: usize, _handle_mem: NonNull<u8>) {}
>> +///
>> +///     unsafe fn write(_p: &MyZpool, handle: usize, handle_mem: NonNull<u8>, mem_len: usize) {
>> +///         let uptr = handle & !0x3F;
>> +///         // SAFETY: handle_mem is a valid non-null pointer provided by zpool, uptr points to
>> +///         // a KVec allocated in `malloc` and is therefore also valid.
>> +///         unsafe {
>> +///             copy_nonoverlapping(handle_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
>> +///         };
>> +///     }
>> +///
>> +///     fn total_pages(pool: &MyZpool) -> u64 {
>> +///         pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT
> 
> I'm not sure what the semantic of this function is; the documentation says
> "Get the number of pages used by the `pool`".
> 
> However, given that you give out allocations from a kmalloc() bucket in
> malloc(), this pool might be backed by more pages than what you calculate here.

Well, maybe I need to add an explicit comment about it here, but the 
idea is that with the SLUB allocator, you have kmalloc-64, kmalloc-128, 
kmalloc-256 etc. caches which will be used to manage requests for up to 
64, 128, 256, ... bytes respectively, and in that case the calculations 
are correct. FWIW I smoke tested this allocator and the actual numbers 
seem to be consistent with these calculations.

> So, what is done here is calculating the number of pages you could fill with
> the memory that is kept around, but not the number of backing pages you consume
> memory from.
> 
> Using VBox<u8; PAGE_SIZE>::new_uninit() for all allocations might simplify this.
> 
>> +///     }
>> +/// }
>> +/// ```
>> +///
>> +pub trait ZpoolDriver {
>> +    /// Opaque Rust representation of `struct zpool`.
>> +    type Pool: ForeignOwnable;
>> +
>> +    /// Create a pool.
>> +    fn create(name: &'static CStr, gfp: Flags) -> Result<Self::Pool>;
>> +
>> +    /// Destroy the pool.
>> +    fn destroy(pool: Self::Pool);
>> +
>> +    /// Allocate an object of size `size` bytes from `pool`, with the allocation flags `gfp` and
> 
> "of `size` bytes"
> 
>> +    /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
>> +    fn malloc(
>> +        pool: <Self::Pool as ForeignOwnable>::BorrowedMut<'_>,
>> +        size: usize,
>> +        gfp: Flags,
>> +        nid: NumaNode,
>> +    ) -> Result<usize>;
>> +
>> +    /// Free a previously allocated from the `pool` object, represented by `handle`.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `handle` must not be used any more after the call to `free`.
>> +    unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: usize);
>> +
>> +    /// Make all the necessary preparations for the caller to be able to read from the object
>> +    /// represented by `handle` and return a valid pointer to the `handle` memory to be read.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `read_end` with the same `handle` must be called for each `read_begin`.
> 
> What can potentially happen if we don't? I.e. how is this different from
> malloc()?

Here the idea is that if read_begin() has some sort of extra mapping 
involved (like, doing kmap_atomic() on some weird memory address) for 
the caller to be able to read directly from the returned pointer, 
read_end() must clean that mapping up.

>> +    unsafe fn read_begin(
>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> +        handle: usize,
>> +    ) -> NonNull<u8>;
>> +
>> +    /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
>> +    /// previously returned by `read_begin`.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `handle_mem` must be the pointer previously returned by `read_begin`.
>> +    unsafe fn read_end(
>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> +        handle: usize,
>> +        handle_mem: NonNull<u8>,
>> +    );
>> +
>> +    /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
>> +    /// to the memory to copy data from, and `mem_len` defines the length of the data block to
>> +    /// be copied.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `handle_mem` must be a valid pointer to an allocated memory area.
> 
> "must be a valid pointer into the allocated memory aread represented by
> `handle`"
> 
>> +    /// - `handle_mem` + `mem_len` must not point outside the allocated memory area.
>> +    unsafe fn write(
>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> +        handle: usize,
>> +        handle_mem: NonNull<u8>,
>> +        mem_len: usize,
>> +    );
>> +
>> +    /// Get the number of pages used by the `pool`.
>> +    fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
>> +}
>> +
>> +/// An "adapter" for the registration of zpool drivers.
>> +pub struct Adapter<T: ZpoolDriver>(T);
>> +
>> +impl<T: ZpoolDriver> Adapter<T> {
>> +    extern "C" fn create_(name: *const c_uchar, gfp: u32) -> *mut c_void {
>> +        // SAFETY: the memory pointed to by name is guaranteed by zpool to be a valid string
> 
> What about the lifetime of the string? In the abstraction you assume 'static,
> how is this guaranteed?

Actually it isn't, thanks for finding this.

>> +        let pool = unsafe { T::create(CStr::from_char_ptr(name), Flags::from_raw(gfp)) };
>> +        match pool {
>> +            Err(_) => null_mut(),
>> +            Ok(p) => T::Pool::into_foreign(p),
>> +        }
>> +    }
> 
> Please add an empty line in between function definitions.
> 
>> +    extern "C" fn destroy_(pool: *mut c_void) {
>> +        // SAFETY: The pointer originates from an `into_foreign` call.
>> +        T::destroy(unsafe { T::Pool::from_foreign(pool) })
>> +    }
>> +    extern "C" fn malloc_(
>> +        pool: *mut c_void,
>> +        size: usize,
>> +        gfp: u32,
>> +        handle: *mut usize,
>> +        nid: c_int,
>> +    ) -> c_int {
>> +        // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
>> +        // `from_foreign`, then that happens in `_destroy` which will not be called during this
>> +        // method.
>> +        let pool = unsafe { T::Pool::borrow_mut(pool) };
> 
> Wait, can't this happen concurrently to all the other functions that borrow the
> pool? This would be undefined behavior, no?

Theoretically, yes, but since pool is actually Box<T>, it's only the 
inner T that is mutable.

Anyway, the only reason for malloc() to require a mutable reference is 
that the backend implementation *may* use RBTree::cursor_lower_bound() 
which requires a mutable reference of the tree.

Would it be okay if I
* change the Zpool API so that malloc takes an immutable reference
* extend the RBTree API with a cursor_lower_bound analog which doesn't 
require a mutable tree?

>> +        from_result(|| {
>> +            let real_nid = match nid {
>> +                bindings::NUMA_NO_NODE => NumaNode::NO_NODE,
>> +                _ => NumaNode::new(nid)?,
>> +            };
>> +            let h = T::malloc(pool, size, Flags::from_raw(gfp), real_nid)?;
>> +            // SAFETY: handle is guaranteed to be a valid pointer by zpool.
>> +            unsafe { *handle = h };
>> +            Ok(0)
>> +        })
>> +    }
>> +    extern "C" fn free_(pool: *mut c_void, handle: usize) {
>> +        // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
>> +        // `from_foreign`, then that happens in `_destroy` which will not be called during this
>> +        // method.
>> +        let pool = unsafe { T::Pool::borrow(pool) };
>> +
>> +        // SAFETY: the caller (zswap) guarantees that `handle` is a valid handle previously
> 
> Why does this mention zwap here and in the other functions below?

Should have been "(e. g. zswap)".

>> +        // allocated by `malloc`.
>> +        unsafe { T::free(pool, handle) }
>> +    }
> 


  reply	other threads:[~2025-08-27 20:43 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-23 13:04 [PATCH v4 0/2] rust: zpool: add abstraction for zpool drivers Vitaly Wool
2025-08-23 13:05 ` [PATCH v4 1/2] rust: alloc: add from_raw method to Flags Vitaly Wool
2025-08-26 10:57   ` Danilo Krummrich
2025-08-23 13:05 ` [PATCH v4 2/2] rust: zpool: add abstraction for zpool drivers Vitaly Wool
2025-08-26 12:20   ` Danilo Krummrich
2025-08-27 20:43     ` Vitaly Wool [this message]
2025-08-28 10:01       ` Vitaly Wool
2025-08-26 17:02   ` Benno Lossin
2025-08-27 14:24     ` Vitaly Wool
2025-08-27 15:59       ` Benno Lossin
2025-08-28  7:22         ` Vitaly Wool
2025-08-26 10:43 ` [PATCH v4 0/2] " Miguel Ojeda
2025-08-26 11:37   ` Danilo Krummrich
2025-08-27 12:37     ` Alice Ryhl
2025-08-26 12:44 ` Johannes Weiner
2025-08-26 14:56   ` Vitaly Wool
2025-08-27 13:07     ` Johannes Weiner
2025-09-02 15:16       ` Vitaly Wool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b1509e7f-4817-4466-bd2b-c083f024c0d4@konsulko.se \
    --to=vitaly.wool@konsulko.se \
    --cc=Liam.Howlett@oracle.com \
    --cc=a.hindborg@kernel.org \
    --cc=alex.gaynor@gmail.com \
    --cc=aliceryhl@google.com \
    --cc=bjorn3_gh@protonmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=dakr@kernel.org \
    --cc=gary@garyguo.net \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=lossin@kernel.org \
    --cc=nphamcs@gmail.com \
    --cc=ojeda@kernel.org \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=tmgross@umich.edu \
    --cc=urezki@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=yosry.ahmed@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).