* [PATCH v6 1/2] mm: reinstate zpool as a thin API
2025-09-23 10:25 [PATCH v6 0/2] rust: zpool: add API for C and Rust Vitaly Wool
@ 2025-09-23 10:26 ` Vitaly Wool
2025-09-23 10:27 ` [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers Vitaly Wool
2025-09-23 16:50 ` [syzbot ci] Re: rust: zpool: add API for C and Rust syzbot ci
2 siblings, 0 replies; 11+ messages in thread
From: Vitaly Wool @ 2025-09-23 10:26 UTC (permalink / raw)
To: linux-mm, rust-for-linux
Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Minchan Kim, Sergey Senozhatsky,
Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Greg Kroah-Hartman, linux-kernel,
Vitaly Wool
With the removal of the zpool layer, the opportunity to select an
allocation backend for zswap has been completely lost. While the
flexibility zpool used to provide may have seemed unjustified, the option
to select an alternative backend should still exist, especially with the
recent advancements in the vmap/vmalloc field that allow for faster and
simpler allocation backends. This patch reinstates the zpool API as a
thin, header-only layer and provides the infrastructure for compile-time
backend selection.
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
---
Documentation/admin-guide/mm/zswap.rst | 14 +++---
MAINTAINERS | 1 +
include/linux/zpool.h | 62 ++++++++++++++++++++++++++
mm/Kconfig | 22 ++++++++-
mm/zsmalloc.c | 3 ++
mm/zswap.c | 30 ++++++-------
6 files changed, 110 insertions(+), 22 deletions(-)
create mode 100644 include/linux/zpool.h
diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index 283d77217c6f..44075d35e512 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -53,16 +53,20 @@ Zswap receives pages for compression from the swap subsystem and is able to
evict pages from its own compressed pool on an LRU basis and write them back to
the backing swap device in the case that the compressed pool is full.
-Zswap makes use of zsmalloc for the managing the compressed memory pool. Each
-allocation in zsmalloc is not directly accessible by address. Rather, a handle is
+Zswap makes use of the zpool API for managing the compressed memory pool. Each
+allocation in zpool is not directly accessible by address. Rather, a handle is
returned by the allocation routine and that handle must be mapped before being
accessed. The compressed memory pool grows on demand and shrinks as compressed
-pages are freed. The pool is not preallocated.
+pages are freed. The pool is not preallocated.
+
+An allocator backend implementing the zpool API can be selected at build
+time as a kernel configuration option. Currently only one backend (zsmalloc)
+is supported and it is selected automatically.
When a swap page is passed from swapout to zswap, zswap maintains a mapping
of the swap entry, a combination of the swap type and swap offset, to the
-zsmalloc handle that references that compressed swap page. This mapping is
-achieved with a red-black tree per swap type. The swap offset is the search
+zpool handle that references that compressed swap page. This mapping is
+achieved with a red-black tree per swap type. The swap offset is the search
key for the tree nodes.
During a page fault on a PTE that is a swap entry, the swapin code calls the
diff --git a/MAINTAINERS b/MAINTAINERS
index ca8e3d18eedd..149a06be4000 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27885,6 +27885,7 @@ R: Chengming Zhou <chengming.zhou@linux.dev>
L: linux-mm@kvack.org
S: Maintained
F: Documentation/admin-guide/mm/zswap.rst
+F: include/linux/zpool.h
F: include/linux/zswap.h
F: mm/zswap.c
F: tools/testing/selftests/cgroup/test_zswap.c
diff --git a/include/linux/zpool.h b/include/linux/zpool.h
new file mode 100644
index 000000000000..05406b00d734
--- /dev/null
+++ b/include/linux/zpool.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * zpool memory storage api
+ *
+ * Copyright (C) 2014 Dan Streetman
+ *
+ * This is a common frontend for the zswap compressed memory storage
+ * implementations.
+ */
+
+#ifndef _ZPOOL_H_
+#define _ZPOOL_H_
+
+struct zpool {
+ void *pool;
+};
+
+struct zpool *zpool_create_pool(const char *name);
+void zpool_destroy_pool(struct zpool *pool);
+unsigned long zpool_malloc(struct zpool *pool, size_t size, gfp_t gfp, const int nid);
+void zpool_free(struct zpool *pool, unsigned long handle);
+void *zpool_obj_read_begin(struct zpool *zpool, unsigned long handle, void *local_copy);
+void zpool_obj_read_end(struct zpool *zpool, unsigned long handle, void *handle_mem);
+void zpool_obj_write(struct zpool *zpool, unsigned long handle, void *handle_mem, size_t mem_len);
+u64 zpool_get_total_pages(struct zpool *zpool);
+
+#define DECLARE_ZPOOL(prefix) \
+ struct zpool *zpool_create_pool(const char *name) \
+ { \
+ return (struct zpool *) prefix ## _create_pool(name); \
+ } \
+ void zpool_destroy_pool(struct zpool *pool) \
+ { \
+ return prefix ## _destroy_pool(pool->pool); \
+ } \
+ unsigned long zpool_malloc(struct zpool *pool, size_t size, gfp_t gfp, const int nid) \
+ { \
+ return prefix ## _malloc(pool->pool, size, gfp, nid); \
+ } \
+ void zpool_free(struct zpool *pool, unsigned long handle) \
+ { \
+ return prefix ## _free(pool->pool, handle); \
+ } \
+ void *zpool_obj_read_begin(struct zpool *zpool, unsigned long handle, void *local_copy) \
+ { \
+ return prefix ## _obj_read_begin(zpool->pool, handle, local_copy); \
+ } \
+ void zpool_obj_read_end(struct zpool *zpool, unsigned long handle, void *handle_mem) \
+ { \
+ return prefix ## _obj_read_end(zpool->pool, handle, handle_mem); \
+ } \
+ void zpool_obj_write(struct zpool *zpool, unsigned long handle, \
+ void *handle_mem, size_t mem_len) \
+ { \
+ prefix ## _obj_write(zpool->pool, handle, handle_mem, mem_len); \
+ } \
+ u64 zpool_get_total_pages(struct zpool *zpool) \
+ { \
+ return prefix ## _get_total_pages(zpool->pool); \
+ }
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index d1ed839ca710..f72483b91098 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -23,7 +23,7 @@ config ZSWAP
bool "Compressed cache for swap pages"
depends on SWAP
select CRYPTO
- select ZSMALLOC
+ select ZPOOL_ALLOCATOR
help
A lightweight compressed cache for swap pages. It takes
pages that are in the process of being swapped out and attempts to
@@ -122,8 +122,26 @@ config ZSWAP_COMPRESSOR_DEFAULT
default "zstd" if ZSWAP_COMPRESSOR_DEFAULT_ZSTD
default ""
+config ZPOOL_ALLOCATOR
+ bool
+
+choice
+ prompt "Allocator for zswap"
+ default ZSMALLOC
+ help
+ Allocator to be used for compressed object storage in zswap.
+ Currently only zsmalloc is supported.
+
config ZSMALLOC
- tristate
+ bool "N:1 compression allocator (zsmalloc)"
+ select ZPOOL_ALLOCATOR
+ depends on MMU
+ help
+ zsmalloc is a slab-based memory allocator designed to store
+ pages of various compression levels efficiently. It achieves
+ the highest storage density with the least amount of fragmentation.
+
+endchoice
if ZSMALLOC
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5bf832f9c05c..c84e96fd30f9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -36,6 +36,7 @@
#include <linux/types.h>
#include <linux/debugfs.h>
#include <linux/zsmalloc.h>
+#include <linux/zpool.h>
#include <linux/fs.h>
#include <linux/workqueue.h>
#include "zpdesc.h"
@@ -432,6 +433,8 @@ static void record_obj(unsigned long handle, unsigned long obj)
*(unsigned long *)handle = obj;
}
+DECLARE_ZPOOL(zs);
+
static inline bool __maybe_unused is_first_zpdesc(struct zpdesc *zpdesc)
{
return PagePrivate(zpdesc_page(zpdesc));
diff --git a/mm/zswap.c b/mm/zswap.c
index c1af782e54ec..a567cc16b9a3 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -34,7 +34,7 @@
#include <linux/pagemap.h>
#include <linux/workqueue.h>
#include <linux/list_lru.h>
-#include <linux/zsmalloc.h>
+#include <linux/zpool.h>
#include "swap.h"
#include "internal.h"
@@ -151,7 +151,7 @@ struct crypto_acomp_ctx {
* needs to be verified that it's still valid in the tree.
*/
struct zswap_pool {
- struct zs_pool *zs_pool;
+ struct zpool *pool;
struct crypto_acomp_ctx __percpu *acomp_ctx;
struct percpu_ref ref;
struct list_head list;
@@ -257,8 +257,8 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
/* unique name for each pool specifically required by zsmalloc */
snprintf(name, 38, "zswap%x", atomic_inc_return(&zswap_pools_count));
- pool->zs_pool = zs_create_pool(name);
- if (!pool->zs_pool)
+ pool->pool = zpool_create_pool(name);
+ if (!pool->pool)
goto error;
strscpy(pool->tfm_name, compressor, sizeof(pool->tfm_name));
@@ -295,8 +295,8 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
error:
if (pool->acomp_ctx)
free_percpu(pool->acomp_ctx);
- if (pool->zs_pool)
- zs_destroy_pool(pool->zs_pool);
+ if (pool->pool)
+ zpool_destroy_pool(pool->pool);
kfree(pool);
return NULL;
}
@@ -327,7 +327,7 @@ static void zswap_pool_destroy(struct zswap_pool *pool)
cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
free_percpu(pool->acomp_ctx);
- zs_destroy_pool(pool->zs_pool);
+ zpool_destroy_pool(pool->pool);
kfree(pool);
}
@@ -454,7 +454,7 @@ unsigned long zswap_total_pages(void)
rcu_read_lock();
list_for_each_entry_rcu(pool, &zswap_pools, list)
- total += zs_get_total_pages(pool->zs_pool);
+ total += zpool_get_total_pages(pool->pool);
rcu_read_unlock();
return total;
@@ -717,7 +717,7 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
static void zswap_entry_free(struct zswap_entry *entry)
{
zswap_lru_del(&zswap_list_lru, entry);
- zs_free(entry->pool->zs_pool, entry->handle);
+ zpool_free(entry->pool->pool, entry->handle);
zswap_pool_put(entry->pool);
if (entry->objcg) {
obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
@@ -907,13 +907,13 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
}
gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
- handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page));
+ handle = zpool_malloc(pool->pool, dlen, gfp, page_to_nid(page));
if (IS_ERR_VALUE(handle)) {
alloc_ret = PTR_ERR((void *)handle);
goto unlock;
}
- zs_obj_write(pool->zs_pool, handle, dst, dlen);
+ zpool_obj_write(pool->pool, handle, dst, dlen);
entry->handle = handle;
entry->length = dlen;
@@ -940,7 +940,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
u8 *src, *obj;
acomp_ctx = acomp_ctx_get_cpu_lock(pool);
- obj = zs_obj_read_begin(pool->zs_pool, entry->handle, acomp_ctx->buffer);
+ obj = zpool_obj_read_begin(pool->pool, entry->handle, acomp_ctx->buffer);
/* zswap entries of length PAGE_SIZE are not compressed. */
if (entry->length == PAGE_SIZE) {
@@ -949,7 +949,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
}
/*
- * zs_obj_read_begin() might return a kmap address of highmem when
+ * zpool_obj_read_begin() might return a kmap address of highmem when
* acomp_ctx->buffer is not used. However, sg_init_one() does not
* handle highmem addresses, so copy the object to acomp_ctx->buffer.
*/
@@ -969,7 +969,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
dlen = acomp_ctx->req->dlen;
read_done:
- zs_obj_read_end(pool->zs_pool, entry->handle, obj);
+ zpool_obj_read_end(pool->pool, entry->handle, obj);
acomp_ctx_put_unlock(acomp_ctx);
if (!decomp_ret && dlen == PAGE_SIZE)
@@ -1486,7 +1486,7 @@ static bool zswap_store_page(struct page *page,
return true;
store_failed:
- zs_free(pool->zs_pool, entry->handle);
+ zpool_free(pool->pool, entry->handle);
compress_failed:
zswap_entry_cache_free(entry);
return false;
--
2.39.2
^ permalink raw reply related [flat|nested] 11+ messages in thread

* [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
2025-09-23 10:25 [PATCH v6 0/2] rust: zpool: add API for C and Rust Vitaly Wool
2025-09-23 10:26 ` [PATCH v6 1/2] mm: reinstate zpool as a thin API Vitaly Wool
@ 2025-09-23 10:27 ` Vitaly Wool
2025-09-23 21:49 ` Benno Lossin
2025-09-23 16:50 ` [syzbot ci] Re: rust: zpool: add API for C and Rust syzbot ci
2 siblings, 1 reply; 11+ messages in thread
From: Vitaly Wool @ 2025-09-23 10:27 UTC (permalink / raw)
To: linux-mm, rust-for-linux
Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Minchan Kim, Sergey Senozhatsky,
Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
Trevor Gross, Danilo Krummrich, Greg Kroah-Hartman, linux-kernel,
Vitaly Wool
zpool is a common thin API for memory storage pool implementations.
A backend implementing this API will be used to store compressed
memory objects, e.g. for zswap, the lightweight compressed cache for
swap pages.
This patch provides the interface for using zpool in Rust kernel code,
thus enabling Rust implementations of zpool allocators for zswap. Since
the zpool API no longer provides a runtime registration option and
presumes compile-time selection of the allocation backend, an allocator
is expected to invoke the DECLARE_ZPOOL! macro to register itself at
compile time. See the example in zpool.rs for more details.
Co-developed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
---
rust/bindings/bindings_helper.h | 1 +
rust/kernel/lib.rs | 2 +
rust/kernel/zpool.rs | 366 ++++++++++++++++++++++++++++++++
3 files changed, 369 insertions(+)
create mode 100644 rust/kernel/zpool.rs
diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 84d60635e8a9..f0c4c454882b 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -75,6 +75,7 @@
#include <linux/wait.h>
#include <linux/workqueue.h>
#include <linux/xarray.h>
+#include <linux/zpool.h>
#include <trace/events/rust_sample.h>
#if defined(CONFIG_DRM_PANIC_SCREEN_QR_CODE)
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 6b0a5689669f..4d5b0cc881d8 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -130,6 +130,8 @@
pub mod uaccess;
pub mod workqueue;
pub mod xarray;
+#[cfg(CONFIG_ZPOOL_ALLOCATOR)]
+pub mod zpool;
#[doc(hidden)]
pub use bindings;
diff --git a/rust/kernel/zpool.rs b/rust/kernel/zpool.rs
new file mode 100644
index 000000000000..53a0dc0607e6
--- /dev/null
+++ b/rust/kernel/zpool.rs
@@ -0,0 +1,366 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Implementation of Rust interface towards zpool API.
+
+use crate::{error::Result, kernel::alloc::Flags, str::CString, types::ForeignOwnable};
+use core::ptr::NonNull;
+use kernel::alloc::NumaNode;
+
+/// The Rust representation of zpool handle.
+pub struct ZpoolHandle(usize);
+
+impl ZpoolHandle {
+ /// Create `ZpoolHandle` from the raw representation.
+ pub fn from_raw(h: usize) -> Self {
+ Self(h)
+ }
+
+ /// Get the raw representation of the handle.
+ pub fn as_raw(self) -> usize {
+ self.0
+ }
+}
+
+/// Zpool API.
+///
+/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
+/// Such drivers implement memory storage pools in accordance with the zpool API.
+///
+/// # Example
+///
+/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
+/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
+/// denominator) is saved in the least significant bits of the handle, which is guaranteed
+/// to be at least 2**6 aligned by kmalloc.
+///
+/// ```
+/// use core::ptr::{NonNull, copy_nonoverlapping};
+/// use core::sync::atomic::{AtomicU64, Ordering};
+/// use kernel::alloc::{Flags, flags, KBox, KVec, NumaNode};
+/// use kernel::page::PAGE_SHIFT;
+/// use kernel::prelude::EINVAL;
+/// use kernel::str::CString;
+/// use kernel::zpool::*;
+///
+/// struct MyZpool {
+/// name: CString,
+/// bytes_used: AtomicU64,
+/// }
+///
+/// struct MyZpoolDriver;
+///
+/// impl ZpoolDriver for MyZpoolDriver {
+/// type Pool = KBox<MyZpool>;
+///
+/// fn create(name: CString, gfp: Flags) -> Result<KBox<MyZpool>> {
+/// let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
+/// let pool = KBox::new(my_pool, gfp)?;
+///
+/// pr_debug!("Pool {:?} created\n", pool.name);
+/// Ok(pool)
+/// }
+///
+/// fn malloc(pool: &MyZpool, size: usize, _gfp: Flags, _nid: NumaNode) -> Result<ZpoolHandle> {
+/// let pow = size.next_power_of_two().trailing_zeros().max(6);
+/// match pow {
+/// 0 => Err(EINVAL),
+/// m if m > PAGE_SHIFT as u32 => Err(ENOSPC),
+/// _ => {
+/// let vec = KVec::<u8>::with_capacity(1 << pow, GFP_KERNEL)?;
+/// let (ptr, _len, _cap) = vec.into_raw_parts();
+///
+/// // We assume that kmalloc-64, kmalloc-128 etc. kmem caches will be used for
+/// // our allocations, so it's actually `1 << pow` bytes that we have consumed.
+/// pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
+///
+/// // `kmalloc` guarantees that an allocation of size x*2^n is 2^n aligned.
+/// // Therefore the 6 lower bits are zeros and we can use these to store `pow`.
+/// Ok(ZpoolHandle::from_raw(ptr as usize | (pow as usize - 6)))
+/// }
+/// }
+/// }
+///
+/// unsafe fn free(pool: &MyZpool, handle: ZpoolHandle) {
+/// let h = handle.as_raw();
+/// let n = (h & 0x3F) + 6;
+/// let uptr = h & !0x3F;
+///
+/// // SAFETY:
+/// // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
+/// // denominator for the vector capacity. As noted above, the result will be exactly the
+/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
+/// // the vector allocated by `alloc` function above.
+/// // - 1 << n == capacity and is coming from the first 6 bits of handle.
+/// let vec = unsafe { KVec::<u8>::from_raw_parts(uptr as *mut u8, 0, 1 << n) };
+/// drop(vec);
+/// pool.bytes_used.fetch_sub(1 << n, Ordering::Relaxed);
+/// }
+///
+/// unsafe fn read_begin(_pool: &MyZpool, handle: ZpoolHandle) -> NonNull<u8> {
+/// let uptr = handle.as_raw() & !0x3F;
+/// // SAFETY:
+/// // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
+/// // denominator for the vector capacity. As noted above, the result will be exactly the
+/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
+/// // the vector allocated by `alloc` function above.
+/// unsafe { NonNull::new_unchecked(uptr as *mut u8) }
+/// }
+///
+/// unsafe fn read_end(_pool: &MyZpool, _handle: ZpoolHandle, _handle_mem: NonNull<u8>) {}
+///
+/// unsafe fn write(_p: &MyZpool, handle: ZpoolHandle, h_mem: NonNull<u8>, mem_len: usize) {
+/// let uptr = handle.as_raw() & !0x3F;
+/// // SAFETY:
+/// // - `h_mem` is a valid non-null pointer provided by zswap.
+/// // - `uptr` is derived from handle by zeroing 6 lower bits where we store the power
+/// // denominator for the vector capacity. As noted above, the result will be exactly the
+/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
+/// // the vector allocated by `alloc` function above.
+/// unsafe {
+/// copy_nonoverlapping(h_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
+/// };
+/// }
+///
+/// fn total_pages(pool: &MyZpool) -> u64 {
+/// pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT
+/// }
+/// }
+///
+/// // Uncomment this for compile time registration (disabled to avoid build error when KUNIT
+/// // tests and zsmalloc are enabled):
+/// // kernel::DECLARE_ZPOOL!(MyZpoolDriver);
+/// ```
+///
+pub trait ZpoolDriver {
+ /// Opaque Rust representation of `struct zpool`.
+ type Pool: ForeignOwnable;
+
+ /// Create a pool.
+ fn create(name: CString, gfp: Flags) -> Result<Self::Pool>;
+
+ /// Allocate an object of `size` bytes from `pool`, with the allocation flags `gfp` and
+ /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
+ fn malloc(
+ pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
+ size: usize,
+ gfp: Flags,
+ nid: NumaNode,
+ ) -> Result<ZpoolHandle>;
+
+ /// Free an object previously allocated from the `pool`, represented by `handle`.
+ ///
+ /// # Safety
+ ///
+ /// - `handle` must be a valid handle previously returned by `malloc`.
+ /// - `handle` must not be used any more after the call to `free`.
+ unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: ZpoolHandle);
+
+ /// Make all the necessary preparations for the caller to be able to read from the object
+ /// represented by `handle` and return a valid pointer to that object's memory to be read.
+ ///
+ /// # Safety
+ ///
+ /// - `handle` must be a valid handle previously returned by `malloc`.
+ /// - `read_end` with the same `handle` must be called for each `read_begin`.
+ unsafe fn read_begin(
+ pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
+ handle: ZpoolHandle,
+ ) -> NonNull<u8>;
+
+ /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
+ /// previously returned by `read_begin`.
+ ///
+ /// # Safety
+ ///
+ /// - `handle` must be a valid handle previously returned by `malloc`.
+ /// - `handle_mem` must be the pointer previously returned by `read_begin`.
+ unsafe fn read_end(
+ pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
+ handle: ZpoolHandle,
+ handle_mem: NonNull<u8>,
+ );
+
+ /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
+ /// to the memory to copy data from, and `mem_len` defines the length of the data block to
+ /// be copied.
+ ///
+ /// # Safety
+ ///
+ /// - `handle` must be a valid handle previously returned by `malloc`.
+ /// - `handle_mem` must be a valid pointer into the allocated memory area represented by
+ /// `handle`.
+ /// - `handle_mem + mem_len - 1` must not point outside the allocated memory area.
+ unsafe fn write(
+ pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
+ handle: ZpoolHandle,
+ handle_mem: NonNull<u8>,
+ mem_len: usize,
+ );
+
+ /// Get the number of pages used by the `pool`.
+ fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
+}
+
+/// Provide a zpool allocator to zswap
+#[macro_export]
+macro_rules! DECLARE_ZPOOL {
+ ($t: ident) => {
+ use core::ptr::null_mut;
+ use kernel::error::from_result;
+ use kernel::types::ForeignOwnable;
+
+ const __LOG_PREFIX: &[u8] = b"zpool_rust\0";
+
+ fn handle_from_result<T, F>(f: F) -> T
+ where
+ T: From<usize>,
+ F: FnOnce() -> Result<T>,
+ {
+ match f() {
+ Ok(v) => v,
+ Err(_) => T::from(0),
+ }
+ }
+
+ /// Create a pool.
+ #[no_mangle]
+ pub unsafe extern "C" fn zpool_create_pool(name: *const c_uchar) -> *mut c_void {
+ match (|| -> Result<<$t as ZpoolDriver>::Pool> {
+ // SAFETY: the memory pointed to by name is guaranteed by `zpool` to be a valid
+ // string.
+ let name_r = unsafe { CStr::from_char_ptr(name).to_cstring() }?;
+ $t::create(name_r, flags::GFP_KERNEL)
+ })() {
+ Err(_) => null_mut(),
+ Ok(pool) => <$t as ZpoolDriver>::Pool::into_foreign(pool),
+ }
+ }
+
+ /// Destroy the pool.
+ #[no_mangle]
+ pub unsafe extern "C" fn zpool_destroy_pool(pool: *mut c_void) {
+ // SAFETY: The pointer originates from an `into_foreign` call.
+ drop(unsafe { <$t as ZpoolDriver>::Pool::from_foreign(pool) })
+ }
+
+ /// Allocate an object of `size` bytes from `pool`, with the allocation flags `gfp` and
+ /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle
+ /// is returned.
+ #[no_mangle]
+ pub unsafe extern "C" fn zpool_malloc(
+ pool: *mut c_void,
+ size: usize,
+ gfp: u32,
+ nid: c_int,
+ ) -> c_ulong {
+ // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+ // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+ // during this method.
+ let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+ handle_from_result(|| {
+ let the_nid = match nid {
+ kernel::bindings::NUMA_NO_NODE => NumaNode::NO_NODE,
+ _ => NumaNode::new(nid)?,
+ };
+ let h = $t::malloc(
+ pool,
+ size,
+ flags::GFP_NOWAIT | flags::__GFP_HIGHMEM,
+ the_nid,
+ )?;
+ Ok(h.as_raw())
+ })
+ }
+
+ /// Free an object previously allocated from the `pool`, represented by `handle`.
+ #[no_mangle]
+ pub unsafe extern "C" fn zpool_free(pool: *mut c_void, handle: usize) {
+ // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+ // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+ // during this method.
+ let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+
+ // SAFETY:
+ // - the caller (`zswap`) guarantees that `handle` is a valid handle previously
+ // allocated by `malloc`.
+ // - the caller (`zswap`) guarantees that it will not call any other function with this
+ // `handle` as a parameter after this call.
+ unsafe { $t::free(pool, ZpoolHandle::from_raw(handle)) }
+ }
+
+ /// Make all the necessary preparations for the caller to be able to read from the object
+ /// represented by `handle` and return a valid pointer to that object's memory to be read.
+ #[no_mangle]
+ pub unsafe extern "C" fn zpool_obj_read_begin(
+ pool: *mut c_void,
+ handle: usize,
+ _local_copy: *mut c_void,
+ ) -> *mut c_void {
+ // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+ // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+ // during this method.
+ let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+
+ // SAFETY: the caller (`zswap`) guarantees that `handle` is a valid handle previously
+ // allocated by `malloc`.
+ let non_null_ptr = unsafe { $t::read_begin(pool, ZpoolHandle::from_raw(handle)) };
+ non_null_ptr.as_ptr().cast()
+ }
+
+ /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
+ /// previously returned by `read_begin`.
+ #[no_mangle]
+ pub unsafe extern "C" fn zpool_obj_read_end(
+ pool: *mut c_void,
+ handle: usize,
+ handle_mem: *mut c_void,
+ ) {
+ // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+ // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+ // during this method.
+ let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+
+ // SAFETY: `handle_mem` is guaranteed to be non-null by the caller (`zswap`).
+ let handle_mem_ptr = unsafe { NonNull::new_unchecked(handle_mem.cast()) };
+
+ // SAFETY: the caller (`zswap`) guarantees that `handle` is a valid handle previously
+ // allocated by `malloc`.
+ unsafe { $t::read_end(pool, ZpoolHandle::from_raw(handle), handle_mem_ptr) }
+ }
+
+ /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
+ /// to the memory to copy data from, and `mem_len` defines the length of the data block to
+ /// be copied.
+ #[no_mangle]
+ pub unsafe extern "C" fn zpool_obj_write(
+ pool: *mut c_void,
+ handle: usize,
+ handle_mem: *mut c_void,
+ mem_len: usize,
+ ) {
+ // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+ // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+ // during this method.
+ let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+
+ // SAFETY: `handle_mem` is guaranteed to be non-null by the caller (`zswap`).
+ let handle_mem_ptr = unsafe { NonNull::new_unchecked(handle_mem.cast()) };
+
+ // SAFETY: the caller (`zswap`) guarantees that `handle` is a valid handle previously
+ // allocated by `malloc`.
+ unsafe {
+ $t::write(pool, ZpoolHandle::from_raw(handle), handle_mem_ptr, mem_len);
+ }
+ }
+
+ /// Get the number of pages used by the `pool`.
+ #[no_mangle]
+ pub unsafe extern "C" fn zpool_get_total_pages(pool: *mut c_void) -> u64 {
+ // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+ // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+ // during this method.
+ let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+ $t::total_pages(pool)
+ }
+ };
+}
--
2.39.2
^ permalink raw reply related [flat|nested] 11+ messages in thread

* Re: [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
2025-09-23 10:27 ` [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers Vitaly Wool
@ 2025-09-23 21:49 ` Benno Lossin
2025-09-24 9:23 ` Vitaly Wool
0 siblings, 1 reply; 11+ messages in thread
From: Benno Lossin @ 2025-09-23 21:49 UTC (permalink / raw)
To: Vitaly Wool, linux-mm, rust-for-linux
Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Minchan Kim, Sergey Senozhatsky,
Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Greg Kroah-Hartman, linux-kernel
On Tue Sep 23, 2025 at 12:27 PM CEST, Vitaly Wool wrote:
> diff --git a/rust/kernel/zpool.rs b/rust/kernel/zpool.rs
> new file mode 100644
> index 000000000000..53a0dc0607e6
> --- /dev/null
> +++ b/rust/kernel/zpool.rs
> @@ -0,0 +1,366 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Implementation of Rust interface towards zpool API.
> +
> +use crate::{error::Result, kernel::alloc::Flags, str::CString, types::ForeignOwnable};
> +use core::ptr::NonNull;
> +use kernel::alloc::NumaNode;
> +
> +/// The Rust representation of zpool handle.
Would be great to explain what this means, what's the handle used for?
> +pub struct ZpoolHandle(usize);
> +
> +impl ZpoolHandle {
> + /// Create `ZpoolHandle` from the raw representation.
> + pub fn from_raw(h: usize) -> Self {
> + Self(h)
> + }
> +
> + /// Get the raw representation of the handle.
> + pub fn as_raw(self) -> usize {
> + self.0
> + }
Hmm this seems a bit weird, because now users of Zpools can manufacture
their own handles? Not sure how we could still allow Zpool
implementers to create these while preventing other users from
creating them, though...
> +}
> +
> +/// Zpool API.
> +///
> +/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
> +/// Such drivers implement memory storage pools in accordance with the zpool API.
> +///
> +/// # Example
> +///
> +/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
> +/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
> +/// denominator) is saved in the least significant bits of the handle, which is guaranteed
> +/// to be at least 2**6 aligned by kmalloc.
> +///
> +/// ```
> +/// use core::ptr::{NonNull, copy_nonoverlapping};
> +/// use core::sync::atomic::{AtomicU64, Ordering};
> +/// use kernel::alloc::{Flags, flags, KBox, KVec, NumaNode};
> +/// use kernel::page::PAGE_SHIFT;
> +/// use kernel::prelude::EINVAL;
> +/// use kernel::str::CString;
> +/// use kernel::zpool::*;
> +///
> +/// struct MyZpool {
> +/// name: CString,
> +/// bytes_used: AtomicU64,
> +/// }
> +///
> +/// struct MyZpoolDriver;
> +///
> +/// impl ZpoolDriver for MyZpoolDriver {
> +/// type Pool = KBox<MyZpool>;
> +///
> +/// fn create(name: CString, gfp: Flags) -> Result<KBox<MyZpool>> {
> +/// let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
> +/// let pool = KBox::new(my_pool, gfp)?;
> +///
> +/// pr_debug!("Pool {:?} created\n", pool.name);
> +/// Ok(pool)
> +/// }
> +///
> +/// fn malloc(pool: &MyZpool, size: usize, _gfp: Flags, _nid: NumaNode) -> Result<ZpoolHandle> {
> +/// let pow = size.next_power_of_two().trailing_zeros().max(6);
> +/// match pow {
> +/// 0 => Err(EINVAL),
> +/// m if m > PAGE_SHIFT as u32 => Err(ENOSPC),
> +/// _ => {
> +/// let vec = KVec::<u8>::with_capacity(1 << pow, GFP_KERNEL)?;
> +/// let (ptr, _len, _cap) = vec.into_raw_parts();
> +///
> +/// // We assume that kmalloc-64, kmalloc-128 etc. kmem caches will be used for
> +/// // our allocations, so it's actually `1 << pow` bytes that we have consumed.
> +/// pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
> +///
> +/// // `kmalloc` guarantees that an allocation of size x*2^n is 2^n aligned.
> +/// // Therefore the 6 lower bits are zeros and we can use these to store `pow`.
> +/// Ok(ZpoolHandle::from_raw(ptr as usize | (pow as usize - 6)))
> +/// }
> +/// }
> +/// }
> +///
> +/// unsafe fn free(pool: &MyZpool, handle: ZpoolHandle) {
> +/// let h = handle.as_raw();
> +/// let n = (h & 0x3F) + 6;
> +/// let uptr = h & !0x3F;
> +///
> +/// // SAFETY:
> +/// // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
> +/// // denominator for the vector capacity. As noted above, the result will be exactly the
> +/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
> +/// // the vector allocated by `alloc` function above.
> +/// // - 1 << n == capacity and is coming from the first 6 bits of handle.
> +/// let vec = unsafe { KVec::<u8>::from_raw_parts(uptr as *mut u8, 0, 1 << n) };
> +/// drop(vec);
> +/// pool.bytes_used.fetch_sub(1 << n, Ordering::Relaxed);
> +/// }
> +///
> +/// unsafe fn read_begin(_pool: &MyZpool, handle: ZpoolHandle) -> NonNull<u8> {
> +/// let uptr = handle.as_raw() & !0x3F;
> +/// // SAFETY:
> +/// // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
> +/// // denominator for the vector capacity. As noted above, the result will be exactly the
> +/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
> +/// // the vector allocated by `alloc` function above.
> +/// unsafe { NonNull::new_unchecked(uptr as *mut u8) }
> +/// }
> +///
> +/// unsafe fn read_end(_pool: &MyZpool, _handle: ZpoolHandle, _handle_mem: NonNull<u8>) {}
> +///
> +/// unsafe fn write(_p: &MyZpool, handle: ZpoolHandle, h_mem: NonNull<u8>, mem_len: usize) {
> +/// let uptr = handle.as_raw() & !0x3F;
> +/// // SAFETY:
> +/// // - `h_mem` is a valid non-null pointer provided by zswap.
> +/// // - `uptr` is derived from handle by zeroing 6 lower bits where we store the power
> +/// // denominator for the vector capacity. As noted above, the result will be exactly the
> +/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
> +/// // the vector allocated by `alloc` function above.
> +/// unsafe {
> +/// copy_nonoverlapping(h_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
> +/// };
> +/// }
> +///
> +/// fn total_pages(pool: &MyZpool) -> u64 {
> +/// pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT
> +/// }
> +/// }
> +///
> +/// // Uncomment this for compile time registration (disabled to avoid build error when KUNIT
> +/// // tests and zsmalloc are enabled):
> +/// // kernel::DECLARE_ZPOOL!(MyZpoolDriver);
> +/// ```
> +///
> +pub trait ZpoolDriver {
> + /// Opaque Rust representation of `struct zpool`.
> + type Pool: ForeignOwnable;
Also what happened to my comment on v4 of this patchset?
https://lore.kernel.org/all/DCLK1YG1L5TZ.1VMGX131LII9V@kernel.org:
>> It can indeed but then the ZpoolDriver trait will have to be extended
>> with functions like into_raw() and from_raw(), because zpool expects
>> *mut c_void, so on the Adapter side it will look like
>>
>> extern "C" fn create_(name: *const c_uchar, gfp: u32) -> *mut c_void {
>> // SAFETY: the memory pointed to by name is guaranteed by zpool
>> to be a valid string
>> let pool = unsafe { T::create(CStr::from_char_ptr(name),
>> Flags::from_raw(gfp)) };
>> match pool {
>> Err(_) => null_mut(),
>> Ok(p) => T::into_raw(p).cast(),
>> }
>> }
>>
>> The question is, why does this make it better?
>
> No, thanks for sharing that function. Then the question becomes, do you
> really need `ForeignOwnable`? Or is `KBox` enough? Do you really want
> people to use `Arc<MyZPool>`? Because `BorrowedMut` of `Arc` is the same
> as its `Borrowed` variant (it's read-only after all).
>
> If you can get away with just `Box` (you might want people to choose
> their allocator, which is fine IMO), then I'd do so.
I still think that if you can use `Box<Self>` the trait is going to be
much easier to understand.
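For illustration, here is a minimal sketch of the `Box<Self>` shape being
suggested. This is std-only code with plain `Box`, `String` and `i32` errors
standing in for `KBox`, `CString` and kernel `Result`, so every name below is
hypothetical, not the real in-tree API:

```rust
/// Opaque allocation handle, as in the patch.
#[derive(Clone, Copy)]
pub struct ZpoolHandle(usize);

impl ZpoolHandle {
    pub fn from_raw(h: usize) -> Self {
        Self(h)
    }
    pub fn as_raw(self) -> usize {
        self.0
    }
}

// With `Box<Self>` the driver type *is* the pool state: no separate
// `Pool: ForeignOwnable` associated type, and plain `&self` methods
// replace the `Borrowed<'_>` plumbing.
pub trait ZpoolDriver: Sized {
    fn create(name: String) -> Result<Box<Self>, i32>;
    fn malloc(&self, size: usize) -> Result<ZpoolHandle, i32>;
    fn total_pages(&self) -> u64;
}

struct MyZpool {
    name: String,
}

impl ZpoolDriver for MyZpool {
    fn create(name: String) -> Result<Box<Self>, i32> {
        Ok(Box::new(MyZpool { name }))
    }

    fn malloc(&self, size: usize) -> Result<ZpoolHandle, i32> {
        if size == 0 {
            return Err(-22); // stand-in for -EINVAL
        }
        // Toy policy: round the size up to a power of two, as in the
        // KVec example from the patch.
        Ok(ZpoolHandle::from_raw(size.next_power_of_two()))
    }

    fn total_pages(&self) -> u64 {
        0
    }
}
```

The adapter side would then only need `Box::into_raw`/`Box::from_raw` at the
C boundary instead of the full `ForeignOwnable` machinery.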
> +
> + /// Create a pool.
> + fn create(name: CString, gfp: Flags) -> Result<Self::Pool>;
> +
> + /// Allocate an object of `size` bytes from `pool`, with the allocation flags `gfp` and
> + /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
> + fn malloc(
> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
> + size: usize,
> + gfp: Flags,
> + nid: NumaNode,
> + ) -> Result<ZpoolHandle>;
> +
> + /// Free an object previously allocated from the `pool`, represented by `handle`.
> + ///
> + /// # Safety
> + ///
> + /// - `handle` must be a valid handle previously returned by `malloc`.
> + /// - `handle` must not be used any more after the call to `free`.
> + unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: ZpoolHandle);
> +
> + /// Make all the necessary preparations for the caller to be able to read from the object
> + /// represented by `handle` and return a valid pointer to that object's memory to be read.
> + ///
> + /// # Safety
> + ///
> + /// - `handle` must be a valid handle previously returned by `malloc`.
> + /// - `read_end` with the same `handle` must be called for each `read_begin`.
> + unsafe fn read_begin(
> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
> + handle: ZpoolHandle,
> + ) -> NonNull<u8>;
> +
> + /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
> + /// previously returned by `read_begin`.
> + ///
> + /// # Safety
> + ///
> + /// - `handle` must be a valid handle previously returned by `malloc`.
> + /// - `handle_mem` must be the pointer previously returned by `read_begin`.
> + unsafe fn read_end(
> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
> + handle: ZpoolHandle,
> + handle_mem: NonNull<u8>,
> + );
> +
> + /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
> + /// to the memory to copy data from, and `mem_len` defines the length of the data block to
> + /// be copied.
> + ///
> + /// # Safety
> + ///
> + /// - `handle` must be a valid handle previously returned by `malloc`.
> + /// - `handle_mem` must be a valid pointer into the allocated memory area represented by
> + /// `handle`.
> + /// - `handle_mem + mem_len - 1` must not point outside the allocated memory area.
> + unsafe fn write(
> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
> + handle: ZpoolHandle,
> + handle_mem: NonNull<u8>,
> + mem_len: usize,
> + );
> +
> + /// Get the number of pages used by the `pool`.
> + fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
> +}
> +
> +/// Provide a zpool allocator to zswap
> +#[macro_export]
> +macro_rules! DECLARE_ZPOOL {
Why all caps and not snake case?
> + ($t: ident) => {
> + use core::ptr::null_mut;
> + use kernel::error::from_result;
> + use kernel::types::ForeignOwnable;
Macros shouldn't have `use` statements in a non-local area (so inside
`const` bodies and modules is fine).
> +
> + const __LOG_PREFIX: &[u8] = b"zpool_rust\0";
This seems wrong. Why do you need to generate this? Shouldn't the user
still invoke `module!` or a similar macro?
> +
> + fn handle_from_result<T, F>(f: F) -> T
> + where
> + T: From<usize>,
> + F: FnOnce() -> Result<T>,
> + {
> + match f() {
> + Ok(v) => v,
> + Err(e) => T::from(0),
> + }
> + }
Why is this function inside the macro?
> +
> + /// Create a pool.
> + #[no_mangle]
> + pub unsafe extern "C" fn zpool_create_pool(name: *const c_uchar) -> *mut c_void {
Missing safety docs.
> + match (|| -> Result<<$t as ZpoolDriver>::Pool> {
> + // SAFETY: the memory pointed to by name is guaranteed by `zpool` to be a valid
> + // string.
> + let name_r = unsafe { CStr::from_char_ptr(name).to_cstring() }?;
> + $t::create(name_r, flags::GFP_KERNEL)
> + })() {
> + Err(_) => null_mut(),
> + Ok(pool) => <$t as ZpoolDriver>::Pool::into_foreign(pool),
> + }
> + }
> +
> + /// Destroy the pool.
> + #[no_mangle]
> + pub unsafe extern "C" fn zpool_destroy_pool(pool: *mut c_void) {
> + // SAFETY: The pointer originates from an `into_foreign` call.
> + drop(unsafe { <$t as ZpoolDriver>::Pool::from_foreign(pool) })
> + }
Have you tried to use the same pattern that we use for the many
different `Registration` types in the kernel? For example take a look at
`Adapter<T>` from `rust/kernel/net/phy.rs`. That way you don't need a
macro and can make the registration a function that takes a generic `T:
ZpoolDriver`.
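For reference, a self-contained sketch of that `Adapter<T>` pattern applied to
zpool. This uses std types and made-up signatures (the real `struct
zpool_driver` ops table and the phy.rs adapter look different), so treat all
names as hypothetical:

```rust
use std::ffi::c_void;

// Simplified driver trait; the real one has more operations.
pub trait ZpoolDriver: Sized {
    fn create(name: &str) -> Option<Box<Self>>;
    fn total_pages(&self) -> u64;
}

// Zero-sized adapter: its `extern "C"` shims are monomorphized per driver
// `T`, so no declaration macro is needed to generate them.
struct Adapter<T: ZpoolDriver>(core::marker::PhantomData<T>);

impl<T: ZpoolDriver> Adapter<T> {
    // In the kernel, these function pointers would be placed into a C ops
    // table during registration rather than exported with `#[no_mangle]`.
    extern "C" fn create(name: *const u8, len: usize) -> *mut c_void {
        // SAFETY: in this sketch the caller passes a valid UTF-8 buffer of
        // `len` bytes (the real code would use `CStr::from_char_ptr`).
        let name = unsafe {
            std::str::from_utf8_unchecked(std::slice::from_raw_parts(name, len))
        };
        match T::create(name) {
            None => std::ptr::null_mut(),
            Some(pool) => Box::into_raw(pool).cast(),
        }
    }

    extern "C" fn total_pages(pool: *mut c_void) -> u64 {
        // SAFETY: `pool` came from `Box::into_raw` in `create` above.
        unsafe { (*pool.cast::<T>()).total_pages() }
    }

    extern "C" fn destroy(pool: *mut c_void) {
        // SAFETY: `pool` came from `Box::into_raw` and is not used afterwards.
        drop(unsafe { Box::from_raw(pool.cast::<T>()) });
    }
}

struct MyZpool {
    pages: u64,
}

impl ZpoolDriver for MyZpool {
    fn create(_name: &str) -> Option<Box<Self>> {
        Some(Box::new(MyZpool { pages: 4 }))
    }
    fn total_pages(&self) -> u64 {
        self.pages
    }
}
```

As Vitaly notes below, this only works when registration happens at runtime;
with a compile-time-selected backend the symbols themselves must exist, which
is what pushes the design towards a macro.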
---
Cheers,
Benno
* Re: [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
2025-09-23 21:49 ` Benno Lossin
@ 2025-09-24 9:23 ` Vitaly Wool
2025-09-24 17:32 ` Benno Lossin
0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Wool @ 2025-09-24 9:23 UTC (permalink / raw)
To: Benno Lossin, linux-mm, rust-for-linux
Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Minchan Kim, Sergey Senozhatsky,
Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Greg Kroah-Hartman, linux-kernel
On 9/23/25 23:49, Benno Lossin wrote:
> On Tue Sep 23, 2025 at 12:27 PM CEST, Vitaly Wool wrote:
>> diff --git a/rust/kernel/zpool.rs b/rust/kernel/zpool.rs
>> new file mode 100644
>> index 000000000000..53a0dc0607e6
>> --- /dev/null
>> +++ b/rust/kernel/zpool.rs
>> @@ -0,0 +1,366 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Implementation of Rust interface towards zpool API.
>> +
>> +use crate::{error::Result, kernel::alloc::Flags, str::CString, types::ForeignOwnable};
>> +use core::ptr::NonNull;
>> +use kernel::alloc::NumaNode;
>> +
>> +/// The Rust representation of zpool handle.
>
> Would be great to explain what this means, what's the handle used for?
Sorry, do you mean explaining it here or in the code?
>> +pub struct ZpoolHandle(usize);
>> +
>> +impl ZpoolHandle {
>> + /// Create `ZpoolHandle` from the raw representation.
>> + pub fn from_raw(h: usize) -> Self {
>> + Self(h)
>> + }
>> +
>> + /// Get the raw representation of the handle.
>> + pub fn as_raw(self) -> usize {
>> + self.0
>> + }
>
> Hmm this seems a bit weird, because now users of Zpools can manufacture
> their own handles? Not sure as to how we could still allow Zpool
> implementers to create these while preventing other users from
> creating them, though...
This is a good question indeed. The thing is, an allocation backend
(i.e. one implementing the zpool API) is expected to provide an opaque
handle, which is a usize, and since the backend knows best how to
compose it, we need to let it do that. OTOH I am not too happy with
this straightforward approach (from_raw()/as_raw()) either.
Would making ZpoolHandle opaque here and defining a trait that a backend
must implement for ZpoolHandle work better? The example backend would
then do something like
struct MyZpoolHandle {
ptr: *mut u8,
pow: usize,
}
type ZpoolHandle = MyZpoolHandle;
trait AsRawHandle {
fn as_raw_handle(&self) -> usize;
}
impl AsRawHandle for ZpoolHandle {
fn as_raw_handle(&self) -> usize {
(self.ptr as usize) | self.pow
}
}
Would that make sense?
>> +
>> +/// Zpool API.
>> +///
>> +/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
>> +/// Such drivers implement memory storage pools in accordance with the zpool API.
>> +///
>> +/// # Example
>> +///
>> +/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
>> +/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
>> +/// denominator) is saved in the least significant bits of the handle, which is guaranteed
>> +/// to be at least 2**6 aligned by kmalloc.
>> +///
>> +/// ```
>> +/// use core::ptr::{NonNull, copy_nonoverlapping};
>> +/// use core::sync::atomic::{AtomicU64, Ordering};
>> +/// use kernel::alloc::{Flags, flags, KBox, KVec, NumaNode};
>> +/// use kernel::page::PAGE_SHIFT;
>> +/// use kernel::prelude::EINVAL;
>> +/// use kernel::str::CString;
>> +/// use kernel::zpool::*;
>> +///
>> +/// struct MyZpool {
>> +/// name: CString,
>> +/// bytes_used: AtomicU64,
>> +/// }
>> +///
>> +/// struct MyZpoolDriver;
>> +///
>> +/// impl ZpoolDriver for MyZpoolDriver {
>> +/// type Pool = KBox<MyZpool>;
>> +///
>> +/// fn create(name: CString, gfp: Flags) -> Result<KBox<MyZpool>> {
>> +/// let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
>> +/// let pool = KBox::new(my_pool, gfp)?;
>> +///
>> +/// pr_debug!("Pool {:?} created\n", pool.name);
>> +/// Ok(pool)
>> +/// }
>> +///
>> +/// fn malloc(pool: &MyZpool, size: usize, _gfp: Flags, _nid: NumaNode) -> Result<ZpoolHandle> {
>> +/// let pow = size.next_power_of_two().trailing_zeros().max(6);
>> +/// match pow {
>> +/// 0 => Err(EINVAL),
>> +/// m if m > PAGE_SHIFT as u32 => Err(ENOSPC),
>> +/// _ => {
>> +/// let vec = KVec::<u8>::with_capacity(1 << pow, GFP_KERNEL)?;
>> +/// let (ptr, _len, _cap) = vec.into_raw_parts();
>> +///
>> +/// // We assume that kmalloc-64, kmalloc-128 etc. kmem caches will be used for
>> +/// // our allocations, so it's actually `1 << pow` bytes that we have consumed.
>> +/// pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
>> +///
>> +/// // `kmalloc` guarantees that an allocation of size x*2^n is 2^n aligned.
>> +/// // Therefore the 6 lower bits are zeros and we can use these to store `pow`.
>> +/// Ok(ZpoolHandle::from_raw(ptr as usize | (pow as usize - 6)))
>> +/// }
>> +/// }
>> +/// }
>> +///
>> +/// unsafe fn free(pool: &MyZpool, handle: ZpoolHandle) {
>> +/// let h = handle.as_raw();
>> +/// let n = (h & 0x3F) + 6;
>> +/// let uptr = h & !0x3F;
>> +///
>> +/// // SAFETY:
>> +/// // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
>> +/// // denominator for the vector capacity. As noted above, the result will be exactly the
>> +/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>> +/// // the vector allocated by `alloc` function above.
>> +/// // - 1 << n == capacity and is coming from the first 6 bits of handle.
>> +/// let vec = unsafe { KVec::<u8>::from_raw_parts(uptr as *mut u8, 0, 1 << n) };
>> +/// drop(vec);
>> +/// pool.bytes_used.fetch_sub(1 << n, Ordering::Relaxed);
>> +/// }
>> +///
>> +/// unsafe fn read_begin(_pool: &MyZpool, handle: ZpoolHandle) -> NonNull<u8> {
>> +/// let uptr = handle.as_raw() & !0x3F;
>> +/// // SAFETY:
>> +/// // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
>> +/// // denominator for the vector capacity. As noted above, the result will be exactly the
>> +/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>> +/// // the vector allocated by `alloc` function above.
>> +/// unsafe { NonNull::new_unchecked(uptr as *mut u8) }
>> +/// }
>> +///
>> +/// unsafe fn read_end(_pool: &MyZpool, _handle: ZpoolHandle, _handle_mem: NonNull<u8>) {}
>> +///
>> +/// unsafe fn write(_p: &MyZpool, handle: ZpoolHandle, h_mem: NonNull<u8>, mem_len: usize) {
>> +/// let uptr = handle.as_raw() & !0x3F;
>> +/// // SAFETY:
>> +/// // - `h_mem` is a valid non-null pointer provided by zswap.
>> +/// // - `uptr` is derived from handle by zeroing 6 lower bits where we store the power
>> +/// // denominator for the vector capacity. As noted above, the result will be exactly the
>> +/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>> +/// // the vector allocated by `alloc` function above.
>> +/// unsafe {
>> +/// copy_nonoverlapping(h_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
>> +/// };
>> +/// }
>> +///
>> +/// fn total_pages(pool: &MyZpool) -> u64 {
>> +/// pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT
>> +/// }
>> +/// }
>> +///
>> +/// // Uncomment this for compile time registration (disabled to avoid build error when KUNIT
>> +/// // tests and zsmalloc are enabled):
>> +/// // kernel::DECLARE_ZPOOL!(MyZpoolDriver);
>> +/// ```
>> +///
>> +pub trait ZpoolDriver {
>> + /// Opaque Rust representation of `struct zpool`.
>> + type Pool: ForeignOwnable;
>
> Also what happened to my comment on v4 of this patchset?
>
> https://lore.kernel.org/all/DCLK1YG1L5TZ.1VMGX131LII9V@kernel.org:
>
>>> It can indeed but then the ZpoolDriver trait will have to be extended
>>> with functions like into_raw() and from_raw(), because zpool expects
>>> *mut c_void, so on the Adapter side it will look like
>>>
>>> extern "C" fn create_(name: *const c_uchar, gfp: u32) -> *mut c_void {
>>> // SAFETY: the memory pointed to by name is guaranteed by zpool
>>> to be a valid string
>>> let pool = unsafe { T::create(CStr::from_char_ptr(name),
>>> Flags::from_raw(gfp)) };
>>> match pool {
>>> Err(_) => null_mut(),
>>> Ok(p) => T::into_raw(p).cast(),
>>> }
>>> }
>>>
>>> The question is, why does this make it better?
>>
>> No, thanks for sharing that function. Then the question becomes, do you
>> really need `ForeignOwnable`? Or is `KBox` enough? Do you really want
>> people to use `Arc<MyZPool>`? Because `BorrowedMut` of `Arc` is the same
>> as its `Borrowed` variant (it's read-only after all).
>>
>> If you can get away with just `Box` (you might want people to choose
>> their allocator, which is fine IMO), then I'd do so.
>
> I still think that if you can use `Box<Self>` the trait is going to be
> much easier to understand.
Okay, thanks, I'll work on that.
>> +
>> + /// Create a pool.
>> + fn create(name: CString, gfp: Flags) -> Result<Self::Pool>;
>> +
>> + /// Allocate an object of `size` bytes from `pool`, with the allocation flags `gfp` and
>> + /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
>> + fn malloc(
>> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> + size: usize,
>> + gfp: Flags,
>> + nid: NumaNode,
>> + ) -> Result<ZpoolHandle>;
>> +
>> + /// Free an object previously allocated from the `pool`, represented by `handle`.
>> + ///
>> + /// # Safety
>> + ///
>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>> + /// - `handle` must not be used any more after the call to `free`.
>> + unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: ZpoolHandle);
>> +
>> + /// Make all the necessary preparations for the caller to be able to read from the object
>> + /// represented by `handle` and return a valid pointer to that object's memory to be read.
>> + ///
>> + /// # Safety
>> + ///
>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>> + /// - `read_end` with the same `handle` must be called for each `read_begin`.
>> + unsafe fn read_begin(
>> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> + handle: ZpoolHandle,
>> + ) -> NonNull<u8>;
>> +
>> + /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
>> + /// previously returned by `read_begin`.
>> + ///
>> + /// # Safety
>> + ///
>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>> + /// - `handle_mem` must be the pointer previously returned by `read_begin`.
>> + unsafe fn read_end(
>> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> + handle: ZpoolHandle,
>> + handle_mem: NonNull<u8>,
>> + );
>> +
>> + /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
>> + /// to the memory to copy data from, and `mem_len` defines the length of the data block to
>> + /// be copied.
>> + ///
>> + /// # Safety
>> + ///
>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>> + /// - `handle_mem` must be a valid pointer into the allocated memory area represented by
>> + /// `handle`.
>> + /// - `handle_mem + mem_len - 1` must not point outside the allocated memory area.
>> + unsafe fn write(
>> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> + handle: ZpoolHandle,
>> + handle_mem: NonNull<u8>,
>> + mem_len: usize,
>> + );
>> +
>> + /// Get the number of pages used by the `pool`.
>> + fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
>> +}
>> +
>> +/// Provide a zpool allocator to zswap
>> +#[macro_export]
>> +macro_rules! DECLARE_ZPOOL {
>
> Why all caps and not snake case?
C style, sorry :) Will fix.
>> + ($t: ident) => {
>> + use core::ptr::null_mut;
>> + use kernel::error::from_result;
>> + use kernel::types::ForeignOwnable;
>
> Macros shouldn't have `use` statements in a non-local area (so inside
> `const` bodies and modules is fine).
>
>> +
>> + const __LOG_PREFIX: &[u8] = b"zpool_rust\0";
>
> This seems wrong. Why do you need to generate this? Shouldn't the user
> still invoke `module!` or a similar macro?
Unfortunately not. As we have discussed at Kangrejos, the zpool
implementation as a driver is on its way out [1] and there have to be
more voices against that for it to be stopped. If we are now dealing
with the build-time API (i. e. a wrapper over zsmalloc functions) then
we have to define a build-time macro, be that DECLARE_ZPOOL or
DeclareZpool :)
>> +
>> + fn handle_from_result<T, F>(f: F) -> T
>> + where
>> + T: From<usize>,
>> + F: FnOnce() -> Result<T>,
>> + {
>> + match f() {
>> + Ok(v) => v,
>> + Err(e) => T::from(0),
>> + }
>> + }
>
> Why is this function inside the macro?
Doesn't seem to be needed elsewhere. I could put it in a separate patch
for error.rs as a complement to from_result() but I thought no one
would be interested in this case.
>> +
>> + /// Create a pool.
>> + #[no_mangle]
>> + pub unsafe extern "C" fn zpool_create_pool(name: *const c_uchar) -> *mut c_void {
>
> Missing safety docs.
>
>> + match (|| -> Result<<$t as ZpoolDriver>::Pool> {
>> + // SAFETY: the memory pointed to by name is guaranteed by `zpool` to be a valid
>> + // string.
>> + let name_r = unsafe { CStr::from_char_ptr(name).to_cstring() }?;
>> + $t::create(name_r, flags::GFP_KERNEL)
>> + })() {
>> + Err(_) => null_mut(),
>> + Ok(pool) => <$t as ZpoolDriver>::Pool::into_foreign(pool),
>> + }
>> + }
>> +
>> + /// Destroy the pool.
>> + #[no_mangle]
>> + pub unsafe extern "C" fn zpool_destroy_pool(pool: *mut c_void) {
>> + // SAFETY: The pointer originates from an `into_foreign` call.
>> + drop(unsafe { <$t as ZpoolDriver>::Pool::from_foreign(pool) })
>> + }
>
> Have you tried to use the same pattern that we use for the many
> different `Registration` types in the kernel? For example take a look at
> `Adapter<T>` from `rust/kernel/net/phy.rs`. That way you don't need a
> macro and can make the registration a function that takes a generic `T:
> ZpoolDriver`.
That's what was in the previous version. Unfortunately it won't work any
more for the reason described above, see also [1].
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/mm?h=mm-stable&id=2ccd9fecd9163f168761d4398564c81554f636ef
* Re: [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
2025-09-24 9:23 ` Vitaly Wool
@ 2025-09-24 17:32 ` Benno Lossin
0 siblings, 0 replies; 11+ messages in thread
From: Benno Lossin @ 2025-09-24 17:32 UTC (permalink / raw)
To: Vitaly Wool, linux-mm, rust-for-linux
Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, Minchan Kim, Sergey Senozhatsky,
Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
Danilo Krummrich, Greg Kroah-Hartman, linux-kernel
On Wed Sep 24, 2025 at 11:23 AM CEST, Vitaly Wool wrote:
> On 9/23/25 23:49, Benno Lossin wrote:
>> On Tue Sep 23, 2025 at 12:27 PM CEST, Vitaly Wool wrote:
>>> diff --git a/rust/kernel/zpool.rs b/rust/kernel/zpool.rs
>>> new file mode 100644
>>> index 000000000000..53a0dc0607e6
>>> --- /dev/null
>>> +++ b/rust/kernel/zpool.rs
>>> @@ -0,0 +1,366 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Implementation of Rust interface towards zpool API.
>>> +
>>> +use crate::{error::Result, kernel::alloc::Flags, str::CString, types::ForeignOwnable};
>>> +use core::ptr::NonNull;
>>> +use kernel::alloc::NumaNode;
>>> +
>>> +/// The Rust representation of zpool handle.
>>
>> Would be great to explain what this means, what's the handle used for?
>
> Sorry, do you mean explaining it here or in the code?
Yeah, give a short summary of what this means. So something like:
/// Zpool handle
///
/// Used to represent an allocation on a [`ZPool`].
Doesn't need to be detailed.
>>> +pub struct ZpoolHandle(usize);
>>> +
>>> +impl ZpoolHandle {
>>> + /// Create `ZpoolHandle` from the raw representation.
>>> + pub fn from_raw(h: usize) -> Self {
>>> + Self(h)
>>> + }
>>> +
>>> + /// Get the raw representation of the handle.
>>> + pub fn as_raw(self) -> usize {
>>> + self.0
>>> + }
>>
>> Hmm this seems a bit weird, because now users of Zpools can manufacture
>> their own handles? Not sure as to how we could still allow Zpool
>> implementers to create these while preventing other users from
>> creating them, though...
>
> This is a good question indeed. The thing is, an allocation backend
> (i.e. one implementing the zpool API) is expected to provide an opaque
> handle, which is a usize, and since the backend knows best how to
> compose it, we need to let it do that. OTOH I am not too happy with
> this straightforward approach (from_raw()/as_raw()) either.
>
> Would making ZpoolHandle opaque here and defining a trait that a backend
> must implement for ZpoolHandle work better? The example backend would
> then do something like
>
> struct MyZpoolHandle {
> ptr: *mut u8,
> pow: usize,
> }
> type ZpoolHandle = MyZpoolHandle;
>
> trait AsRawHandle {
> fn as_raw_handle(&self) -> usize;
> }
>
> impl AsRawHandle for ZpoolHandle {
> fn as_raw_handle(&self) -> usize {
> (self.ptr as usize) | self.pow
> }
> }
>
> Would that make sense?
You don't need the trait. You can just add an associated type to
`ZpoolDriver` called `Handle` and then use that instead of this concrete
type.
Implementers of `ZpoolDriver` then need to provide their own handle
type & they can ensure nobody but them is able to construct one.
Or do some other functions need access to the raw usize representation?
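A self-contained sketch of that associated-type variant, mirroring the
bit-packing from the KVec example (std-only, hypothetical names; the real
trait would also carry the allocation/read/write operations):

```rust
pub trait ZpoolDriver {
    /// Driver-defined handle; only the driver can construct one, and only
    /// the zpool glue converts it to/from the raw `usize` that crosses the
    /// C boundary.
    type Handle;

    fn malloc(&self, size: usize) -> Option<Self::Handle>;
    fn handle_to_raw(h: Self::Handle) -> usize;
    fn handle_from_raw(raw: usize) -> Self::Handle;
}

struct MyZpool;

/// Packs a 64-byte-aligned pointer together with the size exponent in the
/// low 6 bits, as in the KVec example from the patch.
struct MyHandle {
    ptr: usize,
    pow: u32,
}

impl ZpoolDriver for MyZpool {
    type Handle = MyHandle;

    fn malloc(&self, size: usize) -> Option<MyHandle> {
        let pow = size.next_power_of_two().trailing_zeros().max(6);
        // Stand-in for a real allocation: a kmalloc'ed block of
        // `1 << pow` bytes would be at least 64-byte aligned.
        Some(MyHandle { ptr: 0x1000, pow })
    }

    fn handle_to_raw(h: MyHandle) -> usize {
        h.ptr | (h.pow as usize - 6)
    }

    fn handle_from_raw(raw: usize) -> MyHandle {
        MyHandle {
            ptr: raw & !0x3F,
            pow: (raw & 0x3F) as u32 + 6,
        }
    }
}
```

Nothing outside the driver (and the glue calling `handle_to_raw`/
`handle_from_raw`) can manufacture a `MyHandle`, which addresses the
original concern.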
>>> +
>>> +/// Zpool API.
>>> +///
>>> +/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
>>> +/// Such drivers implement memory storage pools in accordance with the zpool API.
>>> +///
>>> +/// # Example
>>> +///
>>> +/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
>>> +/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
>>> +/// denominator) is saved in the least significant bits of the handle, which is guaranteed
>>> +/// to be at least 2**6 aligned by kmalloc.
>>> +///
>>> +/// ```
>>> +/// use core::ptr::{NonNull, copy_nonoverlapping};
>>> +/// use core::sync::atomic::{AtomicU64, Ordering};
>>> +/// use kernel::alloc::{Flags, flags, KBox, KVec, NumaNode};
>>> +/// use kernel::page::PAGE_SHIFT;
>>> +/// use kernel::prelude::EINVAL;
>>> +/// use kernel::str::CString;
>>> +/// use kernel::zpool::*;
>>> +///
>>> +/// struct MyZpool {
>>> +/// name: CString,
>>> +/// bytes_used: AtomicU64,
>>> +/// }
>>> +///
>>> +/// struct MyZpoolDriver;
>>> +///
>>> +/// impl ZpoolDriver for MyZpoolDriver {
>>> +/// type Pool = KBox<MyZpool>;
>>> +///
>>> +/// fn create(name: CString, gfp: Flags) -> Result<KBox<MyZpool>> {
>>> +/// let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
>>> +/// let pool = KBox::new(my_pool, gfp)?;
>>> +///
>>> +/// pr_debug!("Pool {:?} created\n", pool.name);
>>> +/// Ok(pool)
>>> +/// }
>>> +///
>>> +/// fn malloc(pool: &MyZpool, size: usize, _gfp: Flags, _nid: NumaNode) -> Result<ZpoolHandle> {
>>> +/// let pow = size.next_power_of_two().trailing_zeros().max(6);
>>> +/// match pow {
>>> +/// 0 => Err(EINVAL),
>>> +/// m if m > PAGE_SHIFT as u32 => Err(ENOSPC),
>>> +/// _ => {
>>> +/// let vec = KVec::<u8>::with_capacity(1 << pow, GFP_KERNEL)?;
>>> +/// let (ptr, _len, _cap) = vec.into_raw_parts();
>>> +///
>>> +/// // We assume that kmalloc-64, kmalloc-128 etc. kmem caches will be used for
>>> +/// // our allocations, so it's actually `1 << pow` bytes that we have consumed.
>>> +/// pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
>>> +///
>>> +/// // `kmalloc` guarantees that an allocation of size x*2^n is 2^n aligned.
>>> +/// // Therefore the 6 lower bits are zeros and we can use these to store `pow`.
>>> +/// Ok(ZpoolHandle::from_raw(ptr as usize | (pow as usize - 6)))
>>> +/// }
>>> +/// }
>>> +/// }
>>> +///
>>> +/// unsafe fn free(pool: &MyZpool, handle: ZpoolHandle) {
>>> +/// let h = handle.as_raw();
>>> +/// let n = (h & 0x3F) + 6;
>>> +/// let uptr = h & !0x3F;
>>> +///
>>> +/// // SAFETY:
>>> +/// // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
>>> +/// // denominator for the vector capacity. As noted above, the result will be exactly the
>>> +/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>>> +/// // the vector allocated by `alloc` function above.
>>> +/// // - 1 << n == capacity and is coming from the first 6 bits of handle.
>>> +/// let vec = unsafe { KVec::<u8>::from_raw_parts(uptr as *mut u8, 0, 1 << n) };
>>> +/// drop(vec);
>>> +/// pool.bytes_used.fetch_sub(1 << n, Ordering::Relaxed);
>>> +/// }
>>> +///
>>> +/// unsafe fn read_begin(_pool: &MyZpool, handle: ZpoolHandle) -> NonNull<u8> {
>>> +/// let uptr = handle.as_raw() & !0x3F;
>>> +/// // SAFETY:
>>> +/// // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
>>> +/// // denominator for the vector capacity. As noted above, the result will be exactly the
>>> +/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>>> +/// // the vector allocated by `alloc` function above.
>>> +/// unsafe { NonNull::new_unchecked(uptr as *mut u8) }
>>> +/// }
>>> +///
>>> +/// unsafe fn read_end(_pool: &MyZpool, _handle: ZpoolHandle, _handle_mem: NonNull<u8>) {}
>>> +///
>>> +/// unsafe fn write(_p: &MyZpool, handle: ZpoolHandle, h_mem: NonNull<u8>, mem_len: usize) {
>>> +/// let uptr = handle.as_raw() & !0x3F;
>>> +/// // SAFETY:
>>> +/// // - `h_mem` is a valid non-null pointer provided by zswap.
>>> +/// // - `uptr` is derived from handle by zeroing 6 lower bits where we store the power
>>> +/// // denominator for the vector capacity. As noted above, the result will be exactly the
>>> +/// // pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>>> +/// // the vector allocated by `alloc` function above.
>>> +/// unsafe {
>>> +/// copy_nonoverlapping(h_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
>>> +/// };
>>> +/// }
>>> +///
>>> +/// fn total_pages(pool: &MyZpool) -> u64 {
>>> +/// pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT
>>> +/// }
>>> +/// }
>>> +///
>>> +/// // Uncomment this for compile time registration (disabled to avoid build error when KUNIT
>>> +/// // tests and zsmalloc are enabled):
>>> +/// // kernel::DECLARE_ZPOOL!(MyZpoolDriver);
>>> +/// ```
>>> +///
>>> +pub trait ZpoolDriver {
>>> + /// Opaque Rust representation of `struct zpool`.
>>> + type Pool: ForeignOwnable;
>>
>> Also what happened to my comment on v4 of this patchset?
>>
>> https://lore.kernel.org/all/DCLK1YG1L5TZ.1VMGX131LII9V@kernel.org:
>>
>>>> It can indeed but then the ZpoolDriver trait will have to be extended
>>>> with functions like into_raw() and from_raw(), because zpool expects
>>>> *mut c_void, so on the Adapter side it will look like
>>>>
>>>> extern "C" fn create_(name: *const c_uchar, gfp: u32) -> *mut c_void {
>>>> // SAFETY: the memory pointed to by name is guaranteed by zpool
>>>> to be a valid string
>>>> let pool = unsafe { T::create(CStr::from_char_ptr(name),
>>>> Flags::from_raw(gfp)) };
>>>> match pool {
>>>> Err(_) => null_mut(),
>>>> Ok(p) => T::into_raw(p).cast(),
>>>> }
>>>> }
>>>>
>>>> The question is, why does this make it better?
>>>
>>> No, thanks for sharing that function. Then the question becomes, do you
>>> really need `ForeignOwnable`? Or is `KBox` enough? Do you really want
>>> people to use `Arc<MyZPool>`? Because `BorrowedMut` of `Arc` is the same
>>> as its `Borrowed` variant (it's read-only after all).
>>>
>>> If you can get away with just `Box` (you might want people to choose
>>> their allocator, which is fine IMO), then I'd do so.
>>
>> I still think that if you can use `Box<Self>` the trait is going to be
>> much easier to understand.
>
> Okay, thanks, I'll work on that.
If you do need `Arc` support, then I'd keep it this way, but I don't
know if you do need it.
>>> +
>>> + /// Create a pool.
>>> + fn create(name: CString, gfp: Flags) -> Result<Self::Pool>;
>>> +
>>> + /// Allocate an object of `size` bytes from `pool`, with the allocation flags `gfp` and
>>> + /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
>>> + fn malloc(
>>> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>>> + size: usize,
>>> + gfp: Flags,
>>> + nid: NumaNode,
>>> + ) -> Result<ZpoolHandle>;
>>> +
>>> + /// Free an object previously allocated from the `pool`, represented by `handle`.
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>>> + /// - `handle` must not be used any more after the call to `free`.
>>> + unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: ZpoolHandle);
>>> +
>>> + /// Make all the necessary preparations for the caller to be able to read from the object
>>> + /// represented by `handle` and return a valid pointer to that object's memory to be read.
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>>> + /// - `read_end` with the same `handle` must be called for each `read_begin`.
>>> + unsafe fn read_begin(
>>> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>>> + handle: ZpoolHandle,
>>> + ) -> NonNull<u8>;
>>> +
>>> + /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
>>> + /// previously returned by `read_begin`.
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>>> + /// - `handle_mem` must be the pointer previously returned by `read_begin`.
>>> + unsafe fn read_end(
>>> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>>> + handle: ZpoolHandle,
>>> + handle_mem: NonNull<u8>,
>>> + );
>>> +
>>> + /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
>>> + /// to the memory to copy data from, and `mem_len` defines the length of the data block to
>>> + /// be copied.
>>> + ///
>>> + /// # Safety
>>> + ///
>>> + /// - `handle` must be a valid handle previously returned by `malloc`.
>>> + /// - `handle_mem` must be a valid pointer into the allocated memory area represented by
>>> + /// `handle`.
>>> + /// - `handle_mem + mem_len - 1` must not point outside the allocated memory area.
>>> + unsafe fn write(
>>> + pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>>> + handle: ZpoolHandle,
>>> + handle_mem: NonNull<u8>,
>>> + mem_len: usize,
>>> + );
>>> +
>>> + /// Get the number of pages used by the `pool`.
>>> + fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
>>> +}
>>> +
>>> +/// Provide a zpool allocator to zswap
>>> +#[macro_export]
>>> +macro_rules! DECLARE_ZPOOL {
>>
>> Why all caps and not snake case?
>
> C style, sorry :) Will fix.
>
>>> + ($t: ident) => {
>>> + use core::ptr::null_mut;
>>> + use kernel::error::from_result;
>>> + use kernel::types::ForeignOwnable;
>>
>> Macros shouldn't have `use` statements in a non-local area (so inside
>> `const` bodies and modules is fine).
>>
>>> +
>>> + const __LOG_PREFIX: &[u8] = b"zpool_rust\0";
>>
>> This seems wrong. Why do you need to generate this? Shouldn't the user
>> still invoke `module!` or a similar macro?
>
> Unfortunately not. As we have discussed at Kangrejos, the zpool
> implementation as a driver is on its way out [1] and there have to be
> more voices against that for it to be stopped. If we are now dealing
> with the build time API (i.e. a wrapper over zsmalloc functions) then
> we have to define a build time macro, be that DECLARE_ZPOOL or
> DeclareZpool :)
But this macro shouldn't declare the `__LOG_PREFIX` const? What if I
want to have a zpool driver in addition to having a module? Or is that
impossible to begin with?
>>> +
>>> + fn handle_from_result<T, F>(f: F) -> T
>>> + where
>>> + T: From<usize>,
>>> + F: FnOnce() -> Result<T>,
>>> + {
>>> + match f() {
>>> + Ok(v) => v,
>>> + Err(e) => T::from(0),
>>> + }
>>> + }
>>
>> Why is this function inside the macro?
>
> Doesn't seem to be needed elsewhere. I could put it in a separate patch
> for error.rs as a complement to from_result() but I thought no one was
> interested in this case.
Now that I look more closely at this function, why does it even exist?
You only have a single usage here and since this is a macro, you could
just perform the match there...
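Inlining the single use, as suggested, can even drop the match in favor of `unwrap_or`. A small stand-alone sketch (with a hypothetical `Handle` type modeling `ZpoolHandle`, where handle 0 signals failure):

```rust
// Stand-in for `ZpoolHandle`: 0 means "allocation failed".
#[derive(Debug, PartialEq)]
struct Handle(usize);

impl From<usize> for Handle {
    fn from(v: usize) -> Self {
        Handle(v)
    }
}

// Stand-in for the fallible `malloc` path.
fn malloc_stub(ok: bool) -> Result<Handle, ()> {
    if ok { Ok(Handle(0x40)) } else { Err(()) }
}

// The match folded into the one call site, as the reviewer suggests.
fn handle_or_null(ok: bool) -> Handle {
    malloc_stub(ok).unwrap_or(Handle::from(0))
}
```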
>>> +
>>> + /// Create a pool.
>>> + #[no_mangle]
>>> + pub unsafe extern "C" fn zpool_create_pool(name: *const c_uchar) -> *mut c_void {
>>
>> Missing safety docs.
>>
>>> + match (|| -> Result<<$t as ZpoolDriver>::Pool> {
>>> + // SAFETY: the memory pointed to by name is guaranteed by `zpool` to be a valid
>>> + // string.
>>> + let name_r = unsafe { CStr::from_char_ptr(name).to_cstring() }?;
>>> + $t::create(name_r, flags::GFP_KERNEL)
>>> + })() {
>>> + Err(_) => null_mut(),
>>> + Ok(pool) => <$t as ZpoolDriver>::Pool::into_foreign(pool),
>>> + }
>>> + }
>>> +
>>> + /// Destroy the pool.
>>> + #[no_mangle]
>>> + pub unsafe extern "C" fn zpool_destroy_pool(pool: *mut c_void) {
>>> + // SAFETY: The pointer originates from an `into_foreign` call.
>>> + drop(unsafe { <$t as ZpoolDriver>::Pool::from_foreign(pool) })
>>> + }
>>
>> Have you tried to use the same pattern that we use for the many
>> different `Registration` types in the kernel? For example take a look at
>> `Adapter<T>` from `rust/kernel/net/phy.rs`. That way you don't need a
>> macro and can make the registration a function that takes a generic `T:
>> ZpoolDriver`.
>
> That's what was in the previous version. Unfortunately it won't work any
> more for the reason described above, see also [1].
I see. But this API design is a bit weird IMO, since now you can't have
two Zpool drivers at the same time (even if only one of them is used at
runtime)? Or is that something that you wouldn't do in the first place?
I also dislike that we have to generate these `no_mangle` functions from
within a macro...
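The `Adapter<T>` pattern referred to here can be sketched in plain Rust without any macro: the `extern "C"` entry points become generic associated functions, monomorphized per driver. This is only an illustration with hypothetical names (`PoolDriver`, `Adapter`, `Dummy`), not the phy.rs code:

```rust
use core::marker::PhantomData;
use std::ffi::c_void;

// Hypothetical driver trait, standing in for `ZpoolDriver`.
trait PoolDriver {
    fn create() -> Result<Box<Self>, ()>
    where
        Self: Sized;
}

// Zero-sized adapter: C-ABI shims generic over the driver type.
struct Adapter<T: PoolDriver>(PhantomData<T>);

impl<T: PoolDriver> Adapter<T> {
    extern "C" fn create() -> *mut c_void {
        match T::create() {
            Ok(p) => Box::into_raw(p).cast(),
            Err(_) => core::ptr::null_mut(),
        }
    }

    // Contract: `ptr` must originate from `Adapter::<T>::create`.
    unsafe extern "C" fn destroy(ptr: *mut c_void) {
        drop(unsafe { Box::from_raw(ptr.cast::<T>()) });
    }
}

struct Dummy;
impl PoolDriver for Dummy {
    fn create() -> Result<Box<Self>, ()> {
        Ok(Box::new(Dummy))
    }
}
```

The catch the reviewers note still applies: with a link-time API the symbol names must be fixed (`#[no_mangle]`), which is exactly what forces the macro and rules out two drivers coexisting.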
---
Cheers,
Benno
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/mm?h=mm-stable&id=2ccd9fecd9163f168761d4398564c81554f636ef
* [syzbot ci] Re: rust: zpool: add API for C and Rust
2025-09-23 10:25 [PATCH v6 0/2] rust: zpool: add API for C and Rust Vitaly Wool
2025-09-23 10:26 ` [PATCH v6 1/2] mm: reinstate zpool as a thin API Vitaly Wool
2025-09-23 10:27 ` [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers Vitaly Wool
@ 2025-09-23 16:50 ` syzbot ci
2025-09-23 21:59 ` Johannes Weiner
2 siblings, 1 reply; 11+ messages in thread
From: syzbot ci @ 2025-09-23 16:50 UTC (permalink / raw)
To: a.hindborg, akpm, alex.gaynor, aliceryhl, bjorn3_gh, boqun.feng,
chengming.zhou, dakr, david, gary, gregkh, hannes, liam.howlett,
linux-kernel, linux-mm, lorenzo.stoakes, lossin, mhocko, minchan,
nphamcs, ojeda, rppt, rust-for-linux, senozhatsky, surenb,
tmgross, vbabka, vitaly.wool, yosry.ahmed
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v6] rust: zpool: add API for C and Rust
https://lore.kernel.org/all/20250923102547.2545992-1-vitaly.wool@konsulko.se
* [PATCH v6 1/2] mm: reinstate zpool as a thin API
* [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
and found the following issues:
* BUG: unable to handle kernel NULL pointer dereference in zswap_store
* KASAN: slab-out-of-bounds Read in zpool_get_total_pages
* KASAN: slab-out-of-bounds Read in zswap_store
* KASAN: slab-use-after-free Read in zpool_get_total_pages
* KASAN: use-after-free Read in zpool_get_total_pages
Full report is available here:
https://ci.syzbot.org/series/e8b22352-ae56-4d7c-9113-75573acf2b64
***
BUG: unable to handle kernel NULL pointer dereference in zswap_store
tree: linux-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base: 8f7f8b1b3f4c613dd886f53f768f82816b41eaa3
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/6739e899-fcf7-4f71-b943-5ad0ca0ef8eb/config
syz repro: https://ci.syzbot.org/findings/c2ea1ccf-0bb7-4479-ac6d-d6e8e80efa8b/syz_repro
BUG: kernel NULL pointer dereference, address: 0000000000000034
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000010f582067 P4D 800000010f582067 PUD 0
Oops: Oops: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 6005 Comm: syz.2.21 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:kmem_cache_alloc_noprof+0x2e/0x6e0 mm/slub.c:5252
Code: 55 41 57 41 56 41 55 41 54 53 48 83 ec 38 89 f5 49 89 fe 65 48 8b 05 c1 43 ab 10 48 89 44 24 30 48 8b 44 24 68 48 89 44 24 18 <8b> 47 34 48 89 44 24 08 8b 1d 44 78 ab 0d 21 f3 89 df e8 db 9c fd
RSP: 0018:ffffc90002dee640 EFLAGS: 00010282
RAX: ffffffff822e7088 RBX: 0000000000012800 RCX: ffff888105ee3a00
RDX: 0000000000000000 RSI: 0000000000012800 RDI: 0000000000000000
RBP: 0000000000012800 R08: ffff888105ee3a00 R09: 0000000000000002
R10: 00000000fffffff0 R11: 0000000000000000 R12: ffff88801d2b1aa0
R13: 1ffff11003a56454 R14: 0000000000000000 R15: 0000000000000020
FS: 00007f819a1c56c0(0000) GS:ffff8881a39dd000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000034 CR3: 000000010e704000 CR4: 00000000000006f0
Call Trace:
<TASK>
cache_alloc_handle mm/zsmalloc.c:410 [inline]
zs_malloc+0x88/0x720 mm/zsmalloc.c:1281
zswap_compress mm/zswap.c:910 [inline]
zswap_store_page mm/zswap.c:1429 [inline]
zswap_store+0x1062/0x1f40 mm/zswap.c:1540
swap_writeout+0x710/0xd70 mm/page_io.c:275
writeout mm/vmscan.c:662 [inline]
pageout mm/vmscan.c:721 [inline]
shrink_folio_list+0x3011/0x4c70 mm/vmscan.c:1453
reclaim_folio_list+0xeb/0x500 mm/vmscan.c:2233
reclaim_pages+0x454/0x520 mm/vmscan.c:2270
madvise_cold_or_pageout_pte_range+0x1974/0x1d00 mm/madvise.c:565
walk_pmd_range mm/pagewalk.c:130 [inline]
walk_pud_range mm/pagewalk.c:224 [inline]
walk_p4d_range mm/pagewalk.c:262 [inline]
walk_pgd_range+0xfe9/0x1d40 mm/pagewalk.c:303
__walk_page_range+0x14c/0x710 mm/pagewalk.c:410
walk_page_range_vma+0x393/0x440 mm/pagewalk.c:705
madvise_pageout_page_range mm/madvise.c:624 [inline]
madvise_pageout mm/madvise.c:649 [inline]
madvise_vma_behavior+0x311f/0x3a10 mm/madvise.c:1352
madvise_walk_vmas+0x51c/0xa30 mm/madvise.c:1669
madvise_do_behavior+0x38e/0x550 mm/madvise.c:1885
do_madvise+0x1bc/0x270 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__x64_sys_madvise+0xa7/0xc0 mm/madvise.c:1985
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f819938ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f819a1c5038 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f81995d5fa0 RCX: 00007f819938ec29
RDX: 0000000000000015 RSI: 0000000000600003 RDI: 0000200000000000
RBP: 00007f8199411e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f81995d6038 R14: 00007f81995d5fa0 R15: 00007ffe68a453c8
</TASK>
Modules linked in:
CR2: 0000000000000034
---[ end trace 0000000000000000 ]---
RIP: 0010:kmem_cache_alloc_noprof+0x2e/0x6e0 mm/slub.c:5252
Code: 55 41 57 41 56 41 55 41 54 53 48 83 ec 38 89 f5 49 89 fe 65 48 8b 05 c1 43 ab 10 48 89 44 24 30 48 8b 44 24 68 48 89 44 24 18 <8b> 47 34 48 89 44 24 08 8b 1d 44 78 ab 0d 21 f3 89 df e8 db 9c fd
RSP: 0018:ffffc90002dee640 EFLAGS: 00010282
RAX: ffffffff822e7088 RBX: 0000000000012800 RCX: ffff888105ee3a00
RDX: 0000000000000000 RSI: 0000000000012800 RDI: 0000000000000000
RBP: 0000000000012800 R08: ffff888105ee3a00 R09: 0000000000000002
R10: 00000000fffffff0 R11: 0000000000000000 R12: ffff88801d2b1aa0
R13: 1ffff11003a56454 R14: 0000000000000000 R15: 0000000000000020
FS: 00007f819a1c56c0(0000) GS:ffff8881a39dd000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000034 CR3: 000000010e704000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
0: 55 push %rbp
1: 41 57 push %r15
3: 41 56 push %r14
5: 41 55 push %r13
7: 41 54 push %r12
9: 53 push %rbx
a: 48 83 ec 38 sub $0x38,%rsp
e: 89 f5 mov %esi,%ebp
10: 49 89 fe mov %rdi,%r14
13: 65 48 8b 05 c1 43 ab mov %gs:0x10ab43c1(%rip),%rax # 0x10ab43dc
1a: 10
1b: 48 89 44 24 30 mov %rax,0x30(%rsp)
20: 48 8b 44 24 68 mov 0x68(%rsp),%rax
25: 48 89 44 24 18 mov %rax,0x18(%rsp)
* 2a: 8b 47 34 mov 0x34(%rdi),%eax <-- trapping instruction
2d: 48 89 44 24 08 mov %rax,0x8(%rsp)
32: 8b 1d 44 78 ab 0d mov 0xdab7844(%rip),%ebx # 0xdab787c
38: 21 f3 and %esi,%ebx
3a: 89 df mov %ebx,%edi
3c: e8 .byte 0xe8
3d: db .byte 0xdb
3e: 9c pushf
3f: fd std
***
KASAN: slab-out-of-bounds Read in zpool_get_total_pages
tree: linux-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base: 8f7f8b1b3f4c613dd886f53f768f82816b41eaa3
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/6739e899-fcf7-4f71-b943-5ad0ca0ef8eb/config
C repro: https://ci.syzbot.org/findings/a2aa8de3-367f-4cb7-b39a-e3eb65596e6d/c_repro
syz repro: https://ci.syzbot.org/findings/a2aa8de3-367f-4cb7-b39a-e3eb65596e6d/syz_repro
==================================================================
BUG: KASAN: slab-out-of-bounds in instrument_atomic_read include/linux/instrumented.h:68 [inline]
BUG: KASAN: slab-out-of-bounds in atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
BUG: KASAN: slab-out-of-bounds in zs_get_total_pages mm/zsmalloc.c:1066 [inline]
BUG: KASAN: slab-out-of-bounds in zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
Read of size 8 at addr ffff88810ccc7b10 by task syz.0.17/5992
CPU: 0 UID: 0 PID: 5992 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:-1 [inline]
kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:200
instrument_atomic_read include/linux/instrumented.h:68 [inline]
atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
zs_get_total_pages mm/zsmalloc.c:1066 [inline]
zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
zswap_total_pages+0xf6/0x1e0 mm/zswap.c:457
zswap_check_limits mm/zswap.c:465 [inline]
zswap_store+0x52f/0x1f40 mm/zswap.c:1521
swap_writeout+0x710/0xd70 mm/page_io.c:275
writeout mm/vmscan.c:662 [inline]
pageout mm/vmscan.c:721 [inline]
shrink_folio_list+0x3011/0x4c70 mm/vmscan.c:1453
reclaim_folio_list+0xeb/0x500 mm/vmscan.c:2233
reclaim_pages+0x454/0x520 mm/vmscan.c:2270
madvise_cold_or_pageout_pte_range+0x1974/0x1d00 mm/madvise.c:565
walk_pmd_range mm/pagewalk.c:130 [inline]
walk_pud_range mm/pagewalk.c:224 [inline]
walk_p4d_range mm/pagewalk.c:262 [inline]
walk_pgd_range+0xfe9/0x1d40 mm/pagewalk.c:303
__walk_page_range+0x14c/0x710 mm/pagewalk.c:410
walk_page_range_vma+0x393/0x440 mm/pagewalk.c:705
madvise_pageout_page_range mm/madvise.c:624 [inline]
madvise_pageout mm/madvise.c:649 [inline]
madvise_vma_behavior+0x311f/0x3a10 mm/madvise.c:1352
madvise_walk_vmas+0x51c/0xa30 mm/madvise.c:1669
madvise_do_behavior+0x38e/0x550 mm/madvise.c:1885
do_madvise+0x1bc/0x270 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__x64_sys_madvise+0xa7/0xc0 mm/madvise.c:1985
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f526318ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffdd0a28298 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f52633d5fa0 RCX: 00007f526318ec29
RDX: 0000000000000015 RSI: 0000000000003000 RDI: 0000200000000000
RBP: 00007f5263211e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f52633d5fa0 R14: 00007f52633d5fa0 R15: 0000000000000003
</TASK>
Allocated by task 1:
kasan_save_stack mm/kasan/common.c:56 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
poison_kmalloc_redzone mm/kasan/common.c:400 [inline]
__kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:417
kasan_kmalloc include/linux/kasan.h:262 [inline]
__do_kmalloc_node mm/slub.c:5602 [inline]
__kmalloc_noprof+0x411/0x7f0 mm/slub.c:5614
kmalloc_noprof include/linux/slab.h:961 [inline]
kzalloc_noprof include/linux/slab.h:1094 [inline]
shrinker_alloc+0x199/0xa80 mm/shrinker.c:724
binder_alloc_shrinker_init+0x45/0xe0 drivers/android/binder_alloc.c:1265
binder_init+0x17/0x260 drivers/android/binder.c:7095
do_one_initcall+0x236/0x820 init/main.c:1283
do_initcall_level+0x104/0x190 init/main.c:1345
do_initcalls+0x59/0xa0 init/main.c:1361
kernel_init_freeable+0x334/0x4b0 init/main.c:1593
kernel_init+0x1d/0x1d0 init/main.c:1483
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
The buggy address belongs to the object at ffff88810ccc7b00
which belongs to the cache kmalloc-8 of size 8
The buggy address is located 8 bytes to the right of
allocated 8-byte region [ffff88810ccc7b00, ffff88810ccc7b08)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10ccc7
anon flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000000 ffff88801a841500 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000800080 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 1, tgid 1 (swapper/0), ts 16130935485, free_ts 0
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1850
prep_new_page mm/page_alloc.c:1858 [inline]
get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3869
__alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5159
alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
alloc_slab_page mm/slub.c:3023 [inline]
allocate_slab+0x96/0x3a0 mm/slub.c:3196
new_slab mm/slub.c:3250 [inline]
___slab_alloc+0xe94/0x1920 mm/slub.c:4626
__slab_alloc+0x65/0x100 mm/slub.c:4745
__slab_alloc_node mm/slub.c:4821 [inline]
slab_alloc_node mm/slub.c:5232 [inline]
__do_kmalloc_node mm/slub.c:5601 [inline]
__kmalloc_node_track_caller_noprof+0x5c7/0x800 mm/slub.c:5711
__kmemdup_nul mm/util.c:64 [inline]
kstrdup+0x42/0x100 mm/util.c:84
__kernfs_new_node+0x9c/0x7e0 fs/kernfs/dir.c:633
kernfs_new_node+0x102/0x210 fs/kernfs/dir.c:713
kernfs_create_link+0xa7/0x200 fs/kernfs/symlink.c:39
sysfs_do_create_link_sd+0x83/0x110 fs/sysfs/symlink.c:44
device_create_sys_dev_entry+0x11a/0x180 drivers/base/core.c:3515
device_add+0x733/0xb50 drivers/base/core.c:3659
__video_register_device+0x3dc1/0x4ca0 drivers/media/v4l2-core/v4l2-dev.c:1076
page_owner free stack trace missing
Memory state around the buggy address:
ffff88810ccc7a00: 00 fc fc fc 05 fc fc fc 00 fc fc fc 00 fc fc fc
ffff88810ccc7a80: 00 fc fc fc 05 fc fc fc 06 fc fc fc 06 fc fc fc
>ffff88810ccc7b00: 00 fc fc fc 06 fc fc fc 07 fc fc fc 07 fc fc fc
^
ffff88810ccc7b80: 04 fc fc fc 06 fc fc fc 00 fc fc fc 00 fc fc fc
ffff88810ccc7c00: 07 fc fc fc 00 fc fc fc 06 fc fc fc 05 fc fc fc
==================================================================
***
KASAN: slab-out-of-bounds Read in zswap_store
tree: linux-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base: 8f7f8b1b3f4c613dd886f53f768f82816b41eaa3
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/6739e899-fcf7-4f71-b943-5ad0ca0ef8eb/config
syz repro: https://ci.syzbot.org/findings/cba572f0-863b-4f91-9ac0-c6f5a16096af/syz_repro
==================================================================
BUG: KASAN: slab-out-of-bounds in cache_alloc_handle mm/zsmalloc.c:410 [inline]
BUG: KASAN: slab-out-of-bounds in zs_malloc+0x77/0x720 mm/zsmalloc.c:1281
Read of size 8 at addr ffff888100ae7680 by task syz.2.19/6046
CPU: 1 UID: 0 PID: 6046 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
cache_alloc_handle mm/zsmalloc.c:410 [inline]
zs_malloc+0x77/0x720 mm/zsmalloc.c:1281
zswap_compress mm/zswap.c:910 [inline]
zswap_store_page mm/zswap.c:1429 [inline]
zswap_store+0x1062/0x1f40 mm/zswap.c:1540
swap_writeout+0x710/0xd70 mm/page_io.c:275
writeout mm/vmscan.c:662 [inline]
pageout mm/vmscan.c:721 [inline]
shrink_folio_list+0x3011/0x4c70 mm/vmscan.c:1453
reclaim_folio_list+0xeb/0x500 mm/vmscan.c:2233
reclaim_pages+0x454/0x520 mm/vmscan.c:2270
madvise_cold_or_pageout_pte_range+0x1974/0x1d00 mm/madvise.c:565
walk_pmd_range mm/pagewalk.c:130 [inline]
walk_pud_range mm/pagewalk.c:224 [inline]
walk_p4d_range mm/pagewalk.c:262 [inline]
walk_pgd_range+0xfe9/0x1d40 mm/pagewalk.c:303
__walk_page_range+0x14c/0x710 mm/pagewalk.c:410
walk_page_range_vma+0x393/0x440 mm/pagewalk.c:705
madvise_pageout_page_range mm/madvise.c:624 [inline]
madvise_pageout mm/madvise.c:649 [inline]
madvise_vma_behavior+0x311f/0x3a10 mm/madvise.c:1352
madvise_walk_vmas+0x51c/0xa30 mm/madvise.c:1669
madvise_do_behavior+0x38e/0x550 mm/madvise.c:1885
do_madvise+0x1bc/0x270 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__x64_sys_madvise+0xa7/0xc0 mm/madvise.c:1985
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa813d8ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa814c4c038 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007fa813fd5fa0 RCX: 00007fa813d8ec29
RDX: 0000000000000015 RSI: 0000000000600000 RDI: 0000200000000000
RBP: 00007fa813e11e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa813fd6038 R14: 00007fa813fd5fa0 R15: 00007ffcee547488
</TASK>
Allocated by task 1:
kasan_save_stack mm/kasan/common.c:56 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
unpoison_slab_object mm/kasan/common.c:342 [inline]
__kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:368
kasan_slab_alloc include/linux/kasan.h:252 [inline]
slab_post_alloc_hook mm/slub.c:4945 [inline]
slab_alloc_node mm/slub.c:5244 [inline]
kmem_cache_alloc_noprof+0x367/0x6e0 mm/slub.c:5251
__kernfs_new_node+0xd7/0x7e0 fs/kernfs/dir.c:637
kernfs_new_node+0x102/0x210 fs/kernfs/dir.c:713
__kernfs_create_file+0x4b/0x2e0 fs/kernfs/file.c:1057
sysfs_add_file_mode_ns+0x238/0x300 fs/sysfs/file.c:313
create_files fs/sysfs/group.c:76 [inline]
internal_create_group+0x66d/0x1110 fs/sysfs/group.c:183
internal_create_groups fs/sysfs/group.c:223 [inline]
sysfs_create_groups+0x59/0x120 fs/sysfs/group.c:249
device_add_groups drivers/base/core.c:2836 [inline]
device_add_attrs+0x1c4/0x5a0 drivers/base/core.c:2911
device_add+0x496/0xb50 drivers/base/core.c:3643
usb_new_device+0xa39/0x16f0 drivers/usb/core/hub.c:2694
register_root_hub+0x275/0x590 drivers/usb/core/hcd.c:994
usb_add_hcd+0xba1/0x1050 drivers/usb/core/hcd.c:2993
vhci_hcd_probe+0x1c1/0x380 drivers/usb/usbip/vhci_hcd.c:1377
platform_probe+0xf9/0x190 drivers/base/platform.c:1405
call_driver_probe drivers/base/dd.c:-1 [inline]
really_probe+0x26d/0x9e0 drivers/base/dd.c:659
__driver_probe_device+0x18c/0x2f0 drivers/base/dd.c:801
driver_probe_device+0x4f/0x430 drivers/base/dd.c:831
__device_attach_driver+0x2ce/0x530 drivers/base/dd.c:959
bus_for_each_drv+0x251/0x2e0 drivers/base/bus.c:462
__device_attach+0x2b8/0x400 drivers/base/dd.c:1031
bus_probe_device+0x185/0x260 drivers/base/bus.c:537
device_add+0x7b6/0xb50 drivers/base/core.c:3689
platform_device_add+0x4b4/0x820 drivers/base/platform.c:716
platform_device_register_full+0x46c/0x570 drivers/base/platform.c:844
vhci_hcd_init+0x1bc/0x310 drivers/usb/usbip/vhci_hcd.c:1533
do_one_initcall+0x236/0x820 init/main.c:1283
do_initcall_level+0x104/0x190 init/main.c:1345
do_initcalls+0x59/0xa0 init/main.c:1361
kernel_init_freeable+0x334/0x4b0 init/main.c:1593
kernel_init+0x1d/0x1d0 init/main.c:1483
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
The buggy address belongs to the object at ffff888100ae75a0
which belongs to the cache kernfs_node_cache of size 176
The buggy address is located 48 bytes to the right of
allocated 176-byte region [ffff888100ae75a0, ffff888100ae7650)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100ae7
flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000000 ffff888100015dc0 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000110011 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 1, tgid 1 (swapper/0), ts 12683536009, free_ts 12683151344
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1850
prep_new_page mm/page_alloc.c:1858 [inline]
get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3869
__alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5159
alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
alloc_slab_page mm/slub.c:3023 [inline]
allocate_slab+0x96/0x3a0 mm/slub.c:3196
new_slab mm/slub.c:3250 [inline]
___slab_alloc+0xe94/0x1920 mm/slub.c:4626
__slab_alloc+0x65/0x100 mm/slub.c:4745
__slab_alloc_node mm/slub.c:4821 [inline]
slab_alloc_node mm/slub.c:5232 [inline]
kmem_cache_alloc_noprof+0x3f9/0x6e0 mm/slub.c:5251
__kernfs_new_node+0xd7/0x7e0 fs/kernfs/dir.c:637
kernfs_new_node+0x102/0x210 fs/kernfs/dir.c:713
__kernfs_create_file+0x4b/0x2e0 fs/kernfs/file.c:1057
sysfs_add_file_mode_ns+0x238/0x300 fs/sysfs/file.c:313
create_files fs/sysfs/group.c:76 [inline]
internal_create_group+0x66d/0x1110 fs/sysfs/group.c:183
internal_create_groups fs/sysfs/group.c:223 [inline]
sysfs_create_groups+0x59/0x120 fs/sysfs/group.c:249
device_add_groups drivers/base/core.c:2836 [inline]
device_add_attrs+0x1c4/0x5a0 drivers/base/core.c:2911
device_add+0x496/0xb50 drivers/base/core.c:3643
page last free pid 1 tgid 1 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1394 [inline]
__free_frozen_pages+0xbc4/0xd30 mm/page_alloc.c:2906
__slab_free+0x2e7/0x390 mm/slub.c:5922
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x97/0x140 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:352
kasan_slab_alloc include/linux/kasan.h:252 [inline]
slab_post_alloc_hook mm/slub.c:4945 [inline]
slab_alloc_node mm/slub.c:5244 [inline]
__kmalloc_cache_noprof+0x36f/0x6f0 mm/slub.c:5718
kmalloc_noprof include/linux/slab.h:957 [inline]
kzalloc_noprof include/linux/slab.h:1094 [inline]
device_private_init drivers/base/core.c:3534 [inline]
device_add+0xbe/0xb50 drivers/base/core.c:3585
usb_new_device+0xa39/0x16f0 drivers/usb/core/hub.c:2694
register_root_hub+0x275/0x590 drivers/usb/core/hcd.c:994
usb_add_hcd+0xba1/0x1050 drivers/usb/core/hcd.c:2993
vhci_hcd_probe+0x1c1/0x380 drivers/usb/usbip/vhci_hcd.c:1377
platform_probe+0xf9/0x190 drivers/base/platform.c:1405
call_driver_probe drivers/base/dd.c:-1 [inline]
really_probe+0x26d/0x9e0 drivers/base/dd.c:659
__driver_probe_device+0x18c/0x2f0 drivers/base/dd.c:801
driver_probe_device+0x4f/0x430 drivers/base/dd.c:831
__device_attach_driver+0x2ce/0x530 drivers/base/dd.c:959
Memory state around the buggy address:
ffff888100ae7580: fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 00
ffff888100ae7600: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
>ffff888100ae7680: fc fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
ffff888100ae7700: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
ffff888100ae7780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
***
KASAN: slab-use-after-free Read in zpool_get_total_pages
tree: linux-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base: 8f7f8b1b3f4c613dd886f53f768f82816b41eaa3
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/6739e899-fcf7-4f71-b943-5ad0ca0ef8eb/config
syz repro: https://ci.syzbot.org/findings/bbd2b5a8-bad1-404c-8f0d-414451cc731a/syz_repro
==================================================================
BUG: KASAN: slab-use-after-free in instrument_atomic_read include/linux/instrumented.h:68 [inline]
BUG: KASAN: slab-use-after-free in atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
BUG: KASAN: slab-use-after-free in zs_get_total_pages mm/zsmalloc.c:1066 [inline]
BUG: KASAN: slab-use-after-free in zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
Read of size 8 at addr ffff88801b3033b0 by task syz.0.17/6006
CPU: 1 UID: 0 PID: 6006 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:-1 [inline]
kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:200
instrument_atomic_read include/linux/instrumented.h:68 [inline]
atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
zs_get_total_pages mm/zsmalloc.c:1066 [inline]
zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
zswap_total_pages+0xf6/0x1e0 mm/zswap.c:457
zswap_check_limits mm/zswap.c:465 [inline]
zswap_store+0x52f/0x1f40 mm/zswap.c:1521
swap_writeout+0x710/0xd70 mm/page_io.c:275
writeout mm/vmscan.c:662 [inline]
pageout mm/vmscan.c:721 [inline]
shrink_folio_list+0x3011/0x4c70 mm/vmscan.c:1453
reclaim_folio_list+0xeb/0x500 mm/vmscan.c:2233
reclaim_pages+0x2f4/0x520 mm/vmscan.c:2266
madvise_cold_or_pageout_pte_range+0x1974/0x1d00 mm/madvise.c:565
walk_pmd_range mm/pagewalk.c:130 [inline]
walk_pud_range mm/pagewalk.c:224 [inline]
walk_p4d_range mm/pagewalk.c:262 [inline]
walk_pgd_range+0xfe9/0x1d40 mm/pagewalk.c:303
__walk_page_range+0x14c/0x710 mm/pagewalk.c:410
walk_page_range_vma+0x393/0x440 mm/pagewalk.c:705
madvise_pageout_page_range mm/madvise.c:624 [inline]
madvise_pageout mm/madvise.c:649 [inline]
madvise_vma_behavior+0x311f/0x3a10 mm/madvise.c:1352
madvise_walk_vmas+0x51c/0xa30 mm/madvise.c:1669
madvise_do_behavior+0x38e/0x550 mm/madvise.c:1885
do_madvise+0x1bc/0x270 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__x64_sys_madvise+0xa7/0xc0 mm/madvise.c:1985
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4f4118ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f4f407fe038 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f4f413d5fa0 RCX: 00007f4f4118ec29
RDX: 0000000000000015 RSI: 7fffffffffffffff RDI: 0000200000000000
RBP: 00007f4f41211e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f4f413d6038 R14: 00007f4f413d5fa0 R15: 00007ffefd579118
</TASK>
Allocated by task 1:
kasan_save_stack mm/kasan/common.c:56 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
unpoison_slab_object mm/kasan/common.c:342 [inline]
__kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:368
kasan_slab_alloc include/linux/kasan.h:252 [inline]
slab_post_alloc_hook mm/slub.c:4945 [inline]
slab_alloc_node mm/slub.c:5244 [inline]
kmem_cache_alloc_noprof+0x367/0x6e0 mm/slub.c:5251
acpi_ut_allocate_object_desc_dbg drivers/acpi/acpica/utobject.c:359 [inline]
acpi_ut_create_internal_object_dbg+0xe6/0x470 drivers/acpi/acpica/utobject.c:69
acpi_ds_create_operand+0x2d7/0x890 drivers/acpi/acpica/dsutils.c:617
acpi_ds_create_operands+0x264/0x3f0 drivers/acpi/acpica/dsutils.c:707
acpi_ds_exec_end_op+0x26b/0x1120 drivers/acpi/acpica/dswexec.c:385
acpi_ps_parse_loop+0xc33/0x1ab0 drivers/acpi/acpica/psloop.c:525
acpi_ps_parse_aml+0x22d/0x9b0 drivers/acpi/acpica/psparse.c:475
acpi_ps_execute_method+0x58d/0x7c0 drivers/acpi/acpica/psxface.c:190
acpi_ns_evaluate+0x5a6/0xa20 drivers/acpi/acpica/nseval.c:205
acpi_evaluate_object+0x53f/0xa10 drivers/acpi/acpica/nsxfeval.c:354
acpi_evaluate_integer+0xfc/0x270 drivers/acpi/utils.c:260
acpi_bus_get_status_handle drivers/acpi/bus.c:82 [inline]
acpi_bus_get_status+0x14a/0x380 drivers/acpi/bus.c:111
acpi_scan_init_status drivers/acpi/scan.c:1863 [inline]
acpi_add_single_object+0x391/0x1a20 drivers/acpi/scan.c:1896
acpi_bus_check_add+0x349/0x820 drivers/acpi/scan.c:2179
acpi_ns_walk_namespace+0x26b/0x690 drivers/acpi/acpica/nswalk.c:-1
acpi_walk_namespace+0xe8/0x130 drivers/acpi/acpica/nsxfeval.c:606
acpi_bus_scan+0xe8/0x4b0 drivers/acpi/scan.c:2593
acpi_scan_init+0x1b0/0x550 drivers/acpi/scan.c:2746
acpi_init+0x130/0x1f0 drivers/acpi/bus.c:1469
do_one_initcall+0x236/0x820 init/main.c:1283
do_initcall_level+0x104/0x190 init/main.c:1345
do_initcalls+0x59/0xa0 init/main.c:1361
kernel_init_freeable+0x334/0x4b0 init/main.c:1593
kernel_init+0x1d/0x1d0 init/main.c:1483
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
Freed by task 1:
kasan_save_stack mm/kasan/common.c:56 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
__kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:587
kasan_save_free_info mm/kasan/kasan.h:406 [inline]
poison_slab_object mm/kasan/common.c:252 [inline]
__kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284
kasan_slab_free include/linux/kasan.h:234 [inline]
slab_free_hook mm/slub.c:2507 [inline]
slab_free mm/slub.c:6557 [inline]
kmem_cache_free+0x19b/0x690 mm/slub.c:6668
acpi_os_release_object+0x1d/0x30 drivers/acpi/osl.c:1644
acpi_ut_update_object_reference+0x47f/0x710 drivers/acpi/acpica/utdelete.c:632
acpi_ds_clear_operands+0xa9/0x1b0 drivers/acpi/acpica/dsutils.c:396
acpi_ds_exec_end_op+0xbe9/0x1120 drivers/acpi/acpica/dswexec.c:442
acpi_ps_parse_loop+0xc33/0x1ab0 drivers/acpi/acpica/psloop.c:525
acpi_ps_parse_aml+0x22d/0x9b0 drivers/acpi/acpica/psparse.c:475
acpi_ps_execute_method+0x58d/0x7c0 drivers/acpi/acpica/psxface.c:190
acpi_ns_evaluate+0x5a6/0xa20 drivers/acpi/acpica/nseval.c:205
acpi_evaluate_object+0x53f/0xa10 drivers/acpi/acpica/nsxfeval.c:354
acpi_evaluate_integer+0xfc/0x270 drivers/acpi/utils.c:260
acpi_bus_get_status_handle drivers/acpi/bus.c:82 [inline]
acpi_bus_get_status+0x14a/0x380 drivers/acpi/bus.c:111
acpi_scan_init_status drivers/acpi/scan.c:1863 [inline]
acpi_add_single_object+0x391/0x1a20 drivers/acpi/scan.c:1896
acpi_bus_check_add+0x349/0x820 drivers/acpi/scan.c:2179
acpi_ns_walk_namespace+0x26b/0x690 drivers/acpi/acpica/nswalk.c:-1
acpi_walk_namespace+0xe8/0x130 drivers/acpi/acpica/nsxfeval.c:606
acpi_bus_scan+0xe8/0x4b0 drivers/acpi/scan.c:2593
acpi_scan_init+0x1b0/0x550 drivers/acpi/scan.c:2746
acpi_init+0x130/0x1f0 drivers/acpi/bus.c:1469
do_one_initcall+0x236/0x820 init/main.c:1283
do_initcall_level+0x104/0x190 init/main.c:1345
do_initcalls+0x59/0xa0 init/main.c:1361
kernel_init_freeable+0x334/0x4b0 init/main.c:1593
kernel_init+0x1d/0x1d0 init/main.c:1483
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
The buggy address belongs to the object at ffff88801b3033a8
which belongs to the cache Acpi-Operand of size 72
The buggy address is located 8 bytes inside of
freed 72-byte region [ffff88801b3033a8, ffff88801b3033f0)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff88801b303000 pfn:0x1b303
flags: 0xfff00000000200(workingset|node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000200 ffff88801a894dc0 ffffea00006be4d0 ffffea00006bffd0
raw: ffff88801b303000 000000000027001c 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 1, tgid 1 (swapper/0), ts 2367926237, free_ts 0
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1850
prep_new_page mm/page_alloc.c:1858 [inline]
get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3869
__alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5159
alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
alloc_slab_page mm/slub.c:3023 [inline]
allocate_slab+0x96/0x3a0 mm/slub.c:3196
new_slab mm/slub.c:3250 [inline]
___slab_alloc+0xe94/0x1920 mm/slub.c:4626
__slab_alloc+0x65/0x100 mm/slub.c:4745
__slab_alloc_node mm/slub.c:4821 [inline]
slab_alloc_node mm/slub.c:5232 [inline]
kmem_cache_alloc_noprof+0x3f9/0x6e0 mm/slub.c:5251
acpi_ut_allocate_object_desc_dbg drivers/acpi/acpica/utobject.c:359 [inline]
acpi_ut_create_internal_object_dbg+0xe6/0x470 drivers/acpi/acpica/utobject.c:69
acpi_ds_create_operand+0x2d7/0x890 drivers/acpi/acpica/dsutils.c:617
acpi_ds_create_operands+0x264/0x3f0 drivers/acpi/acpica/dsutils.c:707
acpi_ds_load2_end_op+0xa51/0xf30 drivers/acpi/acpica/dswload2.c:663
acpi_ds_exec_end_op+0x67b/0x1120 drivers/acpi/acpica/dswexec.c:638
acpi_ps_parse_loop+0xc33/0x1ab0 drivers/acpi/acpica/psloop.c:525
acpi_ps_parse_aml+0x22d/0x9b0 drivers/acpi/acpica/psparse.c:475
acpi_ps_execute_table+0x335/0x410 drivers/acpi/acpica/psxface.c:295
page_owner free stack trace missing
Memory state around the buggy address:
ffff88801b303280: fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00 00
ffff88801b303300: 00 00 00 00 fc fc fc fc 00 00 00 00 00 00 00 00
>ffff88801b303380: 00 fc fc fc fc fa fb fb fb fb fb fb fb fb fc fc
^
ffff88801b303400: fc fc 00 00 00 00 00 00 00 00 00 fc fc fc fc 00
ffff88801b303480: 00 00 00 00 00 00 00 00 fc fc fc fc 00 00 00 00
==================================================================
***
KASAN: use-after-free Read in zpool_get_total_pages
tree: linux-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base: 8f7f8b1b3f4c613dd886f53f768f82816b41eaa3
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/6739e899-fcf7-4f71-b943-5ad0ca0ef8eb/config
syz repro: https://ci.syzbot.org/findings/6687daf2-8ec6-4f60-ab6b-b53425f8483b/syz_repro
==================================================================
BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:68 [inline]
BUG: KASAN: use-after-free in atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
BUG: KASAN: use-after-free in zs_get_total_pages mm/zsmalloc.c:1066 [inline]
BUG: KASAN: use-after-free in zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
Read of size 8 at addr ffff88801af0a0d0 by task syz.1.18/6044
CPU: 0 UID: 0 PID: 6044 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
check_region_inline mm/kasan/generic.c:-1 [inline]
kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:200
instrument_atomic_read include/linux/instrumented.h:68 [inline]
atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
zs_get_total_pages mm/zsmalloc.c:1066 [inline]
zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
zswap_total_pages+0xf6/0x1e0 mm/zswap.c:457
zswap_check_limits mm/zswap.c:465 [inline]
zswap_store+0x52f/0x1f40 mm/zswap.c:1521
swap_writeout+0x710/0xd70 mm/page_io.c:275
writeout mm/vmscan.c:662 [inline]
pageout mm/vmscan.c:721 [inline]
shrink_folio_list+0x3011/0x4c70 mm/vmscan.c:1453
reclaim_folio_list+0xeb/0x500 mm/vmscan.c:2233
reclaim_pages+0x454/0x520 mm/vmscan.c:2270
madvise_cold_or_pageout_pte_range+0x1974/0x1d00 mm/madvise.c:565
walk_pmd_range mm/pagewalk.c:130 [inline]
walk_pud_range mm/pagewalk.c:224 [inline]
walk_p4d_range mm/pagewalk.c:262 [inline]
walk_pgd_range+0xfe9/0x1d40 mm/pagewalk.c:303
__walk_page_range+0x14c/0x710 mm/pagewalk.c:410
walk_page_range_vma+0x393/0x440 mm/pagewalk.c:705
madvise_pageout_page_range mm/madvise.c:624 [inline]
madvise_pageout mm/madvise.c:649 [inline]
madvise_vma_behavior+0x311f/0x3a10 mm/madvise.c:1352
madvise_walk_vmas+0x51c/0xa30 mm/madvise.c:1669
madvise_do_behavior+0x38e/0x550 mm/madvise.c:1885
do_madvise+0x1bc/0x270 mm/madvise.c:1978
__do_sys_madvise mm/madvise.c:1987 [inline]
__se_sys_madvise mm/madvise.c:1985 [inline]
__x64_sys_madvise+0xa7/0xc0 mm/madvise.c:1985
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa5a1f8ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa5a2eb5038 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007fa5a21d5fa0 RCX: 00007fa5a1f8ec29
RDX: 0000000000000015 RSI: 0000000000600000 RDI: 0000200000000000
RBP: 00007fa5a2011e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa5a21d6038 R14: 00007fa5a21d5fa0 R15: 00007ffc6cec8398
</TASK>
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff88801af0afc0 pfn:0x1af0a
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: f0(buddy)
raw: 00fff00000000000 ffffea00008c38c8 ffffea0000811c88 0000000000000000
raw: ffff88801af0afc0 0000000000000000 00000000f0000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as freed
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x2dc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO|__GFP_NOWARN), pid 5864, tgid 5864 (syz-executor), ts 59367185001, free_ts 60188955452
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1850
prep_new_page mm/page_alloc.c:1858 [inline]
get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3869
__alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5159
alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
alloc_frozen_pages_noprof mm/mempolicy.c:2487 [inline]
alloc_pages_noprof+0xa9/0x190 mm/mempolicy.c:2507
vm_area_alloc_pages mm/vmalloc.c:3642 [inline]
__vmalloc_area_node mm/vmalloc.c:3720 [inline]
__vmalloc_node_range_noprof+0x97d/0x12f0 mm/vmalloc.c:3893
vmalloc_user_noprof+0xad/0xf0 mm/vmalloc.c:4046
kcov_ioctl+0x55/0x640 kernel/kcov.c:716
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5884 tgid 5884 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1394 [inline]
__free_frozen_pages+0xbc4/0xd30 mm/page_alloc.c:2906
vfree+0x25a/0x400 mm/vmalloc.c:3434
kcov_put kernel/kcov.c:439 [inline]
kcov_close+0x28/0x50 kernel/kcov.c:535
__fput+0x44c/0xa70 fs/file_table.c:468
task_work_run+0x1d4/0x260 kernel/task_work.c:227
exit_task_work include/linux/task_work.h:40 [inline]
do_exit+0x6b5/0x2300 kernel/exit.c:966
do_group_exit+0x21c/0x2d0 kernel/exit.c:1107
get_signal+0x1285/0x1340 kernel/signal.c:3034
arch_do_signal_or_restart+0xa0/0x790 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop+0x72/0x130 kernel/entry/common.c:40
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Memory state around the buggy address:
ffff88801af09f80: 05 fc fc fc 06 fc fc fc 06 fc fc fc 05 fc fc fc
ffff88801af0a000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff88801af0a080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff88801af0a100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88801af0a180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot ci] Re: rust: zpool: add API for C and Rust
2025-09-23 16:50 ` [syzbot ci] Re: rust: zpool: add API for C and Rust syzbot ci
@ 2025-09-23 21:59 ` Johannes Weiner
2025-09-24 5:46 ` Vitaly Wool
0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2025-09-23 21:59 UTC (permalink / raw)
To: syzbot ci
Cc: a.hindborg, akpm, alex.gaynor, aliceryhl, bjorn3_gh, boqun.feng,
chengming.zhou, dakr, david, gary, gregkh, liam.howlett,
linux-kernel, linux-mm, lorenzo.stoakes, lossin, mhocko, minchan,
nphamcs, ojeda, rppt, rust-for-linux, senozhatsky, surenb,
tmgross, vbabka, vitaly.wool, yosry.ahmed, syzbot, syzkaller-bugs
On Tue, Sep 23, 2025 at 09:50:10AM -0700, syzbot ci wrote:
> syzbot ci has tested the following series
>
> [v6] rust: zpool: add API for C and Rust
> https://lore.kernel.org/all/20250923102547.2545992-1-vitaly.wool@konsulko.se
> * [PATCH v6 1/2] mm: reinstate zpool as a thin API
> * [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
>
> and found the following issues:
> * BUG: unable to handle kernel NULL pointer dereference in zswap_store
> * KASAN: slab-out-of-bounds Read in zpool_get_total_pages
> * KASAN: slab-out-of-bounds Read in zswap_store
> * KASAN: slab-use-after-free Read in zpool_get_total_pages
> * KASAN: use-after-free Read in zpool_get_total_pages
>
> Full report is available here:
> https://ci.syzbot.org/series/e8b22352-ae56-4d7c-9113-75573acf2b64
>
> ***
>
> BUG: unable to handle kernel NULL pointer dereference in zswap_store
struct zpool {
void *pool;
};
struct zpool *zpool_create_pool(const char *name) \
{ \
return (struct zpool *) prefix ## _create_pool(name); \
} \
u64 zpool_get_total_pages(struct zpool *zpool) \
{ \
return prefix ## _get_total_pages(zpool->pool); \
}
You create the zpool by simply casting the backend pool, but then you
deref it twice as if it were an actual container for the backend pool.
I'm guessing you didn't test this even superficially?
This also still proposes an API with no in-kernel user.
NAK
* Re: [syzbot ci] Re: rust: zpool: add API for C and Rust
2025-09-23 21:59 ` Johannes Weiner
@ 2025-09-24 5:46 ` Vitaly Wool
2025-09-24 17:38 ` Nhat Pham
0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Wool @ 2025-09-24 5:46 UTC (permalink / raw)
To: Johannes Weiner, syzbot ci
Cc: a.hindborg, akpm, alex.gaynor, aliceryhl, bjorn3_gh, boqun.feng,
chengming.zhou, dakr, david, gary, gregkh, liam.howlett,
linux-kernel, linux-mm, lorenzo.stoakes, lossin, mhocko, minchan,
nphamcs, ojeda, rppt, rust-for-linux, senozhatsky, surenb,
tmgross, vbabka, yosry.ahmed, syzbot, syzkaller-bugs
On 9/23/25 23:59, Johannes Weiner wrote:
> On Tue, Sep 23, 2025 at 09:50:10AM -0700, syzbot ci wrote:
>> syzbot ci has tested the following series
>>
>> [v6] rust: zpool: add API for C and Rust
>> https://lore.kernel.org/all/20250923102547.2545992-1-vitaly.wool@konsulko.se
>> * [PATCH v6 1/2] mm: reinstate zpool as a thin API
>> * [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
>>
>> and found the following issues:
>> * BUG: unable to handle kernel NULL pointer dereference in zswap_store
>> * KASAN: slab-out-of-bounds Read in zpool_get_total_pages
>> * KASAN: slab-out-of-bounds Read in zswap_store
>> * KASAN: slab-use-after-free Read in zpool_get_total_pages
>> * KASAN: use-after-free Read in zpool_get_total_pages
>>
>> Full report is available here:
>> https://ci.syzbot.org/series/e8b22352-ae56-4d7c-9113-75573acf2b64
>>
>> ***
>>
>> BUG: unable to handle kernel NULL pointer dereference in zswap_store
>
> struct zpool {
> void *pool;
> };
>
> struct zpool *zpool_create_pool(const char *name) \
> { \
> return (struct zpool *) prefix ## _create_pool(name); \
> } \
>
> u64 zpool_get_total_pages(struct zpool *zpool) \
> { \
> return prefix ## _get_total_pages(zpool->pool); \
> }
>
> You create the zpool by simply casting the backend pool, but then you
> deref it twice as if it were an actual container for the backend pool.
>
> I'm guessing you didn't test this even superficially?
LOL, no, forgot to run git commit --amend so came up with a wrong version.
The Rust version is correct though.
> This also still proposes an API with no in-kernel user.
That's not correct, zsmalloc is the user.
~Vitaly
* Re: [syzbot ci] Re: rust: zpool: add API for C and Rust
2025-09-24 5:46 ` Vitaly Wool
@ 2025-09-24 17:38 ` Nhat Pham
2025-09-26 14:27 ` Vlastimil Babka
0 siblings, 1 reply; 11+ messages in thread
From: Nhat Pham @ 2025-09-24 17:38 UTC (permalink / raw)
To: Vitaly Wool
Cc: Johannes Weiner, syzbot ci, a.hindborg, akpm, alex.gaynor,
aliceryhl, bjorn3_gh, boqun.feng, chengming.zhou, dakr, david,
gary, gregkh, liam.howlett, linux-kernel, linux-mm,
lorenzo.stoakes, lossin, mhocko, minchan, ojeda, rppt,
rust-for-linux, senozhatsky, surenb, tmgross, vbabka, yosry.ahmed,
syzbot, syzkaller-bugs
On Tue, Sep 23, 2025 at 10:46 PM Vitaly Wool <vitaly.wool@konsulko.se> wrote:
>
>
>
> On 9/23/25 23:59, Johannes Weiner wrote:
> > On Tue, Sep 23, 2025 at 09:50:10AM -0700, syzbot ci wrote:
> >> syzbot ci has tested the following series
> >>
> >> [v6] rust: zpool: add API for C and Rust
> >> https://lore.kernel.org/all/20250923102547.2545992-1-vitaly.wool@konsulko.se
> >> * [PATCH v6 1/2] mm: reinstate zpool as a thin API
> >> * [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
> >>
> >> and found the following issues:
> >> * BUG: unable to handle kernel NULL pointer dereference in zswap_store
> >> * KASAN: slab-out-of-bounds Read in zpool_get_total_pages
> >> * KASAN: slab-out-of-bounds Read in zswap_store
> >> * KASAN: slab-use-after-free Read in zpool_get_total_pages
> >> * KASAN: use-after-free Read in zpool_get_total_pages
> >>
> >> Full report is available here:
> >> https://ci.syzbot.org/series/e8b22352-ae56-4d7c-9113-75573acf2b64
> >>
> >> ***
> >>
> >> BUG: unable to handle kernel NULL pointer dereference in zswap_store
> >
> > struct zpool {
> > void *pool;
> > };
> >
> > struct zpool *zpool_create_pool(const char *name) \
> > { \
> > return (struct zpool *) prefix ## _create_pool(name); \
> > } \
> >
> > u64 zpool_get_total_pages(struct zpool *zpool) \
> > { \
> > return prefix ## _get_total_pages(zpool->pool); \
> > }
> >
> > You create the zpool by simply casting the backend pool, but then you
> > deref it twice as if it were an actual container for the backend pool.
> >
> > I'm guessing you didn't test this even superficially?
>
> LOL, no, forgot to run git commit --amend so came up with a wrong version.
>
> The Rust version is correct though.
>
> > This also still proposes an API with no in-kernel user.
>
> That's not correct, zsmalloc is the user.
A single user does not an API make.
* Re: [syzbot ci] Re: rust: zpool: add API for C and Rust
2025-09-24 17:38 ` Nhat Pham
@ 2025-09-26 14:27 ` Vlastimil Babka
0 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2025-09-26 14:27 UTC (permalink / raw)
To: Nhat Pham, Vitaly Wool
Cc: Johannes Weiner, syzbot ci, a.hindborg, akpm, alex.gaynor,
aliceryhl, bjorn3_gh, boqun.feng, chengming.zhou, dakr, david,
gary, gregkh, liam.howlett, linux-kernel, linux-mm,
lorenzo.stoakes, lossin, mhocko, minchan, ojeda, rppt,
rust-for-linux, senozhatsky, surenb, tmgross, yosry.ahmed, syzbot,
syzkaller-bugs
On 9/24/25 19:38, Nhat Pham wrote:
> On Tue, Sep 23, 2025 at 10:46 PM Vitaly Wool <vitaly.wool@konsulko.se> wrote:
>>
>> LOL, no, forgot to run git commit --amend so came up with a wrong version.
>>
>> The Rust version is correct though.
>>
>> > This also still proposes an API with no in-kernel user.
>>
>> That's not correct, zsmalloc is the user.
>
> A single user does not an API make.
IIRC what was suggested is to implement the zsmalloc API directly. What does
the extra inline function layer get us in case of a compile-time switch?
And do you need the Rust abstraction or just can make it part of the zblock
itself? You don't expect there to be more Rust-based backends than zblock, no?