rust-for-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/2] rust: zpool: add API for C and Rust
@ 2025-09-23 10:25 Vitaly Wool
  2025-09-23 10:26 ` [PATCH v6 1/2] mm: reinstate zpool as a thin API Vitaly Wool
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Vitaly Wool @ 2025-09-23 10:25 UTC (permalink / raw)
  To: linux-mm, rust-for-linux
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
	Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Minchan Kim, Sergey Senozhatsky,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Greg Kroah-Hartman, linux-kernel,
	Vitaly Wool

zpool used to be a common frontend for memory storage pool
implementations. With its removal the opportunity to select an
allocation backend for zswap has been completely lost. However, with
the recent advancements in the vmap/vmalloc field that allow for fast
and simple allocation backends, and the initiative to implement one in
Rust, the zpool API is still necessary, though it's enough to have it
as a thin API for compile time backend selection.

This patchset provides such API and implements the interface to use
it in Rust kernel code, thus enabling both C and Rust implementations
of zpool allocators. zsmalloc and documentation are updated
accordingly.

Co-developed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
---
Changelog:
v1 -> v2:
* reworked to stick to the existing Rust driver infrastructure
* removed raw pointers from the Rust API
v2 -> v3:
* detailed safety requirements for unsafe API functions
* removed unwrap()
* some typo corrections
v3 -> v4:
* added a working example of zpool Rust API usage in the
  documentation part
* change to Flags arranged as a separate patch
* improved safety requirements for ZpoolDriver trait
v4 -> v5:
* created a new type ZpoolHandle for handle representation on the
  Rust side
* improved description of Flags::from_raw()
* pool is no more borrowed as mutable for ZpoolDriver::malloc()
* ZpoolDriver::destroy() removed
* improved ZpoolDriver implementation example
* typos/markup corrections
v5 -> v6:
* removed zpool API is partially restored (to the minimal required
  extent)
* no Adapter based runtime registration is necessary
* a Rust macro for compile time registration is introduced instead
---
 Documentation/admin-guide/mm/zswap.rst |   14 +-
 MAINTAINERS                            |    1 
 include/linux/zpool.h                  |   62 ++++++++
 mm/Kconfig                             |   22 ++-
 mm/zsmalloc.c                          |    3 
 mm/zswap.c                             |   30 ++--
 rust/bindings/bindings_helper.h        |    1 
 rust/kernel/lib.rs                     |    2 
 rust/kernel/zpool.rs                   |  366 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 479 insertions(+), 22 deletions(-)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v6 1/2] mm: reinstate zpool as a thin API
  2025-09-23 10:25 [PATCH v6 0/2] rust: zpool: add API for C and Rust Vitaly Wool
@ 2025-09-23 10:26 ` Vitaly Wool
  2025-09-23 10:27 ` [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers Vitaly Wool
  2025-09-23 16:50 ` [syzbot ci] Re: rust: zpool: add API for C and Rust syzbot ci
  2 siblings, 0 replies; 11+ messages in thread
From: Vitaly Wool @ 2025-09-23 10:26 UTC (permalink / raw)
  To: linux-mm, rust-for-linux
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
	Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Minchan Kim, Sergey Senozhatsky,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Greg Kroah-Hartman, linux-kernel,
	Vitaly Wool

With the removal of the zpool layer the opportunity to select an allocation
backend for zswap has been completely lost. While the flexibility zpool
used to provide may have seemed unjustified, the option to select an
alternative backend should still exist, especially with the recent
advancements in the vmap/vmalloc field that allow for faster and simple
allocation backends. This patch reinstates zpool API as a thin header-only
layer and provides the infrastructure for backend compile time selection.

Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
---
 Documentation/admin-guide/mm/zswap.rst | 14 +++---
 MAINTAINERS                            |  1 +
 include/linux/zpool.h                  | 62 ++++++++++++++++++++++++++
 mm/Kconfig                             | 22 ++++++++-
 mm/zsmalloc.c                          |  3 ++
 mm/zswap.c                             | 30 ++++++-------
 6 files changed, 110 insertions(+), 22 deletions(-)
 create mode 100644 include/linux/zpool.h

diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index 283d77217c6f..44075d35e512 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -53,16 +53,20 @@ Zswap receives pages for compression from the swap subsystem and is able to
 evict pages from its own compressed pool on an LRU basis and write them back to
 the backing swap device in the case that the compressed pool is full.
 
-Zswap makes use of zsmalloc for the managing the compressed memory pool.  Each
-allocation in zsmalloc is not directly accessible by address.  Rather, a handle is
+Zswap makes use of zpool API for the managing the compressed memory pool. Each
+allocation in zpool is not directly accessible by address. Rather, a handle is
 returned by the allocation routine and that handle must be mapped before being
 accessed.  The compressed memory pool grows on demand and shrinks as compressed
-pages are freed.  The pool is not preallocated.
+pages are freed. The pool is not preallocated.
+
+An allocator backend implementing zpool API can be selected during the build
+time as a kernel configuration option. Currently only one backend (zsmalloc) is
+supported and it is selected automatically.
 
 When a swap page is passed from swapout to zswap, zswap maintains a mapping
 of the swap entry, a combination of the swap type and swap offset, to the
-zsmalloc handle that references that compressed swap page.  This mapping is
-achieved with a red-black tree per swap type.  The swap offset is the search
+zpool handle that references that compressed swap page.  This mapping is
+achieved with a red-black tree per swap type. The swap offset is the search
 key for the tree nodes.
 
 During a page fault on a PTE that is a swap entry, the swapin code calls the
diff --git a/MAINTAINERS b/MAINTAINERS
index ca8e3d18eedd..149a06be4000 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27885,6 +27885,7 @@ R:	Chengming Zhou <chengming.zhou@linux.dev>
 L:	linux-mm@kvack.org
 S:	Maintained
 F:	Documentation/admin-guide/mm/zswap.rst
+F:	include/linux/zpool.h
 F:	include/linux/zswap.h
 F:	mm/zswap.c
 F:	tools/testing/selftests/cgroup/test_zswap.c
diff --git a/include/linux/zpool.h b/include/linux/zpool.h
new file mode 100644
index 000000000000..05406b00d734
--- /dev/null
+++ b/include/linux/zpool.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * zpool memory storage api
+ *
+ * Copyright (C) 2014 Dan Streetman
+ *
+ * This is a common frontend for the zswap compressed memory storage
+ * implementations.
+ */
+
+#ifndef _ZPOOL_H_
+#define _ZPOOL_H_
+
+struct zpool {
+	void *pool;
+};
+
+struct zpool *zpool_create_pool(const char *name);
+void zpool_destroy_pool(struct zpool *pool);
+unsigned long  zpool_malloc(struct zpool *pool, size_t size, gfp_t gfp, const int nid);
+void zpool_free(struct zpool *pool, unsigned long handle);
+void *zpool_obj_read_begin(struct zpool *zpool, unsigned long handle, void *local_copy);
+void zpool_obj_read_end(struct zpool *zpool, unsigned long handle, void *handle_mem);
+void zpool_obj_write(struct zpool *zpool, unsigned long handle, void *handle_mem, size_t mem_len);
+u64 zpool_get_total_pages(struct zpool *zpool);
+
+#define DECLARE_ZPOOL(prefix) \
+	struct zpool *zpool_create_pool(const char *name) \
+	{ \
+		return (struct zpool *) prefix ## _create_pool(name); \
+	} \
+	void zpool_destroy_pool(struct zpool *pool) \
+	{ \
+		return prefix ## _destroy_pool(pool->pool); \
+	} \
+	unsigned long  zpool_malloc(struct zpool *pool, size_t size, gfp_t gfp, const int nid) \
+	{ \
+		return prefix ## _malloc(pool->pool, size, gfp, nid); \
+	} \
+	void zpool_free(struct zpool *pool, unsigned long handle) \
+	{ \
+		return prefix ## _free(pool->pool, handle); \
+	} \
+	void *zpool_obj_read_begin(struct zpool *zpool, unsigned long handle, void *local_copy) \
+	{ \
+		return prefix ## _obj_read_begin(zpool->pool, handle, local_copy); \
+	} \
+	void zpool_obj_read_end(struct zpool *zpool, unsigned long handle, void *handle_mem) \
+	{ \
+		return prefix ## _obj_read_end(zpool->pool, handle, handle_mem); \
+	} \
+	void zpool_obj_write(struct zpool *zpool, unsigned long handle, \
+		     void *handle_mem, size_t mem_len) \
+	{ \
+		prefix ## _obj_write(zpool->pool, handle, handle_mem, mem_len); \
+	} \
+	u64 zpool_get_total_pages(struct zpool *zpool) \
+	{ \
+		return prefix ## _get_total_pages(zpool->pool); \
+	}
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index d1ed839ca710..f72483b91098 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -23,7 +23,7 @@ config ZSWAP
 	bool "Compressed cache for swap pages"
 	depends on SWAP
 	select CRYPTO
-	select ZSMALLOC
+	select ZPOOL_ALLOCATOR
 	help
 	  A lightweight compressed cache for swap pages.  It takes
 	  pages that are in the process of being swapped out and attempts to
@@ -122,8 +122,26 @@ config ZSWAP_COMPRESSOR_DEFAULT
        default "zstd" if ZSWAP_COMPRESSOR_DEFAULT_ZSTD
        default ""
 
+config ZPOOL_ALLOCATOR
+	bool
+
+choice
+	prompt "Allocator for zswap"
+	default ZSMALLOC
+	help
+	  Allocator to be used for compressed object storage in zswap.
+	  Currently only zsmalloc is supported.
+
 config ZSMALLOC
-	tristate
+	bool "N:1 compression allocator (zsmalloc)"
+	select ZPOOL_ALLOCATOR
+	depends on MMU
+	help
+	  zsmalloc is a slab-based memory allocator designed to store
+	  pages of various compression levels efficiently. It achieves
+	  the highest storage density with the least amount of fragmentation.
+
+endchoice
 
 if ZSMALLOC
 
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5bf832f9c05c..c84e96fd30f9 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -36,6 +36,7 @@
 #include <linux/types.h>
 #include <linux/debugfs.h>
 #include <linux/zsmalloc.h>
+#include <linux/zpool.h>
 #include <linux/fs.h>
 #include <linux/workqueue.h>
 #include "zpdesc.h"
@@ -432,6 +433,8 @@ static void record_obj(unsigned long handle, unsigned long obj)
 	*(unsigned long *)handle = obj;
 }
 
+DECLARE_ZPOOL(zs);
+
 static inline bool __maybe_unused is_first_zpdesc(struct zpdesc *zpdesc)
 {
 	return PagePrivate(zpdesc_page(zpdesc));
diff --git a/mm/zswap.c b/mm/zswap.c
index c1af782e54ec..a567cc16b9a3 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -34,7 +34,7 @@
 #include <linux/pagemap.h>
 #include <linux/workqueue.h>
 #include <linux/list_lru.h>
-#include <linux/zsmalloc.h>
+#include <linux/zpool.h>
 
 #include "swap.h"
 #include "internal.h"
@@ -151,7 +151,7 @@ struct crypto_acomp_ctx {
  * needs to be verified that it's still valid in the tree.
  */
 struct zswap_pool {
-	struct zs_pool *zs_pool;
+	struct zpool *pool;
 	struct crypto_acomp_ctx __percpu *acomp_ctx;
 	struct percpu_ref ref;
 	struct list_head list;
@@ -257,8 +257,8 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
 
 	/* unique name for each pool specifically required by zsmalloc */
 	snprintf(name, 38, "zswap%x", atomic_inc_return(&zswap_pools_count));
-	pool->zs_pool = zs_create_pool(name);
-	if (!pool->zs_pool)
+	pool->pool = zpool_create_pool(name);
+	if (!pool->pool)
 		goto error;
 
 	strscpy(pool->tfm_name, compressor, sizeof(pool->tfm_name));
@@ -295,8 +295,8 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
 error:
 	if (pool->acomp_ctx)
 		free_percpu(pool->acomp_ctx);
-	if (pool->zs_pool)
-		zs_destroy_pool(pool->zs_pool);
+	if (pool->pool)
+		zpool_destroy_pool(pool->pool);
 	kfree(pool);
 	return NULL;
 }
@@ -327,7 +327,7 @@ static void zswap_pool_destroy(struct zswap_pool *pool)
 	cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
 	free_percpu(pool->acomp_ctx);
 
-	zs_destroy_pool(pool->zs_pool);
+	zpool_destroy_pool(pool->pool);
 	kfree(pool);
 }
 
@@ -454,7 +454,7 @@ unsigned long zswap_total_pages(void)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(pool, &zswap_pools, list)
-		total += zs_get_total_pages(pool->zs_pool);
+		total += zpool_get_total_pages(pool->pool);
 	rcu_read_unlock();
 
 	return total;
@@ -717,7 +717,7 @@ static void zswap_entry_cache_free(struct zswap_entry *entry)
 static void zswap_entry_free(struct zswap_entry *entry)
 {
 	zswap_lru_del(&zswap_list_lru, entry);
-	zs_free(entry->pool->zs_pool, entry->handle);
+	zpool_free(entry->pool->pool, entry->handle);
 	zswap_pool_put(entry->pool);
 	if (entry->objcg) {
 		obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
@@ -907,13 +907,13 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	}
 
 	gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
-	handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page));
+	handle = zpool_malloc(pool->pool, dlen, gfp, page_to_nid(page));
 	if (IS_ERR_VALUE(handle)) {
 		alloc_ret = PTR_ERR((void *)handle);
 		goto unlock;
 	}
 
-	zs_obj_write(pool->zs_pool, handle, dst, dlen);
+	zpool_obj_write(pool->pool, handle, dst, dlen);
 	entry->handle = handle;
 	entry->length = dlen;
 
@@ -940,7 +940,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	u8 *src, *obj;
 
 	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
-	obj = zs_obj_read_begin(pool->zs_pool, entry->handle, acomp_ctx->buffer);
+	obj = zpool_obj_read_begin(pool->pool, entry->handle, acomp_ctx->buffer);
 
 	/* zswap entries of length PAGE_SIZE are not compressed. */
 	if (entry->length == PAGE_SIZE) {
@@ -949,7 +949,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	}
 
 	/*
-	 * zs_obj_read_begin() might return a kmap address of highmem when
+	 * zpool_obj_read_begin() might return a kmap address of highmem when
 	 * acomp_ctx->buffer is not used.  However, sg_init_one() does not
 	 * handle highmem addresses, so copy the object to acomp_ctx->buffer.
 	 */
@@ -969,7 +969,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	dlen = acomp_ctx->req->dlen;
 
 read_done:
-	zs_obj_read_end(pool->zs_pool, entry->handle, obj);
+	zpool_obj_read_end(pool->pool, entry->handle, obj);
 	acomp_ctx_put_unlock(acomp_ctx);
 
 	if (!decomp_ret && dlen == PAGE_SIZE)
@@ -1486,7 +1486,7 @@ static bool zswap_store_page(struct page *page,
 	return true;
 
 store_failed:
-	zs_free(pool->zs_pool, entry->handle);
+	zpool_free(pool->pool, entry->handle);
 compress_failed:
 	zswap_entry_cache_free(entry);
 	return false;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
  2025-09-23 10:25 [PATCH v6 0/2] rust: zpool: add API for C and Rust Vitaly Wool
  2025-09-23 10:26 ` [PATCH v6 1/2] mm: reinstate zpool as a thin API Vitaly Wool
@ 2025-09-23 10:27 ` Vitaly Wool
  2025-09-23 21:49   ` Benno Lossin
  2025-09-23 16:50 ` [syzbot ci] Re: rust: zpool: add API for C and Rust syzbot ci
  2 siblings, 1 reply; 11+ messages in thread
From: Vitaly Wool @ 2025-09-23 10:27 UTC (permalink / raw)
  To: linux-mm, rust-for-linux
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
	Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Minchan Kim, Sergey Senozhatsky,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Benno Lossin, Andreas Hindborg, Alice Ryhl,
	Trevor Gross, Danilo Krummrich, Greg Kroah-Hartman, linux-kernel,
	Vitaly Wool

zpool is a common thin API for memory storage pool implementations.
A backend implementing this API will be used to store compressed
memory objects, e. g. for zswap, the lightweight compressed cache for
swap pages.

This patch provides the interface to use zpool in Rust kernel code,
thus enabling Rust implementations of zpool allocators for zswap. Now
that zpool API doesn't provide runtime registration option and presumes
compile time selection of an allocation backend, an allocator is
expected to call DECLARE_ZPOOL! macro to register itself during compile
time. Please also see the example in zpool.rs for more details.

Co-developed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
---
 rust/bindings/bindings_helper.h |   1 +
 rust/kernel/lib.rs              |   2 +
 rust/kernel/zpool.rs            | 366 ++++++++++++++++++++++++++++++++
 3 files changed, 369 insertions(+)
 create mode 100644 rust/kernel/zpool.rs

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index 84d60635e8a9..f0c4c454882b 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -75,6 +75,7 @@
 #include <linux/wait.h>
 #include <linux/workqueue.h>
 #include <linux/xarray.h>
+#include <linux/zpool.h>
 #include <trace/events/rust_sample.h>
 
 #if defined(CONFIG_DRM_PANIC_SCREEN_QR_CODE)
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 6b0a5689669f..4d5b0cc881d8 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -130,6 +130,8 @@
 pub mod uaccess;
 pub mod workqueue;
 pub mod xarray;
+#[cfg(CONFIG_ZPOOL_ALLOCATOR)]
+pub mod zpool;
 
 #[doc(hidden)]
 pub use bindings;
diff --git a/rust/kernel/zpool.rs b/rust/kernel/zpool.rs
new file mode 100644
index 000000000000..53a0dc0607e6
--- /dev/null
+++ b/rust/kernel/zpool.rs
@@ -0,0 +1,366 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Implementation of Rust interface towards zpool API.
+
+use crate::{error::Result, kernel::alloc::Flags, str::CString, types::ForeignOwnable};
+use core::ptr::NonNull;
+use kernel::alloc::NumaNode;
+
+/// The Rust representation of zpool handle.
+pub struct ZpoolHandle(usize);
+
+impl ZpoolHandle {
+    /// Create `ZpoolHandle` from the raw representation.
+    pub fn from_raw(h: usize) -> Self {
+        Self(h)
+    }
+
+    /// Get the raw representation of the handle.
+    pub fn as_raw(self) -> usize {
+        self.0
+    }
+}
+
+/// Zpool API.
+///
+/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
+/// Such drivers implement memory storage pools in accordance with the zpool API.
+///
+/// # Example
+///
+/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
+/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
+/// denominator) is saved in the least significant bits of the handle, which is guaranteed
+/// to be at least 2**6 aligned by kmalloc.
+///
+/// ```
+/// use core::ptr::{NonNull, copy_nonoverlapping};
+/// use core::sync::atomic::{AtomicU64, Ordering};
+/// use kernel::alloc::{Flags, flags, KBox, KVec, NumaNode};
+/// use kernel::page::PAGE_SHIFT;
+/// use kernel::prelude::EINVAL;
+/// use kernel::str::CString;
+/// use kernel::zpool::*;
+///
+/// struct MyZpool {
+///     name: CString,
+///     bytes_used: AtomicU64,
+/// }
+///
+/// struct MyZpoolDriver;
+///
+/// impl ZpoolDriver for MyZpoolDriver {
+///     type Pool = KBox<MyZpool>;
+///
+///     fn create(name: CString, gfp: Flags) -> Result<KBox<MyZpool>> {
+///         let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
+///         let pool = KBox::new(my_pool, gfp)?;
+///
+///         pr_debug!("Pool {:?} created\n", pool.name);
+///         Ok(pool)
+///     }
+///
+///     fn malloc(pool: &MyZpool, size: usize, _gfp: Flags, _nid: NumaNode) -> Result<ZpoolHandle> {
+///         let pow = size.next_power_of_two().trailing_zeros().max(6);
+///         match pow {
+///             0 => Err(EINVAL),
+///             m if m > PAGE_SHIFT as u32 => Err(ENOSPC),
+///             _ => {
+///                 let vec = KVec::<u8>::with_capacity(1 << pow, GFP_KERNEL)?;
+///                 let (ptr, _len, _cap) = vec.into_raw_parts();
+///
+///                 // We assume that kmalloc-64, kmalloc-128 etc. kmem caches will be used for
+///                 // our allocations, so it's actually `1 << pow` bytes that we have consumed.
+///                 pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
+///
+///                 // `kmalloc` guarantees that an allocation of size x*2^n is 2^n aligned.
+///                 // Therefore the 6 lower bits are zeros and we can use these to store `pow`.
+///                 Ok(ZpoolHandle::from_raw(ptr as usize | (pow as usize - 6)))
+///             }
+///         }
+///     }
+///
+///     unsafe fn free(pool: &MyZpool, handle: ZpoolHandle) {
+///         let h = handle.as_raw();
+///         let n = (h & 0x3F) + 6;
+///         let uptr = h & !0x3F;
+///
+///         // SAFETY:
+///         // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
+///         //   denominator for the vector capacity. As noted above, the result will be exactly the
+///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
+///         //   the vector allocated by `alloc` function above.
+///         // - 1 << n == capacity and is coming from the first 6 bits of handle.
+///         let vec = unsafe { KVec::<u8>::from_raw_parts(uptr as *mut u8, 0, 1 << n) };
+///         drop(vec);
+///         pool.bytes_used.fetch_sub(1 << n, Ordering::Relaxed);
+///     }
+///
+///     unsafe fn read_begin(_pool: &MyZpool, handle: ZpoolHandle) -> NonNull<u8> {
+///         let uptr = handle.as_raw() & !0x3F;
+///         // SAFETY:
+///         // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
+///         //   denominator for the vector capacity. As noted above, the result will be exactly the
+///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
+///         //   the vector allocated by `alloc` function above.
+///         unsafe { NonNull::new_unchecked(uptr as *mut u8) }
+///     }
+///
+///     unsafe fn read_end(_pool: &MyZpool, _handle: ZpoolHandle, _handle_mem: NonNull<u8>) {}
+///
+///     unsafe fn write(_p: &MyZpool, handle: ZpoolHandle, h_mem: NonNull<u8>, mem_len: usize) {
+///         let uptr = handle.as_raw() & !0x3F;
+///         // SAFETY:
+///         // - `h_mem` is a valid non-null pointer provided by zswap.
+///         // - `uptr` is derived from handle by zeroing 6 lower bits where we store the power
+///         //   denominator for the vector capacity. As noted above, the result will be exactly the
+///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
+///         //   the vector allocated by `alloc` function above.
+///         unsafe {
+///             copy_nonoverlapping(h_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
+///         };
+///     }
+///
+///     fn total_pages(pool: &MyZpool) -> u64 {
+///         pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT
+///     }
+/// }
+///
+/// // Uncomment this for compile time registration (disabled to avoid build error when KUNIT
+/// // tests and zsmalloc are enabled):
+/// // kernel::DECLARE_ZPOOL!(MyZpoolDriver);
+/// ```
+///
+pub trait ZpoolDriver {
+    /// Opaque Rust representation of `struct zpool`.
+    type Pool: ForeignOwnable;
+
+    /// Create a pool.
+    fn create(name: CString, gfp: Flags) -> Result<Self::Pool>;
+
+    /// Allocate an object of `size` bytes from `pool`, with the allocation flags `gfp` and
+    /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
+    fn malloc(
+        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
+        size: usize,
+        gfp: Flags,
+        nid: NumaNode,
+    ) -> Result<ZpoolHandle>;
+
+    /// Free an object previously allocated from the `pool`, represented by `handle`.
+    ///
+    /// # Safety
+    ///
+    /// - `handle` must be a valid handle previously returned by `malloc`.
+    /// - `handle` must not be used any more after the call to `free`.
+    unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: ZpoolHandle);
+
+    /// Make all the necessary preparations for the caller to be able to read from the object
+    /// represented by `handle` and return a valid pointer to that object's memory to be read.
+    ///
+    /// # Safety
+    ///
+    /// - `handle` must be a valid handle previously returned by `malloc`.
+    /// - `read_end` with the same `handle` must be called for each `read_begin`.
+    unsafe fn read_begin(
+        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
+        handle: ZpoolHandle,
+    ) -> NonNull<u8>;
+
+    /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
+    /// previously returned by `read_begin`.
+    ///
+    /// # Safety
+    ///
+    /// - `handle` must be a valid handle previously returned by `malloc`.
+    /// - `handle_mem` must be the pointer previously returned by `read_begin`.
+    unsafe fn read_end(
+        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
+        handle: ZpoolHandle,
+        handle_mem: NonNull<u8>,
+    );
+
+    /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
+    /// to the memory to copy data from, and `mem_len` defines the length of the data block to
+    /// be copied.
+    ///
+    /// # Safety
+    ///
+    /// - `handle` must be a valid handle previously returned by `malloc`.
+    /// - `handle_mem` must be a valid pointer into the allocated memory aread represented by
+    ///   `handle`.
+    /// - `handle_mem + mem_len - 1` must not point outside the allocated memory area.
+    unsafe fn write(
+        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
+        handle: ZpoolHandle,
+        handle_mem: NonNull<u8>,
+        mem_len: usize,
+    );
+
+    /// Get the number of pages used by the `pool`.
+    fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
+}
+
+/// Provide a zpool allocator to zswap
+#[macro_export]
+macro_rules! DECLARE_ZPOOL {
+    ($t: ident) => {
+        use core::ptr::null_mut;
+        use kernel::error::from_result;
+        use kernel::types::ForeignOwnable;
+
+        const __LOG_PREFIX: &[u8] = b"zpool_rust\0";
+
+        fn handle_from_result<T, F>(f: F) -> T
+        where
+            T: From<usize>,
+            F: FnOnce() -> Result<T>,
+        {
+            match f() {
+                Ok(v) => v,
+                Err(e) => T::from(0),
+            }
+        }
+
+        /// Create a pool.
+        #[no_mangle]
+        pub unsafe extern "C" fn zpool_create_pool(name: *const c_uchar) -> *mut c_void {
+            match (|| -> Result<<$t as ZpoolDriver>::Pool> {
+                // SAFETY: the memory pointed to by name is guaranteed by `zpool` to be a valid
+                // string.
+                let name_r = unsafe { CStr::from_char_ptr(name).to_cstring() }?;
+                $t::create(name_r, flags::GFP_KERNEL)
+            })() {
+                Err(_) => null_mut(),
+                Ok(pool) => <$t as ZpoolDriver>::Pool::into_foreign(pool),
+            }
+        }
+
+        /// Destroy the pool.
+        #[no_mangle]
+        pub unsafe extern "C" fn zpool_destroy_pool(pool: *mut c_void) {
+            // SAFETY: The pointer originates from an `into_foreign` call.
+            drop(unsafe { <$t as ZpoolDriver>::Pool::from_foreign(pool) })
+        }
+
+        /// Allocate an object of `size` bytes from `pool`, with the allocation flags `gfp` and
+        /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle
+        /// is returned.
+        #[no_mangle]
+        pub unsafe extern "C" fn zpool_malloc(
+            pool: *mut c_void,
+            size: usize,
+            gfp: u32,
+            nid: c_int,
+        ) -> c_ulong {
+            // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+            // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+            // during this method.
+            let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+            handle_from_result(|| {
+                let the_nid = match nid {
+                    kernel::bindings::NUMA_NO_NODE => NumaNode::NO_NODE,
+                    _ => NumaNode::new(nid)?,
+                };
+                let h = $t::malloc(
+                    pool,
+                    size,
+                    flags::GFP_NOWAIT | flags::__GFP_HIGHMEM,
+                    the_nid,
+                )?;
+                Ok(h.as_raw())
+            })
+        }
+
+        /// Free an object previously allocated from the `pool`, represented by `handle`.
+        #[no_mangle]
+        pub unsafe extern "C" fn zpool_free(pool: *mut c_void, handle: usize) {
+            // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+            // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+            // during this method.
+            let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+
+            // SAFETY:
+            // - the caller (`zswap`) guarantees that `handle` is a valid handle previously
+            // allocated by `malloc`.
+            // - the caller (`zswap`) guarantees that it will not call any other function with this
+            //   `handle` as a parameter after this call.
+            unsafe { $t::free(pool, ZpoolHandle::from_raw(handle)) }
+        }
+
+        /// Make all the necessary preparations for the caller to be able to read from the object
+        /// represented by `handle` and return a valid pointer to that object's memory to be read.
+        #[no_mangle]
+        pub unsafe extern "C" fn zpool_obj_read_begin(
+            pool: *mut c_void,
+            handle: usize,
+            _local_copy: *mut c_void,
+        ) -> *mut c_void {
+            // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+            // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+            // during this method.
+            let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+
+            // SAFETY: the caller (`zswap`) guarantees that `handle` is a valid handle previously
+            // allocated by `malloc`.
+            let non_null_ptr = unsafe { $t::read_begin(pool, ZpoolHandle::from_raw(handle)) };
+            non_null_ptr.as_ptr().cast()
+        }
+
+        /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
+        /// previously returned by `read_begin`.
+        #[no_mangle]
+        pub unsafe extern "C" fn zpool_obj_read_end(
+            pool: *mut c_void,
+            handle: usize,
+            handle_mem: *mut c_void,
+        ) {
+            // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+            // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+            // during this method.
+            let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+
+            // SAFETY: `handle_mem` is guaranteed to be non-null by the caller (`zswap`).
+            let handle_mem_ptr = unsafe { NonNull::new_unchecked(handle_mem.cast()) };
+
+            // SAFETY: the caller (`zswap`) guarantees that `handle` is a valid handle previously
+            // allocated by `malloc`.
+            unsafe { $t::read_end(pool, ZpoolHandle::from_raw(handle), handle_mem_ptr) }
+        }
+
+        /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
+        /// to the memory to copy data from, and `mem_len` defines the length of the data block to
+        /// be copied.
+        #[no_mangle]
+        pub unsafe extern "C" fn zpool_obj_write(
+            pool: *mut c_void,
+            handle: usize,
+            handle_mem: *mut c_void,
+            mem_len: usize,
+        ) {
+            // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+            // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+            // during this method.
+            let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+
+            // SAFETY: `handle_mem` is guaranteed to be non-null by the caller (`zswap`).
+            let handle_mem_ptr = unsafe { NonNull::new_unchecked(handle_mem.cast()) };
+
+            // SAFETY: the caller (`zswap`) guarantees that `handle` is a valid handle previously
+            // allocated by `malloc`.
+            unsafe {
+                $t::write(pool, ZpoolHandle::from_raw(handle), handle_mem_ptr, mem_len);
+            }
+        }
+
+        /// Get the number of pages used by the `pool`.
+        #[no_mangle]
+        pub unsafe extern "C" fn zpool_get_total_pages(pool: *mut c_void) -> u64 {
+            // SAFETY: The pointer originates from an `into_foreign` call. If `pool` is passed to
+            // `from_foreign`, then that happens in `zpool_destroy_pool` which will not be called
+            // during this method.
+            let pool = unsafe { <$t as ZpoolDriver>::Pool::borrow(pool) };
+            $t::total_pages(pool)
+        }
+    };
+}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [syzbot ci] Re: rust: zpool: add API for C and Rust
  2025-09-23 10:25 [PATCH v6 0/2] rust: zpool: add API for C and Rust Vitaly Wool
  2025-09-23 10:26 ` [PATCH v6 1/2] mm: reinstate zpool as a thin API Vitaly Wool
  2025-09-23 10:27 ` [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers Vitaly Wool
@ 2025-09-23 16:50 ` syzbot ci
  2025-09-23 21:59   ` Johannes Weiner
  2 siblings, 1 reply; 11+ messages in thread
From: syzbot ci @ 2025-09-23 16:50 UTC (permalink / raw)
  To: a.hindborg, akpm, alex.gaynor, aliceryhl, bjorn3_gh, boqun.feng,
	chengming.zhou, dakr, david, gary, gregkh, hannes, liam.howlett,
	linux-kernel, linux-mm, lorenzo.stoakes, lossin, mhocko, minchan,
	nphamcs, ojeda, rppt, rust-for-linux, senozhatsky, surenb,
	tmgross, vbabka, vitaly.wool, yosry.ahmed
  Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v6] rust: zpool: add API for C and Rust
https://lore.kernel.org/all/20250923102547.2545992-1-vitaly.wool@konsulko.se
* [PATCH v6 1/2] mm: reinstate zpool as a thin API
* [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers

and found the following issues:
* BUG: unable to handle kernel NULL pointer dereference in zswap_store
* KASAN: slab-out-of-bounds Read in zpool_get_total_pages
* KASAN: slab-out-of-bounds Read in zswap_store
* KASAN: slab-use-after-free Read in zpool_get_total_pages
* KASAN: use-after-free Read in zpool_get_total_pages

Full report is available here:
https://ci.syzbot.org/series/e8b22352-ae56-4d7c-9113-75573acf2b64

***

BUG: unable to handle kernel NULL pointer dereference in zswap_store

tree:      linux-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base:      8f7f8b1b3f4c613dd886f53f768f82816b41eaa3
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/6739e899-fcf7-4f71-b943-5ad0ca0ef8eb/config
syz repro: https://ci.syzbot.org/findings/c2ea1ccf-0bb7-4479-ac6d-d6e8e80efa8b/syz_repro

BUG: kernel NULL pointer dereference, address: 0000000000000034
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 800000010f582067 P4D 800000010f582067 PUD 0 
Oops: Oops: 0000 [#1] SMP KASAN PTI
CPU: 1 UID: 0 PID: 6005 Comm: syz.2.21 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:kmem_cache_alloc_noprof+0x2e/0x6e0 mm/slub.c:5252
Code: 55 41 57 41 56 41 55 41 54 53 48 83 ec 38 89 f5 49 89 fe 65 48 8b 05 c1 43 ab 10 48 89 44 24 30 48 8b 44 24 68 48 89 44 24 18 <8b> 47 34 48 89 44 24 08 8b 1d 44 78 ab 0d 21 f3 89 df e8 db 9c fd
RSP: 0018:ffffc90002dee640 EFLAGS: 00010282
RAX: ffffffff822e7088 RBX: 0000000000012800 RCX: ffff888105ee3a00
RDX: 0000000000000000 RSI: 0000000000012800 RDI: 0000000000000000
RBP: 0000000000012800 R08: ffff888105ee3a00 R09: 0000000000000002
R10: 00000000fffffff0 R11: 0000000000000000 R12: ffff88801d2b1aa0
R13: 1ffff11003a56454 R14: 0000000000000000 R15: 0000000000000020
FS:  00007f819a1c56c0(0000) GS:ffff8881a39dd000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000034 CR3: 000000010e704000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 cache_alloc_handle mm/zsmalloc.c:410 [inline]
 zs_malloc+0x88/0x720 mm/zsmalloc.c:1281
 zswap_compress mm/zswap.c:910 [inline]
 zswap_store_page mm/zswap.c:1429 [inline]
 zswap_store+0x1062/0x1f40 mm/zswap.c:1540
 swap_writeout+0x710/0xd70 mm/page_io.c:275
 writeout mm/vmscan.c:662 [inline]
 pageout mm/vmscan.c:721 [inline]
 shrink_folio_list+0x3011/0x4c70 mm/vmscan.c:1453
 reclaim_folio_list+0xeb/0x500 mm/vmscan.c:2233
 reclaim_pages+0x454/0x520 mm/vmscan.c:2270
 madvise_cold_or_pageout_pte_range+0x1974/0x1d00 mm/madvise.c:565
 walk_pmd_range mm/pagewalk.c:130 [inline]
 walk_pud_range mm/pagewalk.c:224 [inline]
 walk_p4d_range mm/pagewalk.c:262 [inline]
 walk_pgd_range+0xfe9/0x1d40 mm/pagewalk.c:303
 __walk_page_range+0x14c/0x710 mm/pagewalk.c:410
 walk_page_range_vma+0x393/0x440 mm/pagewalk.c:705
 madvise_pageout_page_range mm/madvise.c:624 [inline]
 madvise_pageout mm/madvise.c:649 [inline]
 madvise_vma_behavior+0x311f/0x3a10 mm/madvise.c:1352
 madvise_walk_vmas+0x51c/0xa30 mm/madvise.c:1669
 madvise_do_behavior+0x38e/0x550 mm/madvise.c:1885
 do_madvise+0x1bc/0x270 mm/madvise.c:1978
 __do_sys_madvise mm/madvise.c:1987 [inline]
 __se_sys_madvise mm/madvise.c:1985 [inline]
 __x64_sys_madvise+0xa7/0xc0 mm/madvise.c:1985
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f819938ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f819a1c5038 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f81995d5fa0 RCX: 00007f819938ec29
RDX: 0000000000000015 RSI: 0000000000600003 RDI: 0000200000000000
RBP: 00007f8199411e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f81995d6038 R14: 00007f81995d5fa0 R15: 00007ffe68a453c8
 </TASK>
Modules linked in:
CR2: 0000000000000034
---[ end trace 0000000000000000 ]---
RIP: 0010:kmem_cache_alloc_noprof+0x2e/0x6e0 mm/slub.c:5252
Code: 55 41 57 41 56 41 55 41 54 53 48 83 ec 38 89 f5 49 89 fe 65 48 8b 05 c1 43 ab 10 48 89 44 24 30 48 8b 44 24 68 48 89 44 24 18 <8b> 47 34 48 89 44 24 08 8b 1d 44 78 ab 0d 21 f3 89 df e8 db 9c fd
RSP: 0018:ffffc90002dee640 EFLAGS: 00010282
RAX: ffffffff822e7088 RBX: 0000000000012800 RCX: ffff888105ee3a00
RDX: 0000000000000000 RSI: 0000000000012800 RDI: 0000000000000000
RBP: 0000000000012800 R08: ffff888105ee3a00 R09: 0000000000000002
R10: 00000000fffffff0 R11: 0000000000000000 R12: ffff88801d2b1aa0
R13: 1ffff11003a56454 R14: 0000000000000000 R15: 0000000000000020
FS:  00007f819a1c56c0(0000) GS:ffff8881a39dd000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000034 CR3: 000000010e704000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
   0:	55                   	push   %rbp
   1:	41 57                	push   %r15
   3:	41 56                	push   %r14
   5:	41 55                	push   %r13
   7:	41 54                	push   %r12
   9:	53                   	push   %rbx
   a:	48 83 ec 38          	sub    $0x38,%rsp
   e:	89 f5                	mov    %esi,%ebp
  10:	49 89 fe             	mov    %rdi,%r14
  13:	65 48 8b 05 c1 43 ab 	mov    %gs:0x10ab43c1(%rip),%rax        # 0x10ab43dc
  1a:	10
  1b:	48 89 44 24 30       	mov    %rax,0x30(%rsp)
  20:	48 8b 44 24 68       	mov    0x68(%rsp),%rax
  25:	48 89 44 24 18       	mov    %rax,0x18(%rsp)
* 2a:	8b 47 34             	mov    0x34(%rdi),%eax <-- trapping instruction
  2d:	48 89 44 24 08       	mov    %rax,0x8(%rsp)
  32:	8b 1d 44 78 ab 0d    	mov    0xdab7844(%rip),%ebx        # 0xdab787c
  38:	21 f3                	and    %esi,%ebx
  3a:	89 df                	mov    %ebx,%edi
  3c:	e8                   	.byte 0xe8
  3d:	db                   	.byte 0xdb
  3e:	9c                   	pushf
  3f:	fd                   	std


***

KASAN: slab-out-of-bounds Read in zpool_get_total_pages

tree:      linux-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base:      8f7f8b1b3f4c613dd886f53f768f82816b41eaa3
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/6739e899-fcf7-4f71-b943-5ad0ca0ef8eb/config
C repro:   https://ci.syzbot.org/findings/a2aa8de3-367f-4cb7-b39a-e3eb65596e6d/c_repro
syz repro: https://ci.syzbot.org/findings/a2aa8de3-367f-4cb7-b39a-e3eb65596e6d/syz_repro

==================================================================
BUG: KASAN: slab-out-of-bounds in instrument_atomic_read include/linux/instrumented.h:68 [inline]
BUG: KASAN: slab-out-of-bounds in atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
BUG: KASAN: slab-out-of-bounds in zs_get_total_pages mm/zsmalloc.c:1066 [inline]
BUG: KASAN: slab-out-of-bounds in zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
Read of size 8 at addr ffff88810ccc7b10 by task syz.0.17/5992

CPU: 0 UID: 0 PID: 5992 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xca/0x240 mm/kasan/report.c:482
 kasan_report+0x118/0x150 mm/kasan/report.c:595
 check_region_inline mm/kasan/generic.c:-1 [inline]
 kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:200
 instrument_atomic_read include/linux/instrumented.h:68 [inline]
 atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
 zs_get_total_pages mm/zsmalloc.c:1066 [inline]
 zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
 zswap_total_pages+0xf6/0x1e0 mm/zswap.c:457
 zswap_check_limits mm/zswap.c:465 [inline]
 zswap_store+0x52f/0x1f40 mm/zswap.c:1521
 swap_writeout+0x710/0xd70 mm/page_io.c:275
 writeout mm/vmscan.c:662 [inline]
 pageout mm/vmscan.c:721 [inline]
 shrink_folio_list+0x3011/0x4c70 mm/vmscan.c:1453
 reclaim_folio_list+0xeb/0x500 mm/vmscan.c:2233
 reclaim_pages+0x454/0x520 mm/vmscan.c:2270
 madvise_cold_or_pageout_pte_range+0x1974/0x1d00 mm/madvise.c:565
 walk_pmd_range mm/pagewalk.c:130 [inline]
 walk_pud_range mm/pagewalk.c:224 [inline]
 walk_p4d_range mm/pagewalk.c:262 [inline]
 walk_pgd_range+0xfe9/0x1d40 mm/pagewalk.c:303
 __walk_page_range+0x14c/0x710 mm/pagewalk.c:410
 walk_page_range_vma+0x393/0x440 mm/pagewalk.c:705
 madvise_pageout_page_range mm/madvise.c:624 [inline]
 madvise_pageout mm/madvise.c:649 [inline]
 madvise_vma_behavior+0x311f/0x3a10 mm/madvise.c:1352
 madvise_walk_vmas+0x51c/0xa30 mm/madvise.c:1669
 madvise_do_behavior+0x38e/0x550 mm/madvise.c:1885
 do_madvise+0x1bc/0x270 mm/madvise.c:1978
 __do_sys_madvise mm/madvise.c:1987 [inline]
 __se_sys_madvise mm/madvise.c:1985 [inline]
 __x64_sys_madvise+0xa7/0xc0 mm/madvise.c:1985
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f526318ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffdd0a28298 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f52633d5fa0 RCX: 00007f526318ec29
RDX: 0000000000000015 RSI: 0000000000003000 RDI: 0000200000000000
RBP: 00007f5263211e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f52633d5fa0 R14: 00007f52633d5fa0 R15: 0000000000000003
 </TASK>

Allocated by task 1:
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 poison_kmalloc_redzone mm/kasan/common.c:400 [inline]
 __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:417
 kasan_kmalloc include/linux/kasan.h:262 [inline]
 __do_kmalloc_node mm/slub.c:5602 [inline]
 __kmalloc_noprof+0x411/0x7f0 mm/slub.c:5614
 kmalloc_noprof include/linux/slab.h:961 [inline]
 kzalloc_noprof include/linux/slab.h:1094 [inline]
 shrinker_alloc+0x199/0xa80 mm/shrinker.c:724
 binder_alloc_shrinker_init+0x45/0xe0 drivers/android/binder_alloc.c:1265
 binder_init+0x17/0x260 drivers/android/binder.c:7095
 do_one_initcall+0x236/0x820 init/main.c:1283
 do_initcall_level+0x104/0x190 init/main.c:1345
 do_initcalls+0x59/0xa0 init/main.c:1361
 kernel_init_freeable+0x334/0x4b0 init/main.c:1593
 kernel_init+0x1d/0x1d0 init/main.c:1483
 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

The buggy address belongs to the object at ffff88810ccc7b00
 which belongs to the cache kmalloc-8 of size 8
The buggy address is located 8 bytes to the right of
 allocated 8-byte region [ffff88810ccc7b00, ffff88810ccc7b08)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10ccc7
anon flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000000 ffff88801a841500 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000800080 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 1, tgid 1 (swapper/0), ts 16130935485, free_ts 0
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1850
 prep_new_page mm/page_alloc.c:1858 [inline]
 get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3869
 __alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5159
 alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
 alloc_slab_page mm/slub.c:3023 [inline]
 allocate_slab+0x96/0x3a0 mm/slub.c:3196
 new_slab mm/slub.c:3250 [inline]
 ___slab_alloc+0xe94/0x1920 mm/slub.c:4626
 __slab_alloc+0x65/0x100 mm/slub.c:4745
 __slab_alloc_node mm/slub.c:4821 [inline]
 slab_alloc_node mm/slub.c:5232 [inline]
 __do_kmalloc_node mm/slub.c:5601 [inline]
 __kmalloc_node_track_caller_noprof+0x5c7/0x800 mm/slub.c:5711
 __kmemdup_nul mm/util.c:64 [inline]
 kstrdup+0x42/0x100 mm/util.c:84
 __kernfs_new_node+0x9c/0x7e0 fs/kernfs/dir.c:633
 kernfs_new_node+0x102/0x210 fs/kernfs/dir.c:713
 kernfs_create_link+0xa7/0x200 fs/kernfs/symlink.c:39
 sysfs_do_create_link_sd+0x83/0x110 fs/sysfs/symlink.c:44
 device_create_sys_dev_entry+0x11a/0x180 drivers/base/core.c:3515
 device_add+0x733/0xb50 drivers/base/core.c:3659
 __video_register_device+0x3dc1/0x4ca0 drivers/media/v4l2-core/v4l2-dev.c:1076
page_owner free stack trace missing

Memory state around the buggy address:
 ffff88810ccc7a00: 00 fc fc fc 05 fc fc fc 00 fc fc fc 00 fc fc fc
 ffff88810ccc7a80: 00 fc fc fc 05 fc fc fc 06 fc fc fc 06 fc fc fc
>ffff88810ccc7b00: 00 fc fc fc 06 fc fc fc 07 fc fc fc 07 fc fc fc
                         ^
 ffff88810ccc7b80: 04 fc fc fc 06 fc fc fc 00 fc fc fc 00 fc fc fc
 ffff88810ccc7c00: 07 fc fc fc 00 fc fc fc 06 fc fc fc 05 fc fc fc
==================================================================


***

KASAN: slab-out-of-bounds Read in zswap_store

tree:      linux-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base:      8f7f8b1b3f4c613dd886f53f768f82816b41eaa3
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/6739e899-fcf7-4f71-b943-5ad0ca0ef8eb/config
syz repro: https://ci.syzbot.org/findings/cba572f0-863b-4f91-9ac0-c6f5a16096af/syz_repro

==================================================================
BUG: KASAN: slab-out-of-bounds in cache_alloc_handle mm/zsmalloc.c:410 [inline]
BUG: KASAN: slab-out-of-bounds in zs_malloc+0x77/0x720 mm/zsmalloc.c:1281
Read of size 8 at addr ffff888100ae7680 by task syz.2.19/6046

CPU: 1 UID: 0 PID: 6046 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xca/0x240 mm/kasan/report.c:482
 kasan_report+0x118/0x150 mm/kasan/report.c:595
 cache_alloc_handle mm/zsmalloc.c:410 [inline]
 zs_malloc+0x77/0x720 mm/zsmalloc.c:1281
 zswap_compress mm/zswap.c:910 [inline]
 zswap_store_page mm/zswap.c:1429 [inline]
 zswap_store+0x1062/0x1f40 mm/zswap.c:1540
 swap_writeout+0x710/0xd70 mm/page_io.c:275
 writeout mm/vmscan.c:662 [inline]
 pageout mm/vmscan.c:721 [inline]
 shrink_folio_list+0x3011/0x4c70 mm/vmscan.c:1453
 reclaim_folio_list+0xeb/0x500 mm/vmscan.c:2233
 reclaim_pages+0x454/0x520 mm/vmscan.c:2270
 madvise_cold_or_pageout_pte_range+0x1974/0x1d00 mm/madvise.c:565
 walk_pmd_range mm/pagewalk.c:130 [inline]
 walk_pud_range mm/pagewalk.c:224 [inline]
 walk_p4d_range mm/pagewalk.c:262 [inline]
 walk_pgd_range+0xfe9/0x1d40 mm/pagewalk.c:303
 __walk_page_range+0x14c/0x710 mm/pagewalk.c:410
 walk_page_range_vma+0x393/0x440 mm/pagewalk.c:705
 madvise_pageout_page_range mm/madvise.c:624 [inline]
 madvise_pageout mm/madvise.c:649 [inline]
 madvise_vma_behavior+0x311f/0x3a10 mm/madvise.c:1352
 madvise_walk_vmas+0x51c/0xa30 mm/madvise.c:1669
 madvise_do_behavior+0x38e/0x550 mm/madvise.c:1885
 do_madvise+0x1bc/0x270 mm/madvise.c:1978
 __do_sys_madvise mm/madvise.c:1987 [inline]
 __se_sys_madvise mm/madvise.c:1985 [inline]
 __x64_sys_madvise+0xa7/0xc0 mm/madvise.c:1985
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa813d8ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa814c4c038 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007fa813fd5fa0 RCX: 00007fa813d8ec29
RDX: 0000000000000015 RSI: 0000000000600000 RDI: 0000200000000000
RBP: 00007fa813e11e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa813fd6038 R14: 00007fa813fd5fa0 R15: 00007ffcee547488
 </TASK>

Allocated by task 1:
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 unpoison_slab_object mm/kasan/common.c:342 [inline]
 __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:368
 kasan_slab_alloc include/linux/kasan.h:252 [inline]
 slab_post_alloc_hook mm/slub.c:4945 [inline]
 slab_alloc_node mm/slub.c:5244 [inline]
 kmem_cache_alloc_noprof+0x367/0x6e0 mm/slub.c:5251
 __kernfs_new_node+0xd7/0x7e0 fs/kernfs/dir.c:637
 kernfs_new_node+0x102/0x210 fs/kernfs/dir.c:713
 __kernfs_create_file+0x4b/0x2e0 fs/kernfs/file.c:1057
 sysfs_add_file_mode_ns+0x238/0x300 fs/sysfs/file.c:313
 create_files fs/sysfs/group.c:76 [inline]
 internal_create_group+0x66d/0x1110 fs/sysfs/group.c:183
 internal_create_groups fs/sysfs/group.c:223 [inline]
 sysfs_create_groups+0x59/0x120 fs/sysfs/group.c:249
 device_add_groups drivers/base/core.c:2836 [inline]
 device_add_attrs+0x1c4/0x5a0 drivers/base/core.c:2911
 device_add+0x496/0xb50 drivers/base/core.c:3643
 usb_new_device+0xa39/0x16f0 drivers/usb/core/hub.c:2694
 register_root_hub+0x275/0x590 drivers/usb/core/hcd.c:994
 usb_add_hcd+0xba1/0x1050 drivers/usb/core/hcd.c:2993
 vhci_hcd_probe+0x1c1/0x380 drivers/usb/usbip/vhci_hcd.c:1377
 platform_probe+0xf9/0x190 drivers/base/platform.c:1405
 call_driver_probe drivers/base/dd.c:-1 [inline]
 really_probe+0x26d/0x9e0 drivers/base/dd.c:659
 __driver_probe_device+0x18c/0x2f0 drivers/base/dd.c:801
 driver_probe_device+0x4f/0x430 drivers/base/dd.c:831
 __device_attach_driver+0x2ce/0x530 drivers/base/dd.c:959
 bus_for_each_drv+0x251/0x2e0 drivers/base/bus.c:462
 __device_attach+0x2b8/0x400 drivers/base/dd.c:1031
 bus_probe_device+0x185/0x260 drivers/base/bus.c:537
 device_add+0x7b6/0xb50 drivers/base/core.c:3689
 platform_device_add+0x4b4/0x820 drivers/base/platform.c:716
 platform_device_register_full+0x46c/0x570 drivers/base/platform.c:844
 vhci_hcd_init+0x1bc/0x310 drivers/usb/usbip/vhci_hcd.c:1533
 do_one_initcall+0x236/0x820 init/main.c:1283
 do_initcall_level+0x104/0x190 init/main.c:1345
 do_initcalls+0x59/0xa0 init/main.c:1361
 kernel_init_freeable+0x334/0x4b0 init/main.c:1593
 kernel_init+0x1d/0x1d0 init/main.c:1483
 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

The buggy address belongs to the object at ffff888100ae75a0
 which belongs to the cache kernfs_node_cache of size 176
The buggy address is located 48 bytes to the right of
 allocated 176-byte region [ffff888100ae75a0, ffff888100ae7650)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100ae7
flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000000 ffff888100015dc0 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000110011 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 1, tgid 1 (swapper/0), ts 12683536009, free_ts 12683151344
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1850
 prep_new_page mm/page_alloc.c:1858 [inline]
 get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3869
 __alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5159
 alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
 alloc_slab_page mm/slub.c:3023 [inline]
 allocate_slab+0x96/0x3a0 mm/slub.c:3196
 new_slab mm/slub.c:3250 [inline]
 ___slab_alloc+0xe94/0x1920 mm/slub.c:4626
 __slab_alloc+0x65/0x100 mm/slub.c:4745
 __slab_alloc_node mm/slub.c:4821 [inline]
 slab_alloc_node mm/slub.c:5232 [inline]
 kmem_cache_alloc_noprof+0x3f9/0x6e0 mm/slub.c:5251
 __kernfs_new_node+0xd7/0x7e0 fs/kernfs/dir.c:637
 kernfs_new_node+0x102/0x210 fs/kernfs/dir.c:713
 __kernfs_create_file+0x4b/0x2e0 fs/kernfs/file.c:1057
 sysfs_add_file_mode_ns+0x238/0x300 fs/sysfs/file.c:313
 create_files fs/sysfs/group.c:76 [inline]
 internal_create_group+0x66d/0x1110 fs/sysfs/group.c:183
 internal_create_groups fs/sysfs/group.c:223 [inline]
 sysfs_create_groups+0x59/0x120 fs/sysfs/group.c:249
 device_add_groups drivers/base/core.c:2836 [inline]
 device_add_attrs+0x1c4/0x5a0 drivers/base/core.c:2911
 device_add+0x496/0xb50 drivers/base/core.c:3643
page last free pid 1 tgid 1 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1394 [inline]
 __free_frozen_pages+0xbc4/0xd30 mm/page_alloc.c:2906
 __slab_free+0x2e7/0x390 mm/slub.c:5922
 qlink_free mm/kasan/quarantine.c:163 [inline]
 qlist_free_all+0x97/0x140 mm/kasan/quarantine.c:179
 kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286
 __kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:352
 kasan_slab_alloc include/linux/kasan.h:252 [inline]
 slab_post_alloc_hook mm/slub.c:4945 [inline]
 slab_alloc_node mm/slub.c:5244 [inline]
 __kmalloc_cache_noprof+0x36f/0x6f0 mm/slub.c:5718
 kmalloc_noprof include/linux/slab.h:957 [inline]
 kzalloc_noprof include/linux/slab.h:1094 [inline]
 device_private_init drivers/base/core.c:3534 [inline]
 device_add+0xbe/0xb50 drivers/base/core.c:3585
 usb_new_device+0xa39/0x16f0 drivers/usb/core/hub.c:2694
 register_root_hub+0x275/0x590 drivers/usb/core/hcd.c:994
 usb_add_hcd+0xba1/0x1050 drivers/usb/core/hcd.c:2993
 vhci_hcd_probe+0x1c1/0x380 drivers/usb/usbip/vhci_hcd.c:1377
 platform_probe+0xf9/0x190 drivers/base/platform.c:1405
 call_driver_probe drivers/base/dd.c:-1 [inline]
 really_probe+0x26d/0x9e0 drivers/base/dd.c:659
 __driver_probe_device+0x18c/0x2f0 drivers/base/dd.c:801
 driver_probe_device+0x4f/0x430 drivers/base/dd.c:831
 __device_attach_driver+0x2ce/0x530 drivers/base/dd.c:959

Memory state around the buggy address:
 ffff888100ae7580: fc fc fc fc 00 00 00 00 00 00 00 00 00 00 00 00
 ffff888100ae7600: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
>ffff888100ae7680: fc fc 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                   ^
 ffff888100ae7700: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
 ffff888100ae7780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================


***

KASAN: slab-use-after-free Read in zpool_get_total_pages

tree:      linux-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base:      8f7f8b1b3f4c613dd886f53f768f82816b41eaa3
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/6739e899-fcf7-4f71-b943-5ad0ca0ef8eb/config
syz repro: https://ci.syzbot.org/findings/bbd2b5a8-bad1-404c-8f0d-414451cc731a/syz_repro

==================================================================
BUG: KASAN: slab-use-after-free in instrument_atomic_read include/linux/instrumented.h:68 [inline]
BUG: KASAN: slab-use-after-free in atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
BUG: KASAN: slab-use-after-free in zs_get_total_pages mm/zsmalloc.c:1066 [inline]
BUG: KASAN: slab-use-after-free in zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
Read of size 8 at addr ffff88801b3033b0 by task syz.0.17/6006

CPU: 1 UID: 0 PID: 6006 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xca/0x240 mm/kasan/report.c:482
 kasan_report+0x118/0x150 mm/kasan/report.c:595
 check_region_inline mm/kasan/generic.c:-1 [inline]
 kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:200
 instrument_atomic_read include/linux/instrumented.h:68 [inline]
 atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
 zs_get_total_pages mm/zsmalloc.c:1066 [inline]
 zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
 zswap_total_pages+0xf6/0x1e0 mm/zswap.c:457
 zswap_check_limits mm/zswap.c:465 [inline]
 zswap_store+0x52f/0x1f40 mm/zswap.c:1521
 swap_writeout+0x710/0xd70 mm/page_io.c:275
 writeout mm/vmscan.c:662 [inline]
 pageout mm/vmscan.c:721 [inline]
 shrink_folio_list+0x3011/0x4c70 mm/vmscan.c:1453
 reclaim_folio_list+0xeb/0x500 mm/vmscan.c:2233
 reclaim_pages+0x2f4/0x520 mm/vmscan.c:2266
 madvise_cold_or_pageout_pte_range+0x1974/0x1d00 mm/madvise.c:565
 walk_pmd_range mm/pagewalk.c:130 [inline]
 walk_pud_range mm/pagewalk.c:224 [inline]
 walk_p4d_range mm/pagewalk.c:262 [inline]
 walk_pgd_range+0xfe9/0x1d40 mm/pagewalk.c:303
 __walk_page_range+0x14c/0x710 mm/pagewalk.c:410
 walk_page_range_vma+0x393/0x440 mm/pagewalk.c:705
 madvise_pageout_page_range mm/madvise.c:624 [inline]
 madvise_pageout mm/madvise.c:649 [inline]
 madvise_vma_behavior+0x311f/0x3a10 mm/madvise.c:1352
 madvise_walk_vmas+0x51c/0xa30 mm/madvise.c:1669
 madvise_do_behavior+0x38e/0x550 mm/madvise.c:1885
 do_madvise+0x1bc/0x270 mm/madvise.c:1978
 __do_sys_madvise mm/madvise.c:1987 [inline]
 __se_sys_madvise mm/madvise.c:1985 [inline]
 __x64_sys_madvise+0xa7/0xc0 mm/madvise.c:1985
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4f4118ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f4f407fe038 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f4f413d5fa0 RCX: 00007f4f4118ec29
RDX: 0000000000000015 RSI: 7fffffffffffffff RDI: 0000200000000000
RBP: 00007f4f41211e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f4f413d6038 R14: 00007f4f413d5fa0 R15: 00007ffefd579118
 </TASK>

Allocated by task 1:
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 unpoison_slab_object mm/kasan/common.c:342 [inline]
 __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:368
 kasan_slab_alloc include/linux/kasan.h:252 [inline]
 slab_post_alloc_hook mm/slub.c:4945 [inline]
 slab_alloc_node mm/slub.c:5244 [inline]
 kmem_cache_alloc_noprof+0x367/0x6e0 mm/slub.c:5251
 acpi_ut_allocate_object_desc_dbg drivers/acpi/acpica/utobject.c:359 [inline]
 acpi_ut_create_internal_object_dbg+0xe6/0x470 drivers/acpi/acpica/utobject.c:69
 acpi_ds_create_operand+0x2d7/0x890 drivers/acpi/acpica/dsutils.c:617
 acpi_ds_create_operands+0x264/0x3f0 drivers/acpi/acpica/dsutils.c:707
 acpi_ds_exec_end_op+0x26b/0x1120 drivers/acpi/acpica/dswexec.c:385
 acpi_ps_parse_loop+0xc33/0x1ab0 drivers/acpi/acpica/psloop.c:525
 acpi_ps_parse_aml+0x22d/0x9b0 drivers/acpi/acpica/psparse.c:475
 acpi_ps_execute_method+0x58d/0x7c0 drivers/acpi/acpica/psxface.c:190
 acpi_ns_evaluate+0x5a6/0xa20 drivers/acpi/acpica/nseval.c:205
 acpi_evaluate_object+0x53f/0xa10 drivers/acpi/acpica/nsxfeval.c:354
 acpi_evaluate_integer+0xfc/0x270 drivers/acpi/utils.c:260
 acpi_bus_get_status_handle drivers/acpi/bus.c:82 [inline]
 acpi_bus_get_status+0x14a/0x380 drivers/acpi/bus.c:111
 acpi_scan_init_status drivers/acpi/scan.c:1863 [inline]
 acpi_add_single_object+0x391/0x1a20 drivers/acpi/scan.c:1896
 acpi_bus_check_add+0x349/0x820 drivers/acpi/scan.c:2179
 acpi_ns_walk_namespace+0x26b/0x690 drivers/acpi/acpica/nswalk.c:-1
 acpi_walk_namespace+0xe8/0x130 drivers/acpi/acpica/nsxfeval.c:606
 acpi_bus_scan+0xe8/0x4b0 drivers/acpi/scan.c:2593
 acpi_scan_init+0x1b0/0x550 drivers/acpi/scan.c:2746
 acpi_init+0x130/0x1f0 drivers/acpi/bus.c:1469
 do_one_initcall+0x236/0x820 init/main.c:1283
 do_initcall_level+0x104/0x190 init/main.c:1345
 do_initcalls+0x59/0xa0 init/main.c:1361
 kernel_init_freeable+0x334/0x4b0 init/main.c:1593
 kernel_init+0x1d/0x1d0 init/main.c:1483
 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Freed by task 1:
 kasan_save_stack mm/kasan/common.c:56 [inline]
 kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
 __kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:587
 kasan_save_free_info mm/kasan/kasan.h:406 [inline]
 poison_slab_object mm/kasan/common.c:252 [inline]
 __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284
 kasan_slab_free include/linux/kasan.h:234 [inline]
 slab_free_hook mm/slub.c:2507 [inline]
 slab_free mm/slub.c:6557 [inline]
 kmem_cache_free+0x19b/0x690 mm/slub.c:6668
 acpi_os_release_object+0x1d/0x30 drivers/acpi/osl.c:1644
 acpi_ut_update_object_reference+0x47f/0x710 drivers/acpi/acpica/utdelete.c:632
 acpi_ds_clear_operands+0xa9/0x1b0 drivers/acpi/acpica/dsutils.c:396
 acpi_ds_exec_end_op+0xbe9/0x1120 drivers/acpi/acpica/dswexec.c:442
 acpi_ps_parse_loop+0xc33/0x1ab0 drivers/acpi/acpica/psloop.c:525
 acpi_ps_parse_aml+0x22d/0x9b0 drivers/acpi/acpica/psparse.c:475
 acpi_ps_execute_method+0x58d/0x7c0 drivers/acpi/acpica/psxface.c:190
 acpi_ns_evaluate+0x5a6/0xa20 drivers/acpi/acpica/nseval.c:205
 acpi_evaluate_object+0x53f/0xa10 drivers/acpi/acpica/nsxfeval.c:354
 acpi_evaluate_integer+0xfc/0x270 drivers/acpi/utils.c:260
 acpi_bus_get_status_handle drivers/acpi/bus.c:82 [inline]
 acpi_bus_get_status+0x14a/0x380 drivers/acpi/bus.c:111
 acpi_scan_init_status drivers/acpi/scan.c:1863 [inline]
 acpi_add_single_object+0x391/0x1a20 drivers/acpi/scan.c:1896
 acpi_bus_check_add+0x349/0x820 drivers/acpi/scan.c:2179
 acpi_ns_walk_namespace+0x26b/0x690 drivers/acpi/acpica/nswalk.c:-1
 acpi_walk_namespace+0xe8/0x130 drivers/acpi/acpica/nsxfeval.c:606
 acpi_bus_scan+0xe8/0x4b0 drivers/acpi/scan.c:2593
 acpi_scan_init+0x1b0/0x550 drivers/acpi/scan.c:2746
 acpi_init+0x130/0x1f0 drivers/acpi/bus.c:1469
 do_one_initcall+0x236/0x820 init/main.c:1283
 do_initcall_level+0x104/0x190 init/main.c:1345
 do_initcalls+0x59/0xa0 init/main.c:1361
 kernel_init_freeable+0x334/0x4b0 init/main.c:1593
 kernel_init+0x1d/0x1d0 init/main.c:1483
 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

The buggy address belongs to the object at ffff88801b3033a8
 which belongs to the cache Acpi-Operand of size 72
The buggy address is located 8 bytes inside of
 freed 72-byte region [ffff88801b3033a8, ffff88801b3033f0)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff88801b303000 pfn:0x1b303
flags: 0xfff00000000200(workingset|node=0|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 00fff00000000200 ffff88801a894dc0 ffffea00006be4d0 ffffea00006bffd0
raw: ffff88801b303000 000000000027001c 00000000f5000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP), pid 1, tgid 1 (swapper/0), ts 2367926237, free_ts 0
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1850
 prep_new_page mm/page_alloc.c:1858 [inline]
 get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3869
 __alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5159
 alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
 alloc_slab_page mm/slub.c:3023 [inline]
 allocate_slab+0x96/0x3a0 mm/slub.c:3196
 new_slab mm/slub.c:3250 [inline]
 ___slab_alloc+0xe94/0x1920 mm/slub.c:4626
 __slab_alloc+0x65/0x100 mm/slub.c:4745
 __slab_alloc_node mm/slub.c:4821 [inline]
 slab_alloc_node mm/slub.c:5232 [inline]
 kmem_cache_alloc_noprof+0x3f9/0x6e0 mm/slub.c:5251
 acpi_ut_allocate_object_desc_dbg drivers/acpi/acpica/utobject.c:359 [inline]
 acpi_ut_create_internal_object_dbg+0xe6/0x470 drivers/acpi/acpica/utobject.c:69
 acpi_ds_create_operand+0x2d7/0x890 drivers/acpi/acpica/dsutils.c:617
 acpi_ds_create_operands+0x264/0x3f0 drivers/acpi/acpica/dsutils.c:707
 acpi_ds_load2_end_op+0xa51/0xf30 drivers/acpi/acpica/dswload2.c:663
 acpi_ds_exec_end_op+0x67b/0x1120 drivers/acpi/acpica/dswexec.c:638
 acpi_ps_parse_loop+0xc33/0x1ab0 drivers/acpi/acpica/psloop.c:525
 acpi_ps_parse_aml+0x22d/0x9b0 drivers/acpi/acpica/psparse.c:475
 acpi_ps_execute_table+0x335/0x410 drivers/acpi/acpica/psxface.c:295
page_owner free stack trace missing

Memory state around the buggy address:
 ffff88801b303280: fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00 00
 ffff88801b303300: 00 00 00 00 fc fc fc fc 00 00 00 00 00 00 00 00
>ffff88801b303380: 00 fc fc fc fc fa fb fb fb fb fb fb fb fb fc fc
                                     ^
 ffff88801b303400: fc fc 00 00 00 00 00 00 00 00 00 fc fc fc fc 00
 ffff88801b303480: 00 00 00 00 00 00 00 00 fc fc fc fc 00 00 00 00
==================================================================


***

KASAN: use-after-free Read in zpool_get_total_pages

tree:      linux-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base:      8f7f8b1b3f4c613dd886f53f768f82816b41eaa3
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/6739e899-fcf7-4f71-b943-5ad0ca0ef8eb/config
syz repro: https://ci.syzbot.org/findings/6687daf2-8ec6-4f60-ab6b-b53425f8483b/syz_repro

==================================================================
BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:68 [inline]
BUG: KASAN: use-after-free in atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
BUG: KASAN: use-after-free in zs_get_total_pages mm/zsmalloc.c:1066 [inline]
BUG: KASAN: use-after-free in zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
Read of size 8 at addr ffff88801af0a0d0 by task syz.1.18/6044

CPU: 0 UID: 0 PID: 6044 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xca/0x240 mm/kasan/report.c:482
 kasan_report+0x118/0x150 mm/kasan/report.c:595
 check_region_inline mm/kasan/generic.c:-1 [inline]
 kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:200
 instrument_atomic_read include/linux/instrumented.h:68 [inline]
 atomic_long_read include/linux/atomic/atomic-instrumented.h:3188 [inline]
 zs_get_total_pages mm/zsmalloc.c:1066 [inline]
 zpool_get_total_pages+0x46/0x70 mm/zsmalloc.c:436
 zswap_total_pages+0xf6/0x1e0 mm/zswap.c:457
 zswap_check_limits mm/zswap.c:465 [inline]
 zswap_store+0x52f/0x1f40 mm/zswap.c:1521
 swap_writeout+0x710/0xd70 mm/page_io.c:275
 writeout mm/vmscan.c:662 [inline]
 pageout mm/vmscan.c:721 [inline]
 shrink_folio_list+0x3011/0x4c70 mm/vmscan.c:1453
 reclaim_folio_list+0xeb/0x500 mm/vmscan.c:2233
 reclaim_pages+0x454/0x520 mm/vmscan.c:2270
 madvise_cold_or_pageout_pte_range+0x1974/0x1d00 mm/madvise.c:565
 walk_pmd_range mm/pagewalk.c:130 [inline]
 walk_pud_range mm/pagewalk.c:224 [inline]
 walk_p4d_range mm/pagewalk.c:262 [inline]
 walk_pgd_range+0xfe9/0x1d40 mm/pagewalk.c:303
 __walk_page_range+0x14c/0x710 mm/pagewalk.c:410
 walk_page_range_vma+0x393/0x440 mm/pagewalk.c:705
 madvise_pageout_page_range mm/madvise.c:624 [inline]
 madvise_pageout mm/madvise.c:649 [inline]
 madvise_vma_behavior+0x311f/0x3a10 mm/madvise.c:1352
 madvise_walk_vmas+0x51c/0xa30 mm/madvise.c:1669
 madvise_do_behavior+0x38e/0x550 mm/madvise.c:1885
 do_madvise+0x1bc/0x270 mm/madvise.c:1978
 __do_sys_madvise mm/madvise.c:1987 [inline]
 __se_sys_madvise mm/madvise.c:1985 [inline]
 __x64_sys_madvise+0xa7/0xc0 mm/madvise.c:1985
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa5a1f8ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa5a2eb5038 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007fa5a21d5fa0 RCX: 00007fa5a1f8ec29
RDX: 0000000000000015 RSI: 0000000000600000 RDI: 0000200000000000
RBP: 00007fa5a2011e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa5a21d6038 R14: 00007fa5a21d5fa0 R15: 00007ffc6cec8398
 </TASK>

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff88801af0afc0 pfn:0x1af0a
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
page_type: f0(buddy)
raw: 00fff00000000000 ffffea00008c38c8 ffffea0000811c88 0000000000000000
raw: ffff88801af0afc0 0000000000000000 00000000f0000000 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as freed
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x2dc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO|__GFP_NOWARN), pid 5864, tgid 5864 (syz-executor), ts 59367185001, free_ts 60188955452
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x240/0x2a0 mm/page_alloc.c:1850
 prep_new_page mm/page_alloc.c:1858 [inline]
 get_page_from_freelist+0x21e4/0x22c0 mm/page_alloc.c:3869
 __alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5159
 alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2416
 alloc_frozen_pages_noprof mm/mempolicy.c:2487 [inline]
 alloc_pages_noprof+0xa9/0x190 mm/mempolicy.c:2507
 vm_area_alloc_pages mm/vmalloc.c:3642 [inline]
 __vmalloc_area_node mm/vmalloc.c:3720 [inline]
 __vmalloc_node_range_noprof+0x97d/0x12f0 mm/vmalloc.c:3893
 vmalloc_user_noprof+0xad/0xf0 mm/vmalloc.c:4046
 kcov_ioctl+0x55/0x640 kernel/kcov.c:716
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5884 tgid 5884 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1394 [inline]
 __free_frozen_pages+0xbc4/0xd30 mm/page_alloc.c:2906
 vfree+0x25a/0x400 mm/vmalloc.c:3434
 kcov_put kernel/kcov.c:439 [inline]
 kcov_close+0x28/0x50 kernel/kcov.c:535
 __fput+0x44c/0xa70 fs/file_table.c:468
 task_work_run+0x1d4/0x260 kernel/task_work.c:227
 exit_task_work include/linux/task_work.h:40 [inline]
 do_exit+0x6b5/0x2300 kernel/exit.c:966
 do_group_exit+0x21c/0x2d0 kernel/exit.c:1107
 get_signal+0x1285/0x1340 kernel/signal.c:3034
 arch_do_signal_or_restart+0xa0/0x790 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop+0x72/0x130 kernel/entry/common.c:40
 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
 syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
 do_syscall_64+0x2bd/0xfa0 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Memory state around the buggy address:
 ffff88801af09f80: 05 fc fc fc 06 fc fc fc 06 fc fc fc 05 fc fc fc
 ffff88801af0a000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff88801af0a080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                 ^
 ffff88801af0a100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff88801af0a180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
  2025-09-23 10:27 ` [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers Vitaly Wool
@ 2025-09-23 21:49   ` Benno Lossin
  2025-09-24  9:23     ` Vitaly Wool
  0 siblings, 1 reply; 11+ messages in thread
From: Benno Lossin @ 2025-09-23 21:49 UTC (permalink / raw)
  To: Vitaly Wool, linux-mm, rust-for-linux
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
	Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Minchan Kim, Sergey Senozhatsky,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Greg Kroah-Hartman, linux-kernel

On Tue Sep 23, 2025 at 12:27 PM CEST, Vitaly Wool wrote:
> diff --git a/rust/kernel/zpool.rs b/rust/kernel/zpool.rs
> new file mode 100644
> index 000000000000..53a0dc0607e6
> --- /dev/null
> +++ b/rust/kernel/zpool.rs
> @@ -0,0 +1,366 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Implementation of Rust interface towards zpool API.
> +
> +use crate::{error::Result, kernel::alloc::Flags, str::CString, types::ForeignOwnable};
> +use core::ptr::NonNull;
> +use kernel::alloc::NumaNode;
> +
> +/// The Rust representation of zpool handle.

Would be great to explain what this means, what's the handle used for?

> +pub struct ZpoolHandle(usize);
> +
> +impl ZpoolHandle {
> +    /// Create `ZpoolHandle` from the raw representation.
> +    pub fn from_raw(h: usize) -> Self {
> +        Self(h)
> +    }
> +
> +    /// Get the raw representation of the handle.
> +    pub fn as_raw(self) -> usize {
> +        self.0
> +    }

Hmm this seems a bit weird, because now users of Zpools can manufacture
their own handles? Not sure as to how we could still allow Zpool
implementers to create these while preventing other users from doing
creating them though...

> +}
> +
> +/// Zpool API.
> +///
> +/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
> +/// Such drivers implement memory storage pools in accordance with the zpool API.
> +///
> +/// # Example
> +///
> +/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
> +/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
> +/// denominator) is saved in the least significant bits of the handle, which is guaranteed
> +/// to be at least 2**6 aligned by kmalloc.
> +///
> +/// ```
> +/// use core::ptr::{NonNull, copy_nonoverlapping};
> +/// use core::sync::atomic::{AtomicU64, Ordering};
> +/// use kernel::alloc::{Flags, flags, KBox, KVec, NumaNode};
> +/// use kernel::page::PAGE_SHIFT;
> +/// use kernel::prelude::EINVAL;
> +/// use kernel::str::CString;
> +/// use kernel::zpool::*;
> +///
> +/// struct MyZpool {
> +///     name: CString,
> +///     bytes_used: AtomicU64,
> +/// }
> +///
> +/// struct MyZpoolDriver;
> +///
> +/// impl ZpoolDriver for MyZpoolDriver {
> +///     type Pool = KBox<MyZpool>;
> +///
> +///     fn create(name: CString, gfp: Flags) -> Result<KBox<MyZpool>> {
> +///         let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
> +///         let pool = KBox::new(my_pool, gfp)?;
> +///
> +///         pr_debug!("Pool {:?} created\n", pool.name);
> +///         Ok(pool)
> +///     }
> +///
> +///     fn malloc(pool: &MyZpool, size: usize, _gfp: Flags, _nid: NumaNode) -> Result<ZpoolHandle> {
> +///         let pow = size.next_power_of_two().trailing_zeros().max(6);
> +///         match pow {
> +///             0 => Err(EINVAL),
> +///             m if m > PAGE_SHIFT as u32 => Err(ENOSPC),
> +///             _ => {
> +///                 let vec = KVec::<u8>::with_capacity(1 << pow, GFP_KERNEL)?;
> +///                 let (ptr, _len, _cap) = vec.into_raw_parts();
> +///
> +///                 // We assume that kmalloc-64, kmalloc-128 etc. kmem caches will be used for
> +///                 // our allocations, so it's actually `1 << pow` bytes that we have consumed.
> +///                 pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
> +///
> +///                 // `kmalloc` guarantees that an allocation of size x*2^n is 2^n aligned.
> +///                 // Therefore the 6 lower bits are zeros and we can use these to store `pow`.
> +///                 Ok(ZpoolHandle::from_raw(ptr as usize | (pow as usize - 6)))
> +///             }
> +///         }
> +///     }
> +///
> +///     unsafe fn free(pool: &MyZpool, handle: ZpoolHandle) {
> +///         let h = handle.as_raw();
> +///         let n = (h & 0x3F) + 6;
> +///         let uptr = h & !0x3F;
> +///
> +///         // SAFETY:
> +///         // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
> +///         //   denominator for the vector capacity. As noted above, the result will be exactly the
> +///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
> +///         //   the vector allocated by `alloc` function above.
> +///         // - 1 << n == capacity and is coming from the first 6 bits of handle.
> +///         let vec = unsafe { KVec::<u8>::from_raw_parts(uptr as *mut u8, 0, 1 << n) };
> +///         drop(vec);
> +///         pool.bytes_used.fetch_sub(1 << n, Ordering::Relaxed);
> +///     }
> +///
> +///     unsafe fn read_begin(_pool: &MyZpool, handle: ZpoolHandle) -> NonNull<u8> {
> +///         let uptr = handle.as_raw() & !0x3F;
> +///         // SAFETY:
> +///         // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
> +///         //   denominator for the vector capacity. As noted above, the result will be exactly the
> +///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
> +///         //   the vector allocated by `alloc` function above.
> +///         unsafe { NonNull::new_unchecked(uptr as *mut u8) }
> +///     }
> +///
> +///     unsafe fn read_end(_pool: &MyZpool, _handle: ZpoolHandle, _handle_mem: NonNull<u8>) {}
> +///
> +///     unsafe fn write(_p: &MyZpool, handle: ZpoolHandle, h_mem: NonNull<u8>, mem_len: usize) {
> +///         let uptr = handle.as_raw() & !0x3F;
> +///         // SAFETY:
> +///         // - `h_mem` is a valid non-null pointer provided by zswap.
> +///         // - `uptr` is derived from handle by zeroing 6 lower bits where we store the power
> +///         //   denominator for the vector capacity. As noted above, the result will be exactly the
> +///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
> +///         //   the vector allocated by `alloc` function above.
> +///         unsafe {
> +///             copy_nonoverlapping(h_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
> +///         };
> +///     }
> +///
> +///     fn total_pages(pool: &MyZpool) -> u64 {
> +///         pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT
> +///     }
> +/// }
> +///
> +/// // Uncomment this for compile time registration (disabled to avoid build error when KUNIT
> +/// // tests and zsmalloc are enabled):
> +/// // kernel::DECLARE_ZPOOL!(MyZpoolDriver);
> +/// ```
> +///
> +pub trait ZpoolDriver {
> +    /// Opaque Rust representation of `struct zpool`.
> +    type Pool: ForeignOwnable;

Also what happened to my comment on v4 of this patchset?

https://lore.kernel.org/all/DCLK1YG1L5TZ.1VMGX131LII9V@kernel.org:

>> It can indeed but then the ZpoolDriver trait will have to be extended 
>> with functions like into_raw() and from_raw(), because zpool expects 
>> *mut c_void, so on the Adapter side it will look like
>>
>>      extern "C" fn create_(name: *const c_uchar, gfp: u32) -> *mut c_void {
>>          // SAFETY: the memory pointed to by name is guaranteed by zpool 
>> to be a valid string
>>          let pool = unsafe { T::create(CStr::from_char_ptr(name), 
>> Flags::from_raw(gfp)) };
>>          match pool {
>>              Err(_) => null_mut(),
>>              Ok(p) => T::into_raw(p).cast(),
>>          }
>>      }
>>
>> The question is, why does this make it better?
>
> No, thanks for sharing that function. Then the question becomes, do you
> really need `ForeignOwnable`? Or is `KBox` enough? Do you really want
> people to use `Arc<MyZPool>`? Because `BorrowedMut` of `Arc` is the same
> as it's `Borrowed` variant (it's read-only after all).
>
> If you can get away with just `Box` (you might want people to choose
> their allocator, which is fine IMO), then I'd do so.

I still think that if you can use `Box<Self>` the trait is going to be
much easier to understand.

> +
> +    /// Create a pool.
> +    fn create(name: CString, gfp: Flags) -> Result<Self::Pool>;
> +
> +    /// Allocate an object of `size` bytes from `pool`, with the allocation flags `gfp` and
> +    /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
> +    fn malloc(
> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
> +        size: usize,
> +        gfp: Flags,
> +        nid: NumaNode,
> +    ) -> Result<ZpoolHandle>;
> +
> +    /// Free an object previously allocated from the `pool`, represented by `handle`.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `handle` must be a valid handle previously returned by `malloc`.
> +    /// - `handle` must not be used any more after the call to `free`.
> +    unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: ZpoolHandle);
> +
> +    /// Make all the necessary preparations for the caller to be able to read from the object
> +    /// represented by `handle` and return a valid pointer to that object's memory to be read.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `handle` must be a valid handle previously returned by `malloc`.
> +    /// - `read_end` with the same `handle` must be called for each `read_begin`.
> +    unsafe fn read_begin(
> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
> +        handle: ZpoolHandle,
> +    ) -> NonNull<u8>;
> +
> +    /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
> +    /// previously returned by `read_begin`.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `handle` must be a valid handle previously returned by `malloc`.
> +    /// - `handle_mem` must be the pointer previously returned by `read_begin`.
> +    unsafe fn read_end(
> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
> +        handle: ZpoolHandle,
> +        handle_mem: NonNull<u8>,
> +    );
> +
> +    /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
> +    /// to the memory to copy data from, and `mem_len` defines the length of the data block to
> +    /// be copied.
> +    ///
> +    /// # Safety
> +    ///
> +    /// - `handle` must be a valid handle previously returned by `malloc`.
> +    /// - `handle_mem` must be a valid pointer into the allocated memory aread represented by
> +    ///   `handle`.
> +    /// - `handle_mem + mem_len - 1` must not point outside the allocated memory area.
> +    unsafe fn write(
> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
> +        handle: ZpoolHandle,
> +        handle_mem: NonNull<u8>,
> +        mem_len: usize,
> +    );
> +
> +    /// Get the number of pages used by the `pool`.
> +    fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
> +}
> +
> +/// Provide a zpool allocator to zswap
> +#[macro_export]
> +macro_rules! DECLARE_ZPOOL {

Why all caps and not snake case?

> +    ($t: ident) => {
> +        use core::ptr::null_mut;
> +        use kernel::error::from_result;
> +        use kernel::types::ForeignOwnable;

Macros shouldn't have `use` statements in a non-local area (so inside
`const` bodies and modules is fine).

> +
> +        const __LOG_PREFIX: &[u8] = b"zpool_rust\0";

This seems wrong. Why do you need to generate this? Shouldn't the user
still invoke `module!` or a similar macro?

> +
> +        fn handle_from_result<T, F>(f: F) -> T
> +        where
> +            T: From<usize>,
> +            F: FnOnce() -> Result<T>,
> +        {
> +            match f() {
> +                Ok(v) => v,
> +                Err(e) => T::from(0),
> +            }
> +        }

Why is this function inside the macro?

> +
> +        /// Create a pool.
> +        #[no_mangle]
> +        pub unsafe extern "C" fn zpool_create_pool(name: *const c_uchar) -> *mut c_void {

Missing safety docs.

> +            match (|| -> Result<<$t as ZpoolDriver>::Pool> {
> +                // SAFETY: the memory pointed to by name is guaranteed by `zpool` to be a valid
> +                // string.
> +                let name_r = unsafe { CStr::from_char_ptr(name).to_cstring() }?;
> +                $t::create(name_r, flags::GFP_KERNEL)
> +            })() {
> +                Err(_) => null_mut(),
> +                Ok(pool) => <$t as ZpoolDriver>::Pool::into_foreign(pool),
> +            }
> +        }
> +
> +        /// Destroy the pool.
> +        #[no_mangle]
> +        pub unsafe extern "C" fn zpool_destroy_pool(pool: *mut c_void) {
> +            // SAFETY: The pointer originates from an `into_foreign` call.
> +            drop(unsafe { <$t as ZpoolDriver>::Pool::from_foreign(pool) })
> +        }

Have you tried to use the same pattern that we use for the many
different `Registration` types in the kernel? For example take a look at
`Adapter<T>` from `rust/kernel/net/phy.rs`. That way you don't need a
macro and can make the registration a function that takes a generic `T:
ZpoolDriver`.

---
Cheers,
Benno

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot ci] Re: rust: zpool: add API for C and Rust
  2025-09-23 16:50 ` [syzbot ci] Re: rust: zpool: add API for C and Rust syzbot ci
@ 2025-09-23 21:59   ` Johannes Weiner
  2025-09-24  5:46     ` Vitaly Wool
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2025-09-23 21:59 UTC (permalink / raw)
  To: syzbot ci
  Cc: a.hindborg, akpm, alex.gaynor, aliceryhl, bjorn3_gh, boqun.feng,
	chengming.zhou, dakr, david, gary, gregkh, liam.howlett,
	linux-kernel, linux-mm, lorenzo.stoakes, lossin, mhocko, minchan,
	nphamcs, ojeda, rppt, rust-for-linux, senozhatsky, surenb,
	tmgross, vbabka, vitaly.wool, yosry.ahmed, syzbot, syzkaller-bugs

On Tue, Sep 23, 2025 at 09:50:10AM -0700, syzbot ci wrote:
> syzbot ci has tested the following series
> 
> [v6] rust: zpool: add API for C and Rust
> https://lore.kernel.org/all/20250923102547.2545992-1-vitaly.wool@konsulko.se
> * [PATCH v6 1/2] mm: reinstate zpool as a thin API
> * [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
> 
> and found the following issues:
> * BUG: unable to handle kernel NULL pointer dereference in zswap_store
> * KASAN: slab-out-of-bounds Read in zpool_get_total_pages
> * KASAN: slab-out-of-bounds Read in zswap_store
> * KASAN: slab-use-after-free Read in zpool_get_total_pages
> * KASAN: use-after-free Read in zpool_get_total_pages
> 
> Full report is available here:
> https://ci.syzbot.org/series/e8b22352-ae56-4d7c-9113-75573acf2b64
> 
> ***
> 
> BUG: unable to handle kernel NULL pointer dereference in zswap_store

struct zpool {
	void *pool;
};

struct zpool *zpool_create_pool(const char *name) \
{ \
	return (struct zpool *) prefix ## _create_pool(name); \
} \

u64 zpool_get_total_pages(struct zpool *zpool) \
{ \
	return prefix ## _get_total_pages(zpool->pool); \
}

You create the zpool by simply casting the backend pool, but then you
deref it twice as if it were an actual container for the backend pool.

I'm guessing you didn't test this even superficially?

This also still proposes an API with no in-kernel user.

NAK

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot ci] Re: rust: zpool: add API for C and Rust
  2025-09-23 21:59   ` Johannes Weiner
@ 2025-09-24  5:46     ` Vitaly Wool
  2025-09-24 17:38       ` Nhat Pham
  0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Wool @ 2025-09-24  5:46 UTC (permalink / raw)
  To: Johannes Weiner, syzbot ci
  Cc: a.hindborg, akpm, alex.gaynor, aliceryhl, bjorn3_gh, boqun.feng,
	chengming.zhou, dakr, david, gary, gregkh, liam.howlett,
	linux-kernel, linux-mm, lorenzo.stoakes, lossin, mhocko, minchan,
	nphamcs, ojeda, rppt, rust-for-linux, senozhatsky, surenb,
	tmgross, vbabka, yosry.ahmed, syzbot, syzkaller-bugs



On 9/23/25 23:59, Johannes Weiner wrote:
> On Tue, Sep 23, 2025 at 09:50:10AM -0700, syzbot ci wrote:
>> syzbot ci has tested the following series
>>
>> [v6] rust: zpool: add API for C and Rust
>> https://lore.kernel.org/all/20250923102547.2545992-1-vitaly.wool@konsulko.se
>> * [PATCH v6 1/2] mm: reinstate zpool as a thin API
>> * [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
>>
>> and found the following issues:
>> * BUG: unable to handle kernel NULL pointer dereference in zswap_store
>> * KASAN: slab-out-of-bounds Read in zpool_get_total_pages
>> * KASAN: slab-out-of-bounds Read in zswap_store
>> * KASAN: slab-use-after-free Read in zpool_get_total_pages
>> * KASAN: use-after-free Read in zpool_get_total_pages
>>
>> Full report is available here:
>> https://ci.syzbot.org/series/e8b22352-ae56-4d7c-9113-75573acf2b64
>>
>> ***
>>
>> BUG: unable to handle kernel NULL pointer dereference in zswap_store
> 
> struct zpool {
> 	void *pool;
> };
> 
> struct zpool *zpool_create_pool(const char *name) \
> { \
> 	return (struct zpool *) prefix ## _create_pool(name); \
> } \
> 
> u64 zpool_get_total_pages(struct zpool *zpool) \
> { \
> 	return prefix ## _get_total_pages(zpool->pool); \
> }
> 
> You create the zpool by simply casting the backend pool, but then you
> deref it twice as if it were an actual container for the backend pool.
> 
> I'm guessing you didn't test this even superficially?

LOL, no, forgot to run git commit --amend so came up with a wrong version.

The Rust version is correct though.

> This also still proposes an API with no in-kernel user.

That's not correct, zsmalloc is the user.

~Vitaly

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
  2025-09-23 21:49   ` Benno Lossin
@ 2025-09-24  9:23     ` Vitaly Wool
  2025-09-24 17:32       ` Benno Lossin
  0 siblings, 1 reply; 11+ messages in thread
From: Vitaly Wool @ 2025-09-24  9:23 UTC (permalink / raw)
  To: Benno Lossin, linux-mm, rust-for-linux
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
	Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Minchan Kim, Sergey Senozhatsky,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Greg Kroah-Hartman, linux-kernel



On 9/23/25 23:49, Benno Lossin wrote:
> On Tue Sep 23, 2025 at 12:27 PM CEST, Vitaly Wool wrote:
>> diff --git a/rust/kernel/zpool.rs b/rust/kernel/zpool.rs
>> new file mode 100644
>> index 000000000000..53a0dc0607e6
>> --- /dev/null
>> +++ b/rust/kernel/zpool.rs
>> @@ -0,0 +1,366 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Implementation of Rust interface towards zpool API.
>> +
>> +use crate::{error::Result, kernel::alloc::Flags, str::CString, types::ForeignOwnable};
>> +use core::ptr::NonNull;
>> +use kernel::alloc::NumaNode;
>> +
>> +/// The Rust representation of zpool handle.
> 
> Would be great to explain what this means, what's the handle used for?

Sorry, do you mean explaining it here or in the code?

>> +pub struct ZpoolHandle(usize);
>> +
>> +impl ZpoolHandle {
>> +    /// Create `ZpoolHandle` from the raw representation.
>> +    pub fn from_raw(h: usize) -> Self {
>> +        Self(h)
>> +    }
>> +
>> +    /// Get the raw representation of the handle.
>> +    pub fn as_raw(self) -> usize {
>> +        self.0
>> +    }
> 
> Hmm this seems a bit weird, because now users of Zpools can manufacture
> their own handles? Not sure as to how we could still allow Zpool
> implementers to create these while preventing other users from doing
> creating them though...

This is a good question indeed. The thing is, an allocation backend 
(these implementing zpool api) is to provide an opaque handle which is 
usize, and since it has the best knowledge how to compose it, we need to 
let it do that. OTOH I am not too happy with this straightforward 
approach (from_raw()/as_raw()) either.

Would making ZpoolHandle opaque here and defining a trait that a backend 
must implement for ZpoolHandle work better? The example backend would 
then do something like

struct MyZpoolHandle {
     ptr: *mut u8,
     pow: usize,
}
type ZpoolHandle = MyZpoolHandle;

trait AsRawHandle {
     fn as_raw_handle(&self) -> usize;
}

impl AsRawHandle for ZpoolHandle {
     fn as_raw_handle(&self) -> usize {
         (self.ptr as usize) | self.pow
     }
}

Would that make sense?

>> +
>> +/// Zpool API.
>> +///
>> +/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
>> +/// Such drivers implement memory storage pools in accordance with the zpool API.
>> +///
>> +/// # Example
>> +///
>> +/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
>> +/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
>> +/// denominator) is saved in the least significant bits of the handle, which is guaranteed
>> +/// to be at least 2**6 aligned by kmalloc.
>> +///
>> +/// ```
>> +/// use core::ptr::{NonNull, copy_nonoverlapping};
>> +/// use core::sync::atomic::{AtomicU64, Ordering};
>> +/// use kernel::alloc::{Flags, flags, KBox, KVec, NumaNode};
>> +/// use kernel::page::PAGE_SHIFT;
>> +/// use kernel::prelude::EINVAL;
>> +/// use kernel::str::CString;
>> +/// use kernel::zpool::*;
>> +///
>> +/// struct MyZpool {
>> +///     name: CString,
>> +///     bytes_used: AtomicU64,
>> +/// }
>> +///
>> +/// struct MyZpoolDriver;
>> +///
>> +/// impl ZpoolDriver for MyZpoolDriver {
>> +///     type Pool = KBox<MyZpool>;
>> +///
>> +///     fn create(name: CString, gfp: Flags) -> Result<KBox<MyZpool>> {
>> +///         let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
>> +///         let pool = KBox::new(my_pool, gfp)?;
>> +///
>> +///         pr_debug!("Pool {:?} created\n", pool.name);
>> +///         Ok(pool)
>> +///     }
>> +///
>> +///     fn malloc(pool: &MyZpool, size: usize, _gfp: Flags, _nid: NumaNode) -> Result<ZpoolHandle> {
>> +///         let pow = size.next_power_of_two().trailing_zeros().max(6);
>> +///         match pow {
>> +///             0 => Err(EINVAL),
>> +///             m if m > PAGE_SHIFT as u32 => Err(ENOSPC),
>> +///             _ => {
>> +///                 let vec = KVec::<u8>::with_capacity(1 << pow, GFP_KERNEL)?;
>> +///                 let (ptr, _len, _cap) = vec.into_raw_parts();
>> +///
>> +///                 // We assume that kmalloc-64, kmalloc-128 etc. kmem caches will be used for
>> +///                 // our allocations, so it's actually `1 << pow` bytes that we have consumed.
>> +///                 pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
>> +///
>> +///                 // `kmalloc` guarantees that an allocation of size x*2^n is 2^n aligned.
>> +///                 // Therefore the 6 lower bits are zeros and we can use these to store `pow`.
>> +///                 Ok(ZpoolHandle::from_raw(ptr as usize | (pow as usize - 6)))
>> +///             }
>> +///         }
>> +///     }
>> +///
>> +///     unsafe fn free(pool: &MyZpool, handle: ZpoolHandle) {
>> +///         let h = handle.as_raw();
>> +///         let n = (h & 0x3F) + 6;
>> +///         let uptr = h & !0x3F;
>> +///
>> +///         // SAFETY:
>> +///         // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
>> +///         //   denominator for the vector capacity. As noted above, the result will be exactly the
>> +///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>> +///         //   the vector allocated by `alloc` function above.
>> +///         // - 1 << n == capacity and is coming from the first 6 bits of handle.
>> +///         let vec = unsafe { KVec::<u8>::from_raw_parts(uptr as *mut u8, 0, 1 << n) };
>> +///         drop(vec);
>> +///         pool.bytes_used.fetch_sub(1 << n, Ordering::Relaxed);
>> +///     }
>> +///
>> +///     unsafe fn read_begin(_pool: &MyZpool, handle: ZpoolHandle) -> NonNull<u8> {
>> +///         let uptr = handle.as_raw() & !0x3F;
>> +///         // SAFETY:
>> +///         // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
>> +///         //   denominator for the vector capacity. As noted above, the result will be exactly the
>> +///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>> +///         //   the vector allocated by `alloc` function above.
>> +///         unsafe { NonNull::new_unchecked(uptr as *mut u8) }
>> +///     }
>> +///
>> +///     unsafe fn read_end(_pool: &MyZpool, _handle: ZpoolHandle, _handle_mem: NonNull<u8>) {}
>> +///
>> +///     unsafe fn write(_p: &MyZpool, handle: ZpoolHandle, h_mem: NonNull<u8>, mem_len: usize) {
>> +///         let uptr = handle.as_raw() & !0x3F;
>> +///         // SAFETY:
>> +///         // - `h_mem` is a valid non-null pointer provided by zswap.
>> +///         // - `uptr` is derived from handle by zeroing 6 lower bits where we store the power
>> +///         //   denominator for the vector capacity. As noted above, the result will be exactly the
>> +///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>> +///         //   the vector allocated by `alloc` function above.
>> +///         unsafe {
>> +///             copy_nonoverlapping(h_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
>> +///         };
>> +///     }
>> +///
>> +///     fn total_pages(pool: &MyZpool) -> u64 {
>> +///         pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT
>> +///     }
>> +/// }
>> +///
>> +/// // Uncomment this for compile time registration (disabled to avoid build error when KUNIT
>> +/// // tests and zsmalloc are enabled):
>> +/// // kernel::DECLARE_ZPOOL!(MyZpoolDriver);
>> +/// ```
>> +///
>> +pub trait ZpoolDriver {
>> +    /// Opaque Rust representation of `struct zpool`.
>> +    type Pool: ForeignOwnable;
> 
> Also what happened to my comment on v4 of this patchset?
> 
> https://lore.kernel.org/all/DCLK1YG1L5TZ.1VMGX131LII9V@kernel.org:
> 
>>> It can indeed but then the ZpoolDriver trait will have to be extended
>>> with functions like into_raw() and from_raw(), because zpool expects
>>> *mut c_void, so on the Adapter side it will look like
>>>
>>>       extern "C" fn create_(name: *const c_uchar, gfp: u32) -> *mut c_void {
>>>           // SAFETY: the memory pointed to by name is guaranteed by zpool
>>> to be a valid string
>>>           let pool = unsafe { T::create(CStr::from_char_ptr(name),
>>> Flags::from_raw(gfp)) };
>>>           match pool {
>>>               Err(_) => null_mut(),
>>>               Ok(p) => T::into_raw(p).cast(),
>>>           }
>>>       }
>>>
>>> The question is, why does this make it better?
>>
>> No, thanks for sharing that function. Then the question becomes, do you
>> really need `ForeignOwnable`? Or is `KBox` enough? Do you really want
>> people to use `Arc<MyZPool>`? Because `BorrowedMut` of `Arc` is the same
>> as it's `Borrowed` variant (it's read-only after all).
>>
>> If you can get away with just `Box` (you might want people to choose
>> their allocator, which is fine IMO), then I'd do so.
> 
> I still think that if you can use `Box<Self>` the trait is going to be
> much easier to understand.

Okay, thanks, I'll work on that.

>> +
>> +    /// Create a pool.
>> +    fn create(name: CString, gfp: Flags) -> Result<Self::Pool>;
>> +
>> +    /// Allocate an object of `size` bytes from `pool`, with the allocation flags `gfp` and
>> +    /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
>> +    fn malloc(
>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> +        size: usize,
>> +        gfp: Flags,
>> +        nid: NumaNode,
>> +    ) -> Result<ZpoolHandle>;
>> +
>> +    /// Free an object previously allocated from the `pool`, represented by `handle`.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `handle` must not be used any more after the call to `free`.
>> +    unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: ZpoolHandle);
>> +
>> +    /// Make all the necessary preparations for the caller to be able to read from the object
>> +    /// represented by `handle` and return a valid pointer to that object's memory to be read.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `read_end` with the same `handle` must be called for each `read_begin`.
>> +    unsafe fn read_begin(
>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> +        handle: ZpoolHandle,
>> +    ) -> NonNull<u8>;
>> +
>> +    /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
>> +    /// previously returned by `read_begin`.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `handle_mem` must be the pointer previously returned by `read_begin`.
>> +    unsafe fn read_end(
>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> +        handle: ZpoolHandle,
>> +        handle_mem: NonNull<u8>,
>> +    );
>> +
>> +    /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
>> +    /// to the memory to copy data from, and `mem_len` defines the length of the data block to
>> +    /// be copied.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>> +    /// - `handle_mem` must be a valid pointer into the allocated memory aread represented by
>> +    ///   `handle`.
>> +    /// - `handle_mem + mem_len - 1` must not point outside the allocated memory area.
>> +    unsafe fn write(
>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>> +        handle: ZpoolHandle,
>> +        handle_mem: NonNull<u8>,
>> +        mem_len: usize,
>> +    );
>> +
>> +    /// Get the number of pages used by the `pool`.
>> +    fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
>> +}
>> +
>> +/// Provide a zpool allocator to zswap
>> +#[macro_export]
>> +macro_rules! DECLARE_ZPOOL {
> 
> Why all caps and not snake case?

C style, sorry :) Will fix.

>> +    ($t: ident) => {
>> +        use core::ptr::null_mut;
>> +        use kernel::error::from_result;
>> +        use kernel::types::ForeignOwnable;
> 
> Macros shouldn't have `use` statements in a non-local area (so inside
> `const` bodies and modules is fine).
> 
>> +
>> +        const __LOG_PREFIX: &[u8] = b"zpool_rust\0";
> 
> This seems wrong. Why do you need to generate this? Shouldn't the user
> still invoke `module!` or a similar macro?

Unfortunately not. As we have discussed at Kangrejos, the zpool 
implementation as a driver is on its way out [1] and there has to be 
more voices against that for it to be stopped. If we now are dealing 
with the build time API (i. e. a wrapper over zsmalloc functions) then 
we have to define a build time macro, be that DECLARE_ZPOOL or 
DeclareZpool :)

>> +
>> +        fn handle_from_result<T, F>(f: F) -> T
>> +        where
>> +            T: From<usize>,
>> +            F: FnOnce() -> Result<T>,
>> +        {
>> +            match f() {
>> +                Ok(v) => v,
>> +                Err(e) => T::from(0),
>> +            }
>> +        }
> 
> Why is this function inside the macro?

Doesn't seem to be needed elsewhere. I could put it in a separate patch 
for error.rs as a complement to from_result() but I thought no one was 
interested in this case.

>> +
>> +        /// Create a pool.
>> +        #[no_mangle]
>> +        pub unsafe extern "C" fn zpool_create_pool(name: *const c_uchar) -> *mut c_void {
> 
> Missing safety docs.
> 
>> +            match (|| -> Result<<$t as ZpoolDriver>::Pool> {
>> +                // SAFETY: the memory pointed to by name is guaranteed by `zpool` to be a valid
>> +                // string.
>> +                let name_r = unsafe { CStr::from_char_ptr(name).to_cstring() }?;
>> +                $t::create(name_r, flags::GFP_KERNEL)
>> +            })() {
>> +                Err(_) => null_mut(),
>> +                Ok(pool) => <$t as ZpoolDriver>::Pool::into_foreign(pool),
>> +            }
>> +        }
>> +
>> +        /// Destroy the pool.
>> +        #[no_mangle]
>> +        pub unsafe extern "C" fn zpool_destroy_pool(pool: *mut c_void) {
>> +            // SAFETY: The pointer originates from an `into_foreign` call.
>> +            drop(unsafe { <$t as ZpoolDriver>::Pool::from_foreign(pool) })
>> +        }
> 
> Have you tried to use the same pattern that we use for the many
> different `Registration` types in the kernel? For example take a look at
> `Adapter<T>` from `rust/kernel/net/phy.rs`. That way you don't need a
> macro and can make the registration a function that takes a generic `T:
> ZpoolDriver`.

That's what was in the previous version. Unfortunately it won't work any 
more for the reason described above, see also [1].

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/mm?h=mm-stable&id=2ccd9fecd9163f168761d4398564c81554f636ef

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
  2025-09-24  9:23     ` Vitaly Wool
@ 2025-09-24 17:32       ` Benno Lossin
  0 siblings, 0 replies; 11+ messages in thread
From: Benno Lossin @ 2025-09-24 17:32 UTC (permalink / raw)
  To: Vitaly Wool, linux-mm, rust-for-linux
  Cc: Johannes Weiner, Yosry Ahmed, Nhat Pham, Chengming Zhou,
	Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Liam R . Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Minchan Kim, Sergey Senozhatsky,
	Miguel Ojeda, Alex Gaynor, Boqun Feng, Gary Guo,
	Björn Roy Baron, Andreas Hindborg, Alice Ryhl, Trevor Gross,
	Danilo Krummrich, Greg Kroah-Hartman, linux-kernel

On Wed Sep 24, 2025 at 11:23 AM CEST, Vitaly Wool wrote:
> On 9/23/25 23:49, Benno Lossin wrote:
>> On Tue Sep 23, 2025 at 12:27 PM CEST, Vitaly Wool wrote:
>>> diff --git a/rust/kernel/zpool.rs b/rust/kernel/zpool.rs
>>> new file mode 100644
>>> index 000000000000..53a0dc0607e6
>>> --- /dev/null
>>> +++ b/rust/kernel/zpool.rs
>>> @@ -0,0 +1,366 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Implementation of Rust interface towards zpool API.
>>> +
>>> +use crate::{error::Result, kernel::alloc::Flags, str::CString, types::ForeignOwnable};
>>> +use core::ptr::NonNull;
>>> +use kernel::alloc::NumaNode;
>>> +
>>> +/// The Rust representation of zpool handle.
>> 
>> Would be great to explain what this means, what's the handle used for?
>
> Sorry, do you mean explaining it here or in the code?

Yeah give a short summary what this means. So something like:

    /// Zpool handle
    ///
    /// Used to represent an allocation on a [`ZPool`].

Doesn't need to be detailed.

>>> +pub struct ZpoolHandle(usize);
>>> +
>>> +impl ZpoolHandle {
>>> +    /// Create `ZpoolHandle` from the raw representation.
>>> +    pub fn from_raw(h: usize) -> Self {
>>> +        Self(h)
>>> +    }
>>> +
>>> +    /// Get the raw representation of the handle.
>>> +    pub fn as_raw(self) -> usize {
>>> +        self.0
>>> +    }
>> 
>> Hmm this seems a bit weird, because now users of Zpools can manufacture
>> their own handles? Not sure as to how we could still allow Zpool
>> implementers to create these while preventing other users from doing
>> creating them though...
>
> This is a good question indeed. The thing is, an allocation backend 
> (these implementing zpool api) is to provide an opaque handle which is 
> usize, and since it has the best knowledge how to compose it, we need to 
> let it do that. OTOH I am not too happy with this straightforward 
> approach (from_raw()/as_raw()) either.
>
> Would making ZpoolHandle opaque here and defining a trait that a backend 
> must implement for ZpoolHandle work better? The example backend would 
> then do something like
>
> struct MyZpoolHandle {
>      ptr: *mut u8,
>      pow: usize,
> }
> type ZpoolHandle = MyZpoolHandle;
>
> trait AsRawHandle {
>      fn as_raw_handle(&self) -> usize;
> }
>
> impl AsRawHandle for ZpoolHandle {
>      fn as_raw_handle(&self) -> usize {
>          (self.ptr as usize) | self.pow
>      }
> }
>
> Would that make sense?

You don't need the trait. You can just add an associated type to
`ZpoolDriver` called `Handle` and then use that instead of this concrete
type.

Implementers of `ZpoolDriver` need to then provide their own handle &
they can ensure nobody except they are able to construct one.

Or do some other functions need access to the raw usize representation?

>>> +
>>> +/// Zpool API.
>>> +///
>>> +/// The [`ZpoolDriver`] trait serves as an interface for Zpool drivers implemented in Rust.
>>> +/// Such drivers implement memory storage pools in accordance with the zpool API.
>>> +///
>>> +/// # Example
>>> +///
>>> +/// A zpool driver implementation which uses KVec of 2**n sizes, n = 6, 7, ..., PAGE_SHIFT.
>>> +/// Every zpool object is packed into a KVec that is sufficiently large, and n (the
>>> +/// denominator) is saved in the least significant bits of the handle, which is guaranteed
>>> +/// to be at least 2**6 aligned by kmalloc.
>>> +///
>>> +/// ```
>>> +/// use core::ptr::{NonNull, copy_nonoverlapping};
>>> +/// use core::sync::atomic::{AtomicU64, Ordering};
>>> +/// use kernel::alloc::{Flags, flags, KBox, KVec, NumaNode};
>>> +/// use kernel::page::PAGE_SHIFT;
>>> +/// use kernel::prelude::EINVAL;
>>> +/// use kernel::str::CString;
>>> +/// use kernel::zpool::*;
>>> +///
>>> +/// struct MyZpool {
>>> +///     name: CString,
>>> +///     bytes_used: AtomicU64,
>>> +/// }
>>> +///
>>> +/// struct MyZpoolDriver;
>>> +///
>>> +/// impl ZpoolDriver for MyZpoolDriver {
>>> +///     type Pool = KBox<MyZpool>;
>>> +///
>>> +///     fn create(name: CString, gfp: Flags) -> Result<KBox<MyZpool>> {
>>> +///         let my_pool = MyZpool { name, bytes_used: AtomicU64::new(0) };
>>> +///         let pool = KBox::new(my_pool, gfp)?;
>>> +///
>>> +///         pr_debug!("Pool {:?} created\n", pool.name);
>>> +///         Ok(pool)
>>> +///     }
>>> +///
>>> +///     fn malloc(pool: &MyZpool, size: usize, _gfp: Flags, _nid: NumaNode) -> Result<ZpoolHandle> {
>>> +///         let pow = size.next_power_of_two().trailing_zeros().max(6);
>>> +///         match pow {
>>> +///             0 => Err(EINVAL),
>>> +///             m if m > PAGE_SHIFT as u32 => Err(ENOSPC),
>>> +///             _ => {
>>> +///                 let vec = KVec::<u8>::with_capacity(1 << pow, GFP_KERNEL)?;
>>> +///                 let (ptr, _len, _cap) = vec.into_raw_parts();
>>> +///
>>> +///                 // We assume that kmalloc-64, kmalloc-128 etc. kmem caches will be used for
>>> +///                 // our allocations, so it's actually `1 << pow` bytes that we have consumed.
>>> +///                 pool.bytes_used.fetch_add(1 << pow, Ordering::Relaxed);
>>> +///
>>> +///                 // `kmalloc` guarantees that an allocation of size x*2^n is 2^n aligned.
>>> +///                 // Therefore the 6 lower bits are zeros and we can use these to store `pow`.
>>> +///                 Ok(ZpoolHandle::from_raw(ptr as usize | (pow as usize - 6)))
>>> +///             }
>>> +///         }
>>> +///     }
>>> +///
>>> +///     unsafe fn free(pool: &MyZpool, handle: ZpoolHandle) {
>>> +///         let h = handle.as_raw();
>>> +///         let n = (h & 0x3F) + 6;
>>> +///         let uptr = h & !0x3F;
>>> +///
>>> +///         // SAFETY:
>>> +///         // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
>>> +///         //   denominator for the vector capacity. As noted above, the result will be exactly the
>>> +///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>>> +///         //   the vector allocated by `alloc` function above.
>>> +///         // - 1 << n == capacity and is coming from the first 6 bits of handle.
>>> +///         let vec = unsafe { KVec::<u8>::from_raw_parts(uptr as *mut u8, 0, 1 << n) };
>>> +///         drop(vec);
>>> +///         pool.bytes_used.fetch_sub(1 << n, Ordering::Relaxed);
>>> +///     }
>>> +///
>>> +///     unsafe fn read_begin(_pool: &MyZpool, handle: ZpoolHandle) -> NonNull<u8> {
>>> +///         let uptr = handle.as_raw() & !0x3F;
>>> +///         // SAFETY:
>>> +///         // - we derive `uptr` from handle by zeroing 6 lower bits where we store the power
>>> +///         //   denominator for the vector capacity. As noted above, the result will be exactly the
>>> +///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>>> +///         //   the vector allocated by `alloc` function above.
>>> +///         unsafe { NonNull::new_unchecked(uptr as *mut u8) }
>>> +///     }
>>> +///
>>> +///     unsafe fn read_end(_pool: &MyZpool, _handle: ZpoolHandle, _handle_mem: NonNull<u8>) {}
>>> +///
>>> +///     unsafe fn write(_p: &MyZpool, handle: ZpoolHandle, h_mem: NonNull<u8>, mem_len: usize) {
>>> +///         let uptr = handle.as_raw() & !0x3F;
>>> +///         // SAFETY:
>>> +///         // - `h_mem` is a valid non-null pointer provided by zswap.
>>> +///         // - `uptr` is derived from handle by zeroing 6 lower bits where we store the power
>>> +///         //   denominator for the vector capacity. As noted above, the result will be exactly the
>>> +///         //   pointer to the area allocated by `KVec`. Thus, uptr is a valid pointer pointing to
>>> +///         //   the vector allocated by `alloc` function above.
>>> +///         unsafe {
>>> +///             copy_nonoverlapping(h_mem.as_ptr().cast(), uptr as *mut c_void, mem_len)
>>> +///         };
>>> +///     }
>>> +///
>>> +///     fn total_pages(pool: &MyZpool) -> u64 {
>>> +///         pool.bytes_used.load(Ordering::Relaxed) >> PAGE_SHIFT
>>> +///     }
>>> +/// }
>>> +///
>>> +/// // Uncomment this for compile time registration (disabled to avoid build error when KUNIT
>>> +/// // tests and zsmalloc are enabled):
>>> +/// // kernel::DECLARE_ZPOOL!(MyZpoolDriver);
>>> +/// ```
>>> +///
>>> +pub trait ZpoolDriver {
>>> +    /// Opaque Rust representation of `struct zpool`.
>>> +    type Pool: ForeignOwnable;
>> 
>> Also what happened to my comment on v4 of this patchset?
>> 
>> https://lore.kernel.org/all/DCLK1YG1L5TZ.1VMGX131LII9V@kernel.org:
>> 
>>>> It can indeed but then the ZpoolDriver trait will have to be extended
>>>> with functions like into_raw() and from_raw(), because zpool expects
>>>> *mut c_void, so on the Adapter side it will look like
>>>>
>>>>       extern "C" fn create_(name: *const c_uchar, gfp: u32) -> *mut c_void {
>>>>           // SAFETY: the memory pointed to by name is guaranteed by zpool
>>>> to be a valid string
>>>>           let pool = unsafe { T::create(CStr::from_char_ptr(name),
>>>> Flags::from_raw(gfp)) };
>>>>           match pool {
>>>>               Err(_) => null_mut(),
>>>>               Ok(p) => T::into_raw(p).cast(),
>>>>           }
>>>>       }
>>>>
>>>> The question is, why does this make it better?
>>>
>>> No, thanks for sharing that function. Then the question becomes, do you
>>> really need `ForeignOwnable`? Or is `KBox` enough? Do you really want
>>> people to use `Arc<MyZPool>`? Because `BorrowedMut` of `Arc` is the same
>>> as it's `Borrowed` variant (it's read-only after all).
>>>
>>> If you can get away with just `Box` (you might want people to choose
>>> their allocator, which is fine IMO), then I'd do so.
>> 
>> I still think that if you can use `Box<Self>` the trait is going to be
>> much easier to understand.
>
> Okay, thanks, I'll work on that.

If you do need `Arc` support, then I'd keep it this way, but I don't
know if you do need it.

>>> +
>>> +    /// Create a pool.
>>> +    fn create(name: CString, gfp: Flags) -> Result<Self::Pool>;
>>> +
>>> +    /// Allocate an object of `size` bytes from `pool`, with the allocation flags `gfp` and
>>> +    /// preferred NUMA node `nid`. If the allocation is successful, an opaque handle is returned.
>>> +    fn malloc(
>>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>>> +        size: usize,
>>> +        gfp: Flags,
>>> +        nid: NumaNode,
>>> +    ) -> Result<ZpoolHandle>;
>>> +
>>> +    /// Free an object previously allocated from the `pool`, represented by `handle`.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>>> +    /// - `handle` must not be used any more after the call to `free`.
>>> +    unsafe fn free(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>, handle: ZpoolHandle);
>>> +
>>> +    /// Make all the necessary preparations for the caller to be able to read from the object
>>> +    /// represented by `handle` and return a valid pointer to that object's memory to be read.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>>> +    /// - `read_end` with the same `handle` must be called for each `read_begin`.
>>> +    unsafe fn read_begin(
>>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>>> +        handle: ZpoolHandle,
>>> +    ) -> NonNull<u8>;
>>> +
>>> +    /// Finish reading from a previously allocated `handle`. `handle_mem` must be the pointer
>>> +    /// previously returned by `read_begin`.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>>> +    /// - `handle_mem` must be the pointer previously returned by `read_begin`.
>>> +    unsafe fn read_end(
>>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>>> +        handle: ZpoolHandle,
>>> +        handle_mem: NonNull<u8>,
>>> +    );
>>> +
>>> +    /// Write to the object represented by a previously allocated `handle`. `handle_mem` points
>>> +    /// to the memory to copy data from, and `mem_len` defines the length of the data block to
>>> +    /// be copied.
>>> +    ///
>>> +    /// # Safety
>>> +    ///
>>> +    /// - `handle` must be a valid handle previously returned by `malloc`.
>>> +    /// - `handle_mem` must be a valid pointer into the allocated memory aread represented by
>>> +    ///   `handle`.
>>> +    /// - `handle_mem + mem_len - 1` must not point outside the allocated memory area.
>>> +    unsafe fn write(
>>> +        pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>,
>>> +        handle: ZpoolHandle,
>>> +        handle_mem: NonNull<u8>,
>>> +        mem_len: usize,
>>> +    );
>>> +
>>> +    /// Get the number of pages used by the `pool`.
>>> +    fn total_pages(pool: <Self::Pool as ForeignOwnable>::Borrowed<'_>) -> u64;
>>> +}
>>> +
>>> +/// Provide a zpool allocator to zswap
>>> +#[macro_export]
>>> +macro_rules! DECLARE_ZPOOL {
>> 
>> Why all caps and not snake case?
>
> C style, sorry :) Will fix.
>
>>> +    ($t: ident) => {
>>> +        use core::ptr::null_mut;
>>> +        use kernel::error::from_result;
>>> +        use kernel::types::ForeignOwnable;
>> 
>> Macros shouldn't have `use` statements in a non-local area (so inside
>> `const` bodies and modules is fine).
>> 
>>> +
>>> +        const __LOG_PREFIX: &[u8] = b"zpool_rust\0";
>> 
>> This seems wrong. Why do you need to generate this? Shouldn't the user
>> still invoke `module!` or a similar macro?
>
> Unfortunately not. As we have discussed at Kangrejos, the zpool 
> implementation as a driver is on its way out [1] and there has to be 
> more voices against that for it to be stopped. If we now are dealing 
> with the build time API (i. e. a wrapper over zsmalloc functions) then 
> we have to define a build time macro, be that DECLARE_ZPOOL or 
> DeclareZpool :)

But this macro shouldn't declare the `__LOG_PREFIX` const? What if I
want to have a zpool driver in addition to having a module? Or is that
impossible to begin with?

>>> +
>>> +        fn handle_from_result<T, F>(f: F) -> T
>>> +        where
>>> +            T: From<usize>,
>>> +            F: FnOnce() -> Result<T>,
>>> +        {
>>> +            match f() {
>>> +                Ok(v) => v,
>>> +                Err(e) => T::from(0),
>>> +            }
>>> +        }
>> 
>> Why is this function inside the macro?
>
> Doesn't seem to be needed elsewhere. I could put it in a separate patch 
> for error.rs as a complement to from_result() but I thought no one was 
> interested in this case.

Now that I look more closely at this function, why does it even exist?
You only have a single usage here and since this is a macro, you could
just perform the match there...

>>> +
>>> +        /// Create a pool.
>>> +        #[no_mangle]
>>> +        pub unsafe extern "C" fn zpool_create_pool(name: *const c_uchar) -> *mut c_void {
>> 
>> Missing safety docs.
>> 
>>> +            match (|| -> Result<<$t as ZpoolDriver>::Pool> {
>>> +                // SAFETY: the memory pointed to by name is guaranteed by `zpool` to be a valid
>>> +                // string.
>>> +                let name_r = unsafe { CStr::from_char_ptr(name).to_cstring() }?;
>>> +                $t::create(name_r, flags::GFP_KERNEL)
>>> +            })() {
>>> +                Err(_) => null_mut(),
>>> +                Ok(pool) => <$t as ZpoolDriver>::Pool::into_foreign(pool),
>>> +            }
>>> +        }
>>> +
>>> +        /// Destroy the pool.
>>> +        #[no_mangle]
>>> +        pub unsafe extern "C" fn zpool_destroy_pool(pool: *mut c_void) {
>>> +            // SAFETY: The pointer originates from an `into_foreign` call.
>>> +            drop(unsafe { <$t as ZpoolDriver>::Pool::from_foreign(pool) })
>>> +        }
>> 
>> Have you tried to use the same pattern that we use for the many
>> different `Registration` types in the kernel? For example take a look at
>> `Adapter<T>` from `rust/kernel/net/phy.rs`. That way you don't need a
>> macro and can make the registration a function that takes a generic `T:
>> ZpoolDriver`.
>
> That's what was in the previous version. Unfortunately it won't work any 
> more for the reason described above, see also [1].

I see. But this API design is a bit weird IMO, since now you can't have
two Zpool drivers at the same time (even if only one of them is used at
runtime)? Or is that something that you wouldn't do in the first place?

I also dislike that we have to generate these `no_mangle` functions from
within a macro...

---
Cheers,
Benno

> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/mm?h=mm-stable&id=2ccd9fecd9163f168761d4398564c81554f636ef


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot ci] Re: rust: zpool: add API for C and Rust
  2025-09-24  5:46     ` Vitaly Wool
@ 2025-09-24 17:38       ` Nhat Pham
  2025-09-26 14:27         ` Vlastimil Babka
  0 siblings, 1 reply; 11+ messages in thread
From: Nhat Pham @ 2025-09-24 17:38 UTC (permalink / raw)
  To: Vitaly Wool
  Cc: Johannes Weiner, syzbot ci, a.hindborg, akpm, alex.gaynor,
	aliceryhl, bjorn3_gh, boqun.feng, chengming.zhou, dakr, david,
	gary, gregkh, liam.howlett, linux-kernel, linux-mm,
	lorenzo.stoakes, lossin, mhocko, minchan, ojeda, rppt,
	rust-for-linux, senozhatsky, surenb, tmgross, vbabka, yosry.ahmed,
	syzbot, syzkaller-bugs

On Tue, Sep 23, 2025 at 10:46 PM Vitaly Wool <vitaly.wool@konsulko.se> wrote:
>
>
>
> On 9/23/25 23:59, Johannes Weiner wrote:
> > On Tue, Sep 23, 2025 at 09:50:10AM -0700, syzbot ci wrote:
> >> syzbot ci has tested the following series
> >>
> >> [v6] rust: zpool: add API for C and Rust
> >> https://lore.kernel.org/all/20250923102547.2545992-1-vitaly.wool@konsulko.se
> >> * [PATCH v6 1/2] mm: reinstate zpool as a thin API
> >> * [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers
> >>
> >> and found the following issues:
> >> * BUG: unable to handle kernel NULL pointer dereference in zswap_store
> >> * KASAN: slab-out-of-bounds Read in zpool_get_total_pages
> >> * KASAN: slab-out-of-bounds Read in zswap_store
> >> * KASAN: slab-use-after-free Read in zpool_get_total_pages
> >> * KASAN: use-after-free Read in zpool_get_total_pages
> >>
> >> Full report is available here:
> >> https://ci.syzbot.org/series/e8b22352-ae56-4d7c-9113-75573acf2b64
> >>
> >> ***
> >>
> >> BUG: unable to handle kernel NULL pointer dereference in zswap_store
> >
> > struct zpool {
> >       void *pool;
> > };
> >
> > struct zpool *zpool_create_pool(const char *name) \
> > { \
> >       return (struct zpool *) prefix ## _create_pool(name); \
> > } \
> >
> > u64 zpool_get_total_pages(struct zpool *zpool) \
> > { \
> >       return prefix ## _get_total_pages(zpool->pool); \
> > }
> >
> > You create the zpool by simply casting the backend pool, but then you
> > deref it twice as if it were an actual container for the backend pool.
> >
> > I'm guessing you didn't test this even superficially?
>
> LOL, no, forgot to run git commit --amend so came up with a wrong version.
>
> The Rust version is correct though.
>
> > This also still proposes an API with no in-kernel user.
>
> That's not correct, zsmalloc is the user.

A single user does not an API make.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot ci] Re: rust: zpool: add API for C and Rust
  2025-09-24 17:38       ` Nhat Pham
@ 2025-09-26 14:27         ` Vlastimil Babka
  0 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2025-09-26 14:27 UTC (permalink / raw)
  To: Nhat Pham, Vitaly Wool
  Cc: Johannes Weiner, syzbot ci, a.hindborg, akpm, alex.gaynor,
	aliceryhl, bjorn3_gh, boqun.feng, chengming.zhou, dakr, david,
	gary, gregkh, liam.howlett, linux-kernel, linux-mm,
	lorenzo.stoakes, lossin, mhocko, minchan, ojeda, rppt,
	rust-for-linux, senozhatsky, surenb, tmgross, yosry.ahmed, syzbot,
	syzkaller-bugs

On 9/24/25 19:38, Nhat Pham wrote:
> On Tue, Sep 23, 2025 at 10:46 PM Vitaly Wool <vitaly.wool@konsulko.se> wrote:
>>
>> LOL, no, forgot to run git commit --amend so came up with a wrong version.
>>
>> The Rust version is correct though.
>>
>> > This also still proposes an API with no in-kernel user.
>>
>> That's not correct, zsmalloc is the user.
> 
> A single user does not an API make.

IIRC what was suggested is to implement the zsmalloc API directly. What does
the extra inline function layer get us in case of a compile-time switch?

And do you need the Rust abstraction or just can make it part of the zblock
itself? You don't expect there to be more Rust-based backends than zblock, no?


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-09-26 14:27 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-23 10:25 [PATCH v6 0/2] rust: zpool: add API for C and Rust Vitaly Wool
2025-09-23 10:26 ` [PATCH v6 1/2] mm: reinstate zpool as a thin API Vitaly Wool
2025-09-23 10:27 ` [PATCH v6 2/2] rust: zpool: add abstraction for zpool drivers Vitaly Wool
2025-09-23 21:49   ` Benno Lossin
2025-09-24  9:23     ` Vitaly Wool
2025-09-24 17:32       ` Benno Lossin
2025-09-23 16:50 ` [syzbot ci] Re: rust: zpool: add API for C and Rust syzbot ci
2025-09-23 21:59   ` Johannes Weiner
2025-09-24  5:46     ` Vitaly Wool
2025-09-24 17:38       ` Nhat Pham
2025-09-26 14:27         ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).