* [PATCH RFC 0/8] mm/slab: enable runtime sheaves tuning
@ 2026-05-15 16:24 Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 1/8] mm/slab: do not store cache pointer in struct slab_sheaf Harry Yoo (Oracle)
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Harry Yoo (Oracle) @ 2026-05-15 16:24 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
David Rientjes, Roman Gushchin
Cc: linux-mm, linux-kernel, Suren Baghdasaryan, Liam R. Howlett
Background
==========
Sheaves were introduced in v6.18, and starting from v7.0, they are
enabled for all slab caches (except for kmem_cache{,_node}). In the
pre-sheaves era, there was a cpu_partial parameter to tune the number
of objects cached per CPU. However, sheaves have no equivalent
tunable: the sheaf capacity is determined by the kernel code.
The goal is to allow tuning sheaves at runtime by the next LTS.
Overview
========
This patchset does two main things:
1. Make the sheaf_capacity sysfs attribute writable so that the number
of objects cached per CPU can be changed at runtime, and
2. Expose MAX_FULL_SHEAVES and MAX_EMPTY_SHEAVES as sysfs attributes
rather than constants, so that users can tune them.
Measuring the performance impact of these tunables is TBD.
Roughly, the sequence to change sheaf_capacity is as follows:
1. Disable sheaves. Make all online CPUs replace their main sheaves
with the bootstrap sheaf under local_lock and wait for completion.
2. Wait for all in-flight RCU callbacks to be processed.
3. Flush and free all existing sheaves.
4. Re-enable sheaves with a new capacity.
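A condensed sketch of how the patch 6 sysfs write handler drives these
steps (the helper names follow the series; error handling is
abbreviated and this is an illustration of the flow, not the actual
patch):

static ssize_t sheaf_capacity_store(struct kmem_cache *s,
				    const char *buf, size_t length)
{
	unsigned short capacity;
	int err;

	err = kstrtou16(buf, 10, &capacity);
	if (err)
		return err;

	if (!cache_supports_sheaves(s))
		return -EOPNOTSUPP;

	cpus_read_lock();
	mutex_lock(&slab_mutex);

	/* 1. Swap every CPU's main sheaf for the bootstrap sheaf. */
	flush_all_cpus_locked(s, /* disable_sheaves = */true);

	/* 2. Wait for in-flight RCU sheaf callbacks to finish. */
	rcu_barrier();

	/* 3. Flush and free all remaining sheaves. */
	__kmem_cache_do_shrink(s);

	/* 4. Re-enable sheaves with the new capacity. */
	err = bootstrap_cache_sheaves(s, capacity);

	mutex_unlock(&slab_mutex);
	cpus_read_unlock();

	return err ? err : length;
}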
Challenges
==========
1. Allocations and frees can happen concurrently at any point between
these steps, and we cannot introduce heavyweight synchronization
mechanisms on the fastpath.
2. Currently, cache_has_sheaves() checks whether a cache has sheaves.
This works now because sheaves cannot be enabled or disabled once
the cache is created.
The question "Does this cache have sheaves?" should be split into
"Does this cache support sheaves?" and "Does this CPU actually have
sheaves enabled right now?".
3. Once the sheaf capacity update is complete, no sheaf with a stale
capacity may remain. Flushing and freeing all existing sheaves is
relatively simple, but under the current design it is quite
challenging to prevent sheaves with a stale capacity from being
installed into the pcs or the barn. Reading s->sheaf_capacity without
an expensive synchronization primitive is racy.
To address this, patch 6 introduces a copy of s->sheaf_capacity in
struct slub_percpu_sheaves. pcs->capacity is copied from
s->sheaf_capacity and is stable under local_lock. If
s->sheaf_capacity and pcs->capacity don't match, the sheaf_capacity
writer is responsible for flushing and freeing any sheaves with the
stale capacity before completing the update.
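The resulting slow-path pattern looks roughly like this (in patch 6
the comparison is wrapped in a pcs_capacity_match() helper; the
fragment below is only meant to show the shape of the check):

	struct slub_percpu_sheaves *pcs;

	/*
	 * The local lock was dropped to allocate or refill a sheaf, so
	 * its capacity may have become stale in the meantime.
	 */
	local_lock(&s->cpu_sheaves->lock);
	pcs = this_cpu_ptr(s->cpu_sheaves);

	if (unlikely(pcs->capacity != sheaf->capacity)) {
		/* A capacity update raced with us; do not install it. */
		local_unlock(&s->cpu_sheaves->lock);
		sheaf_flush_unused(s, sheaf);
		free_empty_sheaf(s, sheaf);
		return NULL;
	}

	/* Capacities match: safe to install into the pcs or the barn. */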
Patch Sequence
==============
Patches 1-3: A per-sheaf capacity is required for the following steps,
but I didn't want to grow struct slab_sheaf. So patch 1 drops the cache
pointer (which was used only on the slowpath), patch 2 changes
sheaf_capacity from unsigned int to unsigned short, and patch 3 adds
the per-sheaf capacity.
The struct actually shrinks after those patches.
After (24 bytes, excluding the objects flex array):
struct slab_sheaf {
	union {
		struct rcu_head rcu_head;
		struct list_head barn_list;
		bool pfmemalloc;
	};
	unsigned short capacity;
	unsigned short size;
	int node;
	void *objects[];
};
Patch 4 allows bootstrap_cache_sheaves() to fail so that it can be
used to re-enable sheaves without panicking the kernel.
Patch 5 splits cache_has_sheaves() into cache_supports_sheaves()
and pcs_has_sheaves().
Patch 6 enables tuning the sheaf capacity at runtime.
Patch 7 adds lockdep asserts to verify the new rule "Always hold
local_lock when accessing the barn" to make sure there is no sheaf
with stale capacity.
Patch 8 turns MAX_FULL_SHEAVES and MAX_EMPTY_SHEAVES into sysfs
attributes (max_full_sheaves, max_empty_sheaves) and allows tuning.
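A rough sketch of what the new attributes could look like, following
the existing SLAB_ATTR pattern in mm/slub.c (the per-cache field name
below is illustrative only, not necessarily what patch 8 uses):

static ssize_t max_full_sheaves_show(struct kmem_cache *s, char *buf)
{
	return sysfs_emit(buf, "%u\n", s->max_full_sheaves);
}

static ssize_t max_full_sheaves_store(struct kmem_cache *s,
				      const char *buf, size_t length)
{
	unsigned int nr;
	int err;

	err = kstrtouint(buf, 10, &nr);
	if (err)
		return err;

	WRITE_ONCE(s->max_full_sheaves, nr);
	return length;
}
SLAB_ATTR(max_full_sheaves);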
RFC V1 is also available in git at:
https://git.kernel.org/pub/scm/linux/kernel/git/harry/linux.git/log/?h=sheaves-tuning-rfc-v1r1
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
Harry Yoo (Oracle) (8):
mm/slab: do not store cache pointer in struct slab_sheaf
mm/slab: change sheaf_capacity type to unsigned short
mm/slab: track capacity per sheaf
mm/slab: allow bootstrap_cache_sheaves() to fail
mm/slab: rework cache_has_sheaves() to check immutable properties only
mm/slab: allow changing sheaf_capacity at runtime
mm/slab: add pcs->lock lockdep assert when accessing the barn
mm/slab: allow changing max_{full,empty}_sheaves at runtime
include/linux/slab.h | 8 +-
mm/slab.h | 40 ++-
mm/slab_common.c | 2 +-
mm/slub.c | 715 ++++++++++++++++++++++++++++++-------------
tools/include/linux/slab.h | 14 +-
tools/testing/shared/linux.c | 4 +-
6 files changed, 563 insertions(+), 220 deletions(-)
---
base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
change-id: 20260515-sheaves-tuning-e1f897dc7f5e
Best regards,
--
Cheers,
Harry / Hyeonggon
* [PATCH RFC 1/8] mm/slab: do not store cache pointer in struct slab_sheaf
2026-05-15 16:24 [PATCH RFC 0/8] mm/slab: enable runtime sheaves tuning Harry Yoo (Oracle)
@ 2026-05-15 16:24 ` Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 2/8] mm/slab: change sheaf_capacity type to unsigned short Harry Yoo (Oracle)
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Harry Yoo (Oracle) @ 2026-05-15 16:24 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
David Rientjes, Roman Gushchin
Cc: linux-mm, linux-kernel, Suren Baghdasaryan, Liam R. Howlett
The `cache` field of struct slab_sheaf is only read on the slow path
when freeing an RCU sheaf. Storing it in every sheaf is overkill.
Drop the field. In rcu_free_sheaf() and rcu_free_sheaf_nobarn(),
fetch the kmem_cache pointer via
virt_to_slab(sheaf->objects[0])->slab_cache instead.
As a sheaf is only attached to pcs->rcu_free once it holds at least one
object, the lookup is safe. Add a WARN_ON_ONCE() in case an empty
sheaf ever reaches the RCU free path. In that case, the cache is
unknown, so free_empty_sheaf() now tolerates a NULL cache argument.
However, the case is never expected to trigger.
While at it, remove the stale comment in init_percpu_sheaves().
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
mm/slub.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 5ef54d546bc2..75281eb802de 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -422,7 +422,6 @@ struct slab_sheaf {
bool pfmemalloc;
};
};
- struct kmem_cache *cache;
unsigned int size;
int node; /* only used for rcu_sheaf */
void *objects[];
@@ -2781,8 +2780,6 @@ static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
if (unlikely(!sheaf))
return NULL;
- sheaf->cache = s;
-
stat(s, SHEAF_ALLOC);
return sheaf;
@@ -2802,13 +2799,14 @@ static void free_empty_sheaf(struct kmem_cache *s, struct slab_sheaf *sheaf)
* warning, therefore replace NULL with CODETAG_EMPTY to indicate
* that the extension for this sheaf is expected to be NULL.
*/
- if (s->flags & SLAB_KMALLOC)
+ if (s && (s->flags & SLAB_KMALLOC))
mark_obj_codetag_empty(sheaf);
VM_WARN_ON_ONCE(sheaf->size > 0);
kfree(sheaf);
- stat(s, SHEAF_FREE);
+ if (s)
+ stat(s, SHEAF_FREE);
}
static unsigned int
@@ -2968,12 +2966,15 @@ static void rcu_free_sheaf_nobarn(struct rcu_head *head)
struct kmem_cache *s;
sheaf = container_of(head, struct slab_sheaf, rcu_head);
- s = sheaf->cache;
+ if (WARN_ON_ONCE(!sheaf->size)) {
+ free_empty_sheaf(NULL, sheaf);
+ return;
+ }
+ s = virt_to_slab(sheaf->objects[0])->slab_cache;
__rcu_free_sheaf_prepare(s, sheaf);
sheaf_flush_unused(s, sheaf);
-
free_empty_sheaf(s, sheaf);
}
@@ -5019,7 +5020,6 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)
return NULL;
stat(s, SHEAF_PREFILL_OVERSIZE);
- sheaf->cache = s;
sheaf->capacity = size;
/*
@@ -5873,8 +5873,12 @@ static void rcu_free_sheaf(struct rcu_head *head)
struct kmem_cache *s;
sheaf = container_of(head, struct slab_sheaf, rcu_head);
+ if (WARN_ON_ONCE(!sheaf->size)) {
+ free_empty_sheaf(NULL, sheaf);
+ return;
+ }
- s = sheaf->cache;
+ s = virt_to_slab(sheaf->objects[0])->slab_cache;
/*
* This may remove some objects due to slab_free_hook() returning false,
@@ -7616,10 +7620,6 @@ static int init_percpu_sheaves(struct kmem_cache *s)
* It's also safe to share the single static bootstrap_sheaf
* with zero-sized objects array as it's never modified.
*
- * Bootstrap_sheaf also has NULL pointer to kmem_cache so we
- * recognize it and not attempt to free it when destroying the
- * cache.
- *
* We keep bootstrap_sheaf for kmem_cache and kmem_cache_node,
* caches with debug enabled, and all caches with SLUB_TINY.
* For kmalloc caches it's used temporarily during the initial
--
2.43.0
* [PATCH RFC 2/8] mm/slab: change sheaf_capacity type to unsigned short
2026-05-15 16:24 [PATCH RFC 0/8] mm/slab: enable runtime sheaves tuning Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 1/8] mm/slab: do not store cache pointer in struct slab_sheaf Harry Yoo (Oracle)
@ 2026-05-15 16:24 ` Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 3/8] mm/slab: track capacity per sheaf Harry Yoo (Oracle)
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Harry Yoo (Oracle) @ 2026-05-15 16:24 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
David Rientjes, Roman Gushchin
Cc: linux-mm, linux-kernel, Suren Baghdasaryan, Liam R. Howlett
Change struct kmem_cache.sheaf_capacity and the matching
kmem_cache_args field from unsigned int to unsigned short, so that
we can add a new field later without growing the struct size.
unsigned short is a reasonable size for any realistic configuration.
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
include/linux/slab.h | 8 ++++----
mm/slab.h | 2 +-
mm/slub.c | 34 +++++++++++++++++-----------------
tools/include/linux/slab.h | 14 +++++++-------
tools/testing/shared/linux.c | 4 ++--
5 files changed, 31 insertions(+), 31 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 2b5ab488e96b..6f023f04763a 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -371,7 +371,7 @@ struct kmem_cache_args {
*
* %0 means no sheaves will be created.
*/
- unsigned int sheaf_capacity;
+ unsigned short sheaf_capacity;
};
struct kmem_cache *__kmem_cache_create_args(const char *name,
@@ -828,10 +828,10 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
#define kmem_cache_alloc_node(...) alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))
struct slab_sheaf *
-kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size);
+kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned short size);
int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,
- struct slab_sheaf **sheafp, unsigned int size);
+ struct slab_sheaf **sheafp, unsigned short size);
void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
struct slab_sheaf *sheaf);
@@ -841,7 +841,7 @@ void *kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *cachep, gfp_t gfp,
#define kmem_cache_alloc_from_sheaf(...) \
alloc_hooks(kmem_cache_alloc_from_sheaf_noprof(__VA_ARGS__))
-unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf);
+unsigned short kmem_cache_sheaf_size(struct slab_sheaf *sheaf);
/*
* These macros allow declaring a kmem_buckets * parameter alongside size, which
diff --git a/mm/slab.h b/mm/slab.h
index bf2f87acf5e3..dfbe73011cb8 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -204,7 +204,7 @@ struct kmem_cache {
unsigned int object_size; /* Object size without metadata */
struct reciprocal_value reciprocal_size;
unsigned int offset; /* Free pointer offset */
- unsigned int sheaf_capacity;
+ unsigned short sheaf_capacity;
struct kmem_cache_order_objects oo;
/* Allocation and freeing of slabs */
diff --git a/mm/slub.c b/mm/slub.c
index 75281eb802de..a1974523bba9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -418,11 +418,11 @@ struct slab_sheaf {
struct list_head barn_list;
/* only used for prefilled sheafs */
struct {
- unsigned int capacity;
+ unsigned short capacity;
bool pfmemalloc;
};
};
- unsigned int size;
+ unsigned short size;
int node; /* only used for rcu_sheaf */
void *objects[];
};
@@ -2756,7 +2756,7 @@ static inline void *setup_object(struct kmem_cache *s, void *object)
}
static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
- unsigned int capacity)
+ unsigned short capacity)
{
struct slab_sheaf *sheaf;
size_t sheaf_size;
@@ -2854,10 +2854,10 @@ static void __kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
*
* Returns how many objects are remaining to be flushed
*/
-static unsigned int __sheaf_flush_main_batch(struct kmem_cache *s)
+static unsigned short __sheaf_flush_main_batch(struct kmem_cache *s)
{
struct slub_percpu_sheaves *pcs;
- unsigned int batch, remaining;
+ unsigned short batch, remaining;
void *objects[PCS_BATCH_MAX];
struct slab_sheaf *sheaf;
@@ -2884,7 +2884,7 @@ static unsigned int __sheaf_flush_main_batch(struct kmem_cache *s)
static void sheaf_flush_main(struct kmem_cache *s)
{
- unsigned int remaining;
+ unsigned short remaining;
do {
local_lock(&s->cpu_sheaves->lock);
@@ -2899,7 +2899,7 @@ static void sheaf_flush_main(struct kmem_cache *s)
*/
static bool sheaf_try_flush_main(struct kmem_cache *s)
{
- unsigned int remaining;
+ unsigned short remaining;
bool ret = false;
do {
@@ -4849,7 +4849,7 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
do_alloc:
main = pcs->main;
- batch = min(size, main->size);
+ batch = min_t(size_t, size, main->size);
main->size -= batch;
memcpy(p, main->objects + main->size, batch * sizeof(void *));
@@ -5004,7 +5004,7 @@ static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
* return NULL if sheaf allocation or prefilling failed
*/
struct slab_sheaf *
-kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)
+kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned short size)
{
struct slub_percpu_sheaves *pcs;
struct slab_sheaf *sheaf = NULL;
@@ -5146,7 +5146,7 @@ void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
* In practice we always refill to full sheaf's capacity.
*/
int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,
- struct slab_sheaf **sheafp, unsigned int size)
+ struct slab_sheaf **sheafp, unsigned short size)
{
struct slab_sheaf *sheaf;
@@ -5225,7 +5225,7 @@ kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
return ret;
}
-unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf)
+unsigned short kmem_cache_sheaf_size(struct slab_sheaf *sheaf)
{
return sheaf->size;
}
@@ -6172,7 +6172,7 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
do_free:
main = pcs->main;
- batch = min(size, s->sheaf_capacity - main->size);
+ batch = min_t(size_t, size, s->sheaf_capacity - main->size);
memcpy(main->objects + main->size, p, batch * sizeof(void *));
main->size += batch;
@@ -7759,11 +7759,11 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
return 1;
}
-static unsigned int calculate_sheaf_capacity(struct kmem_cache *s,
- struct kmem_cache_args *args)
+static unsigned short calculate_sheaf_capacity(struct kmem_cache *s,
+ struct kmem_cache_args *args)
{
- unsigned int capacity;
+ unsigned short capacity;
size_t size;
@@ -8466,7 +8466,7 @@ static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
static void __init bootstrap_cache_sheaves(struct kmem_cache *s)
{
struct kmem_cache_args empty_args = {};
- unsigned int capacity;
+ unsigned short capacity;
bool failed = false;
int node, cpu;
@@ -9091,7 +9091,7 @@ SLAB_ATTR_RO(order);
static ssize_t sheaf_capacity_show(struct kmem_cache *s, char *buf)
{
- return sysfs_emit(buf, "%u\n", s->sheaf_capacity);
+ return sysfs_emit(buf, "%hu\n", s->sheaf_capacity);
}
SLAB_ATTR_RO(sheaf_capacity);
diff --git a/tools/include/linux/slab.h b/tools/include/linux/slab.h
index 6d8e9413d5a4..76d0b9da6cfe 100644
--- a/tools/include/linux/slab.h
+++ b/tools/include/linux/slab.h
@@ -47,7 +47,7 @@ struct kmem_cache {
pthread_mutex_t lock;
unsigned int size;
unsigned int align;
- unsigned int sheaf_capacity;
+ unsigned short sheaf_capacity;
int nr_objs;
void *objs;
void (*ctor)(void *);
@@ -70,7 +70,7 @@ struct kmem_cache_args {
/**
* @sheaf_capacity: The maximum size of the sheaf.
*/
- unsigned int sheaf_capacity;
+ unsigned short sheaf_capacity;
/**
* @useroffset: Usercopy region offset.
*
@@ -127,10 +127,10 @@ struct slab_sheaf {
union {
struct list_head barn_list;
/* only used for prefilled sheafs */
- unsigned int capacity;
+ unsigned short capacity;
};
struct kmem_cache *cache;
- unsigned int size;
+ unsigned short size;
int node; /* only used for rcu_sheaf */
void *objects[];
};
@@ -186,7 +186,7 @@ void kmem_cache_free_bulk(struct kmem_cache *cachep, size_t size, void **list);
int kmem_cache_alloc_bulk(struct kmem_cache *cachep, gfp_t gfp, size_t size,
void **list);
struct slab_sheaf *
-kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size);
+kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned short size);
void *
kmem_cache_alloc_from_sheaf(struct kmem_cache *s, gfp_t gfp,
@@ -195,9 +195,9 @@ kmem_cache_alloc_from_sheaf(struct kmem_cache *s, gfp_t gfp,
void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
struct slab_sheaf *sheaf);
int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,
- struct slab_sheaf **sheafp, unsigned int size);
+ struct slab_sheaf **sheafp, unsigned short size);
-static inline unsigned int kmem_cache_sheaf_size(struct slab_sheaf *sheaf)
+static inline unsigned short kmem_cache_sheaf_size(struct slab_sheaf *sheaf)
{
return sheaf->size;
}
diff --git a/tools/testing/shared/linux.c b/tools/testing/shared/linux.c
index 8c7257155958..2da3a6617d87 100644
--- a/tools/testing/shared/linux.c
+++ b/tools/testing/shared/linux.c
@@ -252,7 +252,7 @@ __kmem_cache_create_args(const char *name, unsigned int size,
}
struct slab_sheaf *
-kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)
+kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned short size)
{
struct slab_sheaf *sheaf;
unsigned int capacity;
@@ -281,7 +281,7 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int size)
}
int kmem_cache_refill_sheaf(struct kmem_cache *s, gfp_t gfp,
- struct slab_sheaf **sheafp, unsigned int size)
+ struct slab_sheaf **sheafp, unsigned short size)
{
struct slab_sheaf *sheaf = *sheafp;
int refill;
--
2.43.0
* [PATCH RFC 3/8] mm/slab: track capacity per sheaf
2026-05-15 16:24 [PATCH RFC 0/8] mm/slab: enable runtime sheaves tuning Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 1/8] mm/slab: do not store cache pointer in struct slab_sheaf Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 2/8] mm/slab: change sheaf_capacity type to unsigned short Harry Yoo (Oracle)
@ 2026-05-15 16:24 ` Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 4/8] mm/slab: allow bootstrap_cache_sheaves() to fail Harry Yoo (Oracle)
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Harry Yoo (Oracle) @ 2026-05-15 16:24 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
David Rientjes, Roman Gushchin
Cc: linux-mm, linux-kernel, Suren Baghdasaryan, Liam R. Howlett
Currently, only prefilled sheaves have a capacity field, used to
record the requested (possibly oversized) capacity. To allow
changing sheaf capacity at runtime, track the capacity for each
sheaf so that checking whether a sheaf is full still works even when
the cache capacity is being changed concurrently.
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
mm/slub.c | 31 +++++++++++++++----------------
1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index a1974523bba9..44f36ae32570 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -417,11 +417,9 @@ struct slab_sheaf {
struct rcu_head rcu_head;
struct list_head barn_list;
/* only used for prefilled sheafs */
- struct {
- unsigned short capacity;
- bool pfmemalloc;
- };
+ bool pfmemalloc;
};
+ unsigned short capacity;
unsigned short size;
int node; /* only used for rcu_sheaf */
void *objects[];
@@ -2780,6 +2778,8 @@ static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
if (unlikely(!sheaf))
return NULL;
+ sheaf->capacity = capacity;
+
stat(s, SHEAF_ALLOC);
return sheaf;
@@ -2816,7 +2816,7 @@ refill_objects(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int min,
static int refill_sheaf(struct kmem_cache *s, struct slab_sheaf *sheaf,
gfp_t gfp)
{
- int to_fill = s->sheaf_capacity - sheaf->size;
+ int to_fill = sheaf->capacity - sheaf->size;
int filled;
if (!to_fill)
@@ -5063,7 +5063,6 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned short size)
sheaf = alloc_empty_sheaf(s, gfp);
if (sheaf) {
- sheaf->capacity = s->sheaf_capacity;
sheaf->pfmemalloc = false;
if (sheaf->size < size &&
@@ -5688,13 +5687,13 @@ static void __pcs_install_empty_sheaf(struct kmem_cache *s,
* Unlikely because if the main sheaf had space, we would have just
* freed to it. Get rid of our empty sheaf.
*/
- if (pcs->main->size < s->sheaf_capacity) {
+ if (pcs->main->size < pcs->main->capacity) {
barn_put_empty_sheaf(barn, empty);
return;
}
/* Also unlikely for the same reason */
- if (pcs->spare->size < s->sheaf_capacity) {
+ if (pcs->spare->size < pcs->spare->capacity) {
swap(pcs->main, pcs->spare);
barn_put_empty_sheaf(barn, empty);
return;
@@ -5752,7 +5751,7 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
goto alloc_empty;
}
- if (pcs->spare->size < s->sheaf_capacity) {
+ if (pcs->spare->size < pcs->spare->capacity) {
swap(pcs->main, pcs->spare);
return pcs;
}
@@ -5819,7 +5818,7 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
* but in case we got preempted or migrated, we need to
* check again
*/
- if (pcs->main->size == s->sheaf_capacity)
+ if (pcs->main->size == pcs->main->capacity)
goto restart;
return pcs;
@@ -5850,7 +5849,7 @@ bool free_to_pcs(struct kmem_cache *s, void *object, bool allow_spin)
pcs = this_cpu_ptr(s->cpu_sheaves);
- if (unlikely(pcs->main->size == s->sheaf_capacity)) {
+ if (unlikely(pcs->main->size == pcs->main->capacity)) {
pcs = __pcs_replace_full_main(s, pcs, allow_spin);
if (unlikely(!pcs))
@@ -6015,7 +6014,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
*/
rcu_sheaf->objects[rcu_sheaf->size++] = obj;
- if (likely(rcu_sheaf->size < s->sheaf_capacity)) {
+ if (likely(rcu_sheaf->size < rcu_sheaf->capacity)) {
rcu_sheaf = NULL;
} else {
pcs->rcu_free = NULL;
@@ -6139,7 +6138,7 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
pcs = this_cpu_ptr(s->cpu_sheaves);
- if (likely(pcs->main->size < s->sheaf_capacity))
+ if (likely(pcs->main->size < pcs->main->capacity))
goto do_free;
barn = get_barn(s);
@@ -6156,7 +6155,7 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
goto do_free;
}
- if (pcs->spare->size < s->sheaf_capacity) {
+ if (pcs->spare->size < pcs->spare->capacity) {
swap(pcs->main, pcs->spare);
goto do_free;
}
@@ -6172,7 +6171,7 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
do_free:
main = pcs->main;
- batch = min_t(size_t, size, s->sheaf_capacity - main->size);
+ batch = min_t(size_t, size, main->capacity - main->size);
memcpy(main->objects + main->size, p, batch * sizeof(void *));
main->size += batch;
@@ -7613,7 +7612,7 @@ static int init_percpu_sheaves(struct kmem_cache *s)
/*
* Bootstrap sheaf has zero size so fast-path allocation fails.
- * It has also size == s->sheaf_capacity, so fast-path free
+ * It has also size == sheaf->capacity, so fast-path free
* fails. In the slow paths we recognize the situation by
* checking s->sheaf_capacity. This allows fast paths to assume
* s->cpu_sheaves and pcs->main always exists and are valid.
--
2.43.0
* [PATCH RFC 4/8] mm/slab: allow bootstrap_cache_sheaves() to fail
2026-05-15 16:24 [PATCH RFC 0/8] mm/slab: enable runtime sheaves tuning Harry Yoo (Oracle)
` (2 preceding siblings ...)
2026-05-15 16:24 ` [PATCH RFC 3/8] mm/slab: track capacity per sheaf Harry Yoo (Oracle)
@ 2026-05-15 16:24 ` Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 5/8] mm/slab: rework cache_has_sheaves() to check immutable properties only Harry Yoo (Oracle)
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Harry Yoo (Oracle) @ 2026-05-15 16:24 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
David Rientjes, Roman Gushchin
Cc: linux-mm, linux-kernel, Suren Baghdasaryan, Liam R. Howlett
Panicking on sheaf allocation failure is acceptable during boot, but
to allow changing the sheaf capacity at runtime, the bootstrap path
must be able to propagate errors instead. Return an error code from
bootstrap_cache_sheaves() so callers can decide how to react.
Change it to return an int (0 on success, negative errno on failure),
accept capacity as a parameter, and drop __init. Callers without a
user-specified capacity pass zero to use the default capacity
calculated by the slab allocator. Failures are now handled by the
caller.
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
mm/slub.c | 46 ++++++++++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 20 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 44f36ae32570..fb98d0da5c78 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -8462,18 +8462,18 @@ static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache)
* init_kmem_cache_nodes(). For normal kmalloc caches we have to bootstrap it
* since sheaves and barns are allocated by kmalloc.
*/
-static void __init bootstrap_cache_sheaves(struct kmem_cache *s)
+static int bootstrap_cache_sheaves(struct kmem_cache *s,
+ unsigned short capacity)
{
struct kmem_cache_args empty_args = {};
- unsigned short capacity;
- bool failed = false;
- int node, cpu;
+ int node, cpu, err = 0;
- capacity = calculate_sheaf_capacity(s, &empty_args);
+ if (!capacity)
+ capacity = calculate_sheaf_capacity(s, &empty_args);
/* capacity can be 0 due to debugging or SLUB_TINY */
if (!capacity)
- return;
+ return 0;
for_each_node_mask(node, slab_barn_nodes) {
struct node_barn *barn;
@@ -8481,7 +8481,7 @@ static void __init bootstrap_cache_sheaves(struct kmem_cache *s)
barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, node);
if (!barn) {
- failed = true;
+ err = -ENOMEM;
goto out;
}
@@ -8497,31 +8497,37 @@ static void __init bootstrap_cache_sheaves(struct kmem_cache *s)
pcs->main = __alloc_empty_sheaf(s, GFP_KERNEL, capacity);
if (!pcs->main) {
- failed = true;
+ err = -ENOMEM;
break;
}
}
out:
- /*
- * It's still early in boot so treat this like same as a failure to
- * create the kmalloc cache in the first place
- */
- if (failed)
- panic("Out of memory when creating kmem_cache %s\n", s->name);
+ if (!err)
+ s->sheaf_capacity = capacity;
- s->sheaf_capacity = capacity;
+ return err;
}
+#define for_each_normal_kmalloc_cache(s, type, idx) \
+ for (type = KMALLOC_NORMAL; type <= KMALLOC_RANDOM_END; type++) \
+ for (idx = 0; idx < KMALLOC_SHIFT_HIGH + 1; idx++) \
+ if ((s = kmalloc_caches[type][idx]))
+
static void __init bootstrap_kmalloc_sheaves(void)
{
enum kmalloc_cache_type type;
+ struct kmem_cache *s;
+ int idx;
- for (type = KMALLOC_NORMAL; type <= KMALLOC_RANDOM_END; type++) {
- for (int idx = 0; idx < KMALLOC_SHIFT_HIGH + 1; idx++) {
- if (kmalloc_caches[type][idx])
- bootstrap_cache_sheaves(kmalloc_caches[type][idx]);
- }
+ for_each_normal_kmalloc_cache(s, type, idx) {
+ /*
+ * It's still early in boot so treat this as a failure to
+ * create the kmalloc cache in the first place.
+ */
+ if (bootstrap_cache_sheaves(s, 0))
+ panic("Out of memory when creating kmem_cache %s\n",
+ s->name);
}
}
--
2.43.0
* [PATCH RFC 5/8] mm/slab: rework cache_has_sheaves() to check immutable properties only
2026-05-15 16:24 [PATCH RFC 0/8] mm/slab: enable runtime sheaves tuning Harry Yoo (Oracle)
` (3 preceding siblings ...)
2026-05-15 16:24 ` [PATCH RFC 4/8] mm/slab: allow bootstrap_cache_sheaves() to fail Harry Yoo (Oracle)
@ 2026-05-15 16:24 ` Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 6/8] mm/slab: allow changing sheaf_capacity at runtime Harry Yoo (Oracle)
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Harry Yoo (Oracle) @ 2026-05-15 16:24 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
David Rientjes, Roman Gushchin
Cc: linux-mm, linux-kernel, Suren Baghdasaryan, Liam R. Howlett
Currently the sheaf capacity is determined when a cache is created and
never changes, with normal kmalloc caches as the only exception.
Checking whether s->sheaf_capacity is non-zero is therefore
sufficient for cache_has_sheaves() to work correctly.
However, once s->sheaf_capacity becomes mutable at runtime, both the
name and the implementation become confusing and racy: a cache that
currently has sheaves may have them disabled at runtime, or vice versa.
Except for normal kmalloc caches, what callers of cache_has_sheaves()
actually want to know depends only on properties that do not change:
1. Whether the cache has certain flags (SLAB_NO_OBJ_EXT,
SLAB_NOLEAKTRACE, SLAB_DEBUG_FLAGS)
2. Whether a certain build option is enabled (CONFIG_SLUB_TINY)
Since these never change at runtime, check them directly instead of
going through s->sheaf_capacity. To avoid confusion, rename
cache_has_sheaves() to cache_supports_sheaves().
Normal kmalloc caches need special handling. They don't have sheaves
initially and only get them later via bootstrap_kmalloc_sheaves().
This means cache_supports_sheaves() can return true while a cache's
percpu sheaves still point at the shared bootstrap_sheaf.
This special handling might sound like it applies only to normal
kmalloc caches, but the same handling is needed when sheaf capacity
can change.
The existing callers of cache_has_sheaves() fall into two categories.
The first category performs operations on the whole cache:
kvfree_rcu barrier, cache destruction, sheaf flushing, and CPU/memory
hot(un)plug. These should not skip caches that support sheaves, no
matter whether they actually have sheaves. If such an operation actually
needs to access percpu sheaves, use the new pcs_has_sheaves() helper
to skip CPUs whose pcs->main points to the bootstrap_sheaf.
The second category allocates from or frees to percpu sheaves directly
(in the slowpath). These should confirm pcs_has_sheaves() returns true
before proceeding.
In addition, init_kmem_cache_nodes() skips barn allocation for normal
kmalloc caches. Their barns are set up later by
bootstrap_kmalloc_sheaves().
Change calculate_sheaf_capacity() to call cache_supports_sheaves()
directly instead of open-coding the same conditions.
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
mm/slab.h | 36 ++++++++++++++++++++++++
mm/slab_common.c | 2 +-
mm/slub.c | 85 ++++++++++++++++++++++++++++++++------------------------
3 files changed, 86 insertions(+), 37 deletions(-)
diff --git a/mm/slab.h b/mm/slab.h
index dfbe73011cb8..907a8207809c 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -481,6 +481,42 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla
return false;
}
+static inline bool kmem_cache_debug(struct kmem_cache *s)
+{
+ return kmem_cache_debug_flags(s, SLAB_DEBUG_FLAGS);
+}
+
+/*
+ * Every cache has !NULL s->cpu_sheaves but they may point to the
+ * bootstrap_sheaf temporarily during init, or permanently for the boot caches
+ * and caches with debugging enabled, or all caches with CONFIG_SLUB_TINY. This
+ * helper distinguishes whether cache supports real non-bootstrap sheaves.
+ *
+ * Return false when the cache does not support sheaves.
+ *
+ * When it returns true, the cache may or may not have sheaves.
+ * Callers who access percpu sheaves must verify that they actually have
+ * sheaves enabled.
+ */
+static inline bool cache_supports_sheaves(struct kmem_cache *s)
+{
+ if (IS_ENABLED(CONFIG_SLUB_TINY))
+ return false;
+
+ if (kmem_cache_debug(s))
+ return false;
+ /*
+ * Bootstrap caches can't have sheaves for now (SLAB_NO_OBJ_EXT).
+ * SLAB_NOLEAKTRACE caches (e.g., kmemleak's object_cache) must not
+ * have sheaves to avoid recursion when sheaf allocation triggers
+ * kmemleak tracking.
+ */
+ if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
+ return false;
+
+ return true;
+}
+
#if IS_ENABLED(CONFIG_SLUB_DEBUG) && IS_ENABLED(CONFIG_KUNIT)
bool slab_in_kunit_test(void);
#else
diff --git a/mm/slab_common.c b/mm/slab_common.c
index d5a70a831a2a..3092c1c3f284 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -2109,7 +2109,7 @@ EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
*/
void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
{
- if (cache_has_sheaves(s)) {
+ if (cache_supports_sheaves(s)) {
flush_rcu_sheaves_on_cache(s);
rcu_barrier();
}
diff --git a/mm/slub.c b/mm/slub.c
index fb98d0da5c78..c746c9b48728 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -238,11 +238,6 @@ struct slab_obj_iter {
#endif
};
-static inline bool kmem_cache_debug(struct kmem_cache *s)
-{
- return kmem_cache_debug_flags(s, SLAB_DEBUG_FLAGS);
-}
-
void *fixup_red_left(struct kmem_cache *s, void *p)
{
if (kmem_cache_debug_flags(s, SLAB_RED_ZONE))
@@ -432,6 +427,23 @@ struct slub_percpu_sheaves {
struct slab_sheaf *rcu_free; /* for batching kfree_rcu() */
};
+static struct slab_sheaf bootstrap_sheaf = {};
+
+static inline bool pcs_has_sheaves_unlocked(struct slub_percpu_sheaves *pcs)
+{
+ /* Test CONFIG_SLUB_TINY for code elimination purposes */
+ if (IS_ENABLED(CONFIG_SLUB_TINY))
+ return false;
+
+ return unlikely(pcs->main != &bootstrap_sheaf);
+}
+
+static inline bool pcs_has_sheaves(struct slub_percpu_sheaves *pcs)
+{
+ lockdep_assert_held(&pcs->lock);
+ return pcs_has_sheaves_unlocked(pcs);
+}
+
/*
* The slab lists for all objects.
*/
@@ -3045,8 +3057,7 @@ static void pcs_destroy(struct kmem_cache *s)
if (!s->cpu_sheaves)
return;
- /* pcs->main can only point to the bootstrap sheaf, nothing to free */
- if (!cache_has_sheaves(s))
+ if (!cache_supports_sheaves(s))
goto free_pcs;
for_each_possible_cpu(cpu) {
@@ -3058,6 +3069,9 @@ static void pcs_destroy(struct kmem_cache *s)
if (!pcs->main)
continue;
+ if (!pcs_has_sheaves_unlocked(pcs))
+ continue;
+
/*
* We have already passed __kmem_cache_shutdown() so everything
* was flushed and there should be no objects allocated from
@@ -3949,7 +3963,7 @@ static bool has_pcs_used(int cpu, struct kmem_cache *s)
{
struct slub_percpu_sheaves *pcs;
- if (!cache_has_sheaves(s))
+ if (!cache_supports_sheaves(s))
return false;
pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
@@ -3971,7 +3985,7 @@ static void flush_cpu_sheaves(struct work_struct *w)
s = sfw->s;
- if (cache_has_sheaves(s))
+ if (cache_supports_sheaves(s))
pcs_flush_all(s);
}
@@ -4074,7 +4088,7 @@ void flush_all_rcu_sheaves(void)
mutex_lock(&slab_mutex);
list_for_each_entry(s, &slab_caches, list) {
- if (!cache_has_sheaves(s))
+ if (!cache_supports_sheaves(s))
continue;
flush_rcu_sheaves_on_cache(s);
}
@@ -4109,7 +4123,7 @@ static int slub_cpu_setup(unsigned int cpu)
/*
* barn might already exist if a previous callback failed midway
*/
- if (!cache_has_sheaves(s) || get_barn_node(s, nid))
+ if (!cache_supports_sheaves(s) || get_barn_node(s, nid))
continue;
barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, nid);
@@ -4140,7 +4154,7 @@ static int slub_cpu_dead(unsigned int cpu)
mutex_lock(&slab_mutex);
list_for_each_entry(s, &slab_caches, list) {
- if (cache_has_sheaves(s))
+ if (cache_supports_sheaves(s))
__pcs_flush_all_cpu(s, cpu);
}
mutex_unlock(&slab_mutex);
@@ -4612,8 +4626,8 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
- /* Bootstrap or debug cache, back off */
- if (unlikely(!cache_has_sheaves(s))) {
+ /* Sheaves are not supported or disabled for this cache */
+ if (unlikely(!pcs_has_sheaves(pcs))) {
local_unlock(&s->cpu_sheaves->lock);
return NULL;
}
@@ -4809,7 +4823,7 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
struct slab_sheaf *full;
struct node_barn *barn;
- if (unlikely(!cache_has_sheaves(s))) {
+ if (unlikely(!pcs_has_sheaves(pcs))) {
local_unlock(&s->cpu_sheaves->lock);
return allocated;
}
@@ -5727,8 +5741,8 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
restart:
lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
- /* Bootstrap or debug cache, back off */
- if (unlikely(!cache_has_sheaves(s))) {
+ /* Sheaves are not supported or disabled for this cache */
+ if (unlikely(!pcs_has_sheaves(pcs))) {
local_unlock(&s->cpu_sheaves->lock);
return NULL;
}
@@ -5959,8 +5973,8 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
struct slab_sheaf *empty;
struct node_barn *barn;
- /* Bootstrap or debug cache, fall back */
- if (unlikely(!cache_has_sheaves(s))) {
+ /* Sheaves are not supported or disabled for this cache */
+ if (unlikely(!pcs_has_sheaves(pcs))) {
local_unlock(&s->cpu_sheaves->lock);
goto fail;
}
@@ -6138,6 +6152,11 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
pcs = this_cpu_ptr(s->cpu_sheaves);
+ if (unlikely(!pcs_has_sheaves(pcs))) {
+ local_unlock(&s->cpu_sheaves->lock);
+ goto fallback;
+ }
+
if (likely(pcs->main->size < pcs->main->capacity))
goto do_free;
@@ -7131,7 +7150,7 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
* freeing to sheaves is so incompatible with the detached freelist so
* once we go that way, we have to do everything differently
*/
- if (s && cache_has_sheaves(s)) {
+ if (s && cache_supports_sheaves(s)) {
free_to_pcs_bulk(s, size, p);
return;
}
@@ -7600,7 +7619,6 @@ static inline int alloc_kmem_cache_stats(struct kmem_cache *s)
static int init_percpu_sheaves(struct kmem_cache *s)
{
- static struct slab_sheaf bootstrap_sheaf = {};
int cpu;
for_each_possible_cpu(cpu) {
@@ -7614,7 +7632,7 @@ static int init_percpu_sheaves(struct kmem_cache *s)
* Bootstrap sheaf has zero size so fast-path allocation fails.
* It has also size == sheaf->capacity, so fast-path free
* fails. In the slow paths we recognize the situation by
- * checking s->sheaf_capacity. This allows fast paths to assume
+ * pcs_has_sheaves(). This allows fast paths to assume
* s->cpu_sheaves and pcs->main always exists and are valid.
* It's also safe to share the single static bootstrap_sheaf
* with zero-sized objects array as it's never modified.
@@ -7631,6 +7649,7 @@ static int init_percpu_sheaves(struct kmem_cache *s)
if (!pcs->main)
return -ENOMEM;
+
}
return 0;
@@ -7740,7 +7759,11 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
s->per_node[node].node = n;
}
- if (slab_state == DOWN || !cache_has_sheaves(s))
+ if (slab_state == DOWN || !cache_supports_sheaves(s))
+ return 1;
+
+ /* Enable sheaves later to avoid the chicken and egg problem */
+ if (is_kmalloc_normal(s))
return 1;
for_each_node_mask(node, slab_barn_nodes) {
@@ -7765,17 +7788,7 @@ static unsigned short calculate_sheaf_capacity(struct kmem_cache *s,
unsigned short capacity;
size_t size;
-
- if (IS_ENABLED(CONFIG_SLUB_TINY) || s->flags & SLAB_DEBUG_FLAGS)
- return 0;
-
- /*
- * Bootstrap caches can't have sheaves for now (SLAB_NO_OBJ_EXT).
- * SLAB_NOLEAKTRACE caches (e.g., kmemleak's object_cache) must not
- * have sheaves to avoid recursion when sheaf allocation triggers
- * kmemleak tracking.
- */
- if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
+ if (!cache_supports_sheaves(s))
return 0;
/*
@@ -8040,7 +8053,7 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
flush_all_cpus_locked(s);
/* we might have rcu sheaves in flight */
- if (cache_has_sheaves(s))
+ if (cache_supports_sheaves(s))
rcu_barrier();
for_each_node(node) {
@@ -8361,7 +8374,7 @@ static int slab_mem_going_online_callback(int nid)
if (get_node(s, nid))
continue;
- if (cache_has_sheaves(s) && !get_barn_node(s, nid)) {
+ if (cache_supports_sheaves(s) && !get_barn_node(s, nid)) {
barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, nid);
--
2.43.0
* [PATCH RFC 6/8] mm/slab: allow changing sheaf_capacity at runtime
2026-05-15 16:24 [PATCH RFC 0/8] mm/slab: enable runtime sheaves tuning Harry Yoo (Oracle)
` (4 preceding siblings ...)
2026-05-15 16:24 ` [PATCH RFC 5/8] mm/slab: rework cache_has_sheaves() to check immutable properties only Harry Yoo (Oracle)
@ 2026-05-15 16:24 ` Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 7/8] mm/slab: add pcs->lock lockdep assert when accessing the barn Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 8/8] mm/slab: allow changing max_{full,empty}_sheaves at runtime Harry Yoo (Oracle)
7 siblings, 0 replies; 9+ messages in thread
From: Harry Yoo (Oracle) @ 2026-05-15 16:24 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
David Rientjes, Roman Gushchin
Cc: linux-mm, linux-kernel, Suren Baghdasaryan, Liam R. Howlett
Make the sheaf_capacity sysfs attribute writable so that the sheaf
capacity can be tuned at runtime per-cache.
The steps to change sheaf capacity:
1. Disable sheaves: make all online CPUs replace their main sheaves
with the bootstrap sheaf under local_lock and wait for completion.
For offline CPUs, update their pcs directly under cpu_hotplug_lock.
2. Wait for pre-existing RCU callbacks to complete so that RCU sheaves
are returned to the barn.
3. Shrink the cache to flush and free all existing sheaves.
4. Re-enable sheaves with the new capacity by calling
bootstrap_cache_sheaves(). If this fails, sheaves remain disabled
for the cache.
Use slab_mutex to serialize sheaf capacity updates.
If sheaves of different capacities can coexist after several updates
to the sheaf_capacity, performance becomes hard to predict. It is
important to allow only a single capacity at any given point in time.
To achieve that, we need to check whether a sheaf's capacity is
stale. However, performing this check without an expensive
synchronization mechanism (SRCU, atomics, etc.) is inevitably racy.
Instead, a copy of the sheaf capacity is stored in struct
slub_percpu_sheaves, so each CPU has its own copy of the capacity.
With that, we can guarantee that this value remains stable under
local_lock.
If local_lock is acquired while the sheaf capacity update is in
progress, then either:
1. Sheaves on the CPU have already been disabled (meaning pcs->main
points at bootstrap_sheaf, falling back to slowpath), or
2. After releasing local_lock, percpu sheaves will be flushed and
disabled for the CPU, and all sheaves in the barn will be
flushed and freed.
It is guaranteed that no sheaf with stale capacity remains
after the process is complete as long as certain rules are followed.
The new rules to avoid sheaves of a stale capacity:
1. Hold local_lock when getting/putting sheaves from/to the barn.
2. When allocating new sheaves, check whether pcs->capacity and
sheaf->capacity match after re-acquiring local_lock.
3. If local_trylock fails, flush and free the sheaf. This should be
rare.
Per the above rules, rcu_free_sheaf() now tries to acquire the pcs lock
and checks whether the capacities match before putting the sheaf back
into the barn.
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
mm/slub.c | 386 +++++++++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 308 insertions(+), 78 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index c746c9b48728..7def24fdfae6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -425,6 +425,7 @@ struct slub_percpu_sheaves {
struct slab_sheaf *main; /* never NULL when unlocked */
struct slab_sheaf *spare; /* empty or full, may be NULL */
struct slab_sheaf *rcu_free; /* for batching kfree_rcu() */
+ unsigned short capacity;
};
static struct slab_sheaf bootstrap_sheaf = {};
@@ -492,6 +493,30 @@ static inline struct node_barn *get_barn(struct kmem_cache *s)
*/
static nodemask_t slab_nodes;
+
+static inline
+unsigned short get_pcs_capacity(struct kmem_cache *s,
+ struct slub_percpu_sheaves *pcs)
+{
+ lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+ return pcs->capacity;
+}
+
+static inline void set_pcs_capacity(struct kmem_cache *s,
+ struct slub_percpu_sheaves *pcs,
+ unsigned short capacity)
+{
+ lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+ pcs->capacity = capacity;
+}
+
+static inline bool pcs_capacity_match(struct kmem_cache *s,
+ struct slub_percpu_sheaves *pcs,
+ struct slab_sheaf *sheaf)
+{
+ return get_pcs_capacity(s, pcs) == sheaf->capacity;
+}
+
/*
* Similar to slab_nodes but for where we have node_barn allocated.
* Corresponds to N_ONLINE nodes.
@@ -507,6 +532,11 @@ struct slub_flush_work {
struct work_struct work;
struct kmem_cache *s;
bool skip;
+ /* for flushing sheaves */
+ bool disable_sheaves;
+ /* for enabling sheaves */
+ void **sheaves;
+ unsigned short capacity;
};
static DEFINE_MUTEX(flush_lock);
@@ -2774,6 +2804,10 @@ static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
if (gfp & __GFP_NO_OBJ_EXT)
return NULL;
+ /* Sheaves have been disabled */
+ if (!capacity)
+ return NULL;
+
gfp &= ~OBJCGS_CLEAR_MASK;
/*
@@ -2800,7 +2834,7 @@ static struct slab_sheaf *__alloc_empty_sheaf(struct kmem_cache *s, gfp_t gfp,
static inline struct slab_sheaf *alloc_empty_sheaf(struct kmem_cache *s,
gfp_t gfp)
{
- return __alloc_empty_sheaf(s, gfp, s->sheaf_capacity);
+ return __alloc_empty_sheaf(s, gfp, data_race(s->sheaf_capacity));
}
static void free_empty_sheaf(struct kmem_cache *s, struct slab_sheaf *sheaf)
@@ -2999,10 +3033,10 @@ static void rcu_free_sheaf_nobarn(struct rcu_head *head)
* flushing operations are rare so let's keep it simple and flush to slabs
* directly, skipping the barn
*/
-static void pcs_flush_all(struct kmem_cache *s)
+static void pcs_flush_all(struct kmem_cache *s, bool disable_sheaves)
{
struct slub_percpu_sheaves *pcs;
- struct slab_sheaf *spare, *rcu_free;
+ struct slab_sheaf *spare, *rcu_free, *main;
local_lock(&s->cpu_sheaves->lock);
pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -3013,8 +3047,23 @@ static void pcs_flush_all(struct kmem_cache *s)
rcu_free = pcs->rcu_free;
pcs->rcu_free = NULL;
+ if (disable_sheaves && pcs_has_sheaves(pcs)) {
+ main = pcs->main;
+ pcs->main = &bootstrap_sheaf;
+ set_pcs_capacity(s, pcs, 0);
+ } else {
+ main = NULL;
+ }
+
local_unlock(&s->cpu_sheaves->lock);
+ if (main) {
+ sheaf_flush_unused(s, main);
+ free_empty_sheaf(s, main);
+ } else {
+ sheaf_flush_main(s);
+ }
+
if (spare) {
sheaf_flush_unused(s, spare);
free_empty_sheaf(s, spare);
@@ -3022,8 +3071,6 @@ static void pcs_flush_all(struct kmem_cache *s)
if (rcu_free)
call_rcu(&rcu_free->rcu_head, rcu_free_sheaf_nobarn);
-
- sheaf_flush_main(s);
}
static void __pcs_flush_all_cpu(struct kmem_cache *s, unsigned int cpu)
@@ -3986,10 +4033,34 @@ static void flush_cpu_sheaves(struct work_struct *w)
s = sfw->s;
if (cache_supports_sheaves(s))
- pcs_flush_all(s);
+ pcs_flush_all(s, sfw->disable_sheaves);
+}
+
+static void enable_cpu_sheaves(struct work_struct *w)
+{
+ struct kmem_cache *s;
+ struct slub_flush_work *sfw;
+ struct slab_sheaf *sheaf;
+ struct slub_percpu_sheaves *pcs;
+
+ sfw = container_of(w, struct slub_flush_work, work);
+
+ s = sfw->s;
+
+ local_lock(&s->cpu_sheaves->lock);
+ pcs = this_cpu_ptr(s->cpu_sheaves);
+ sheaf = sfw->sheaves[smp_processor_id()];
+
+ VM_WARN_ON_ONCE(pcs->rcu_free);
+ VM_WARN_ON_ONCE(pcs->spare);
+ VM_WARN_ON_ONCE(pcs->main != &bootstrap_sheaf);
+
+ pcs->main = sheaf;
+ set_pcs_capacity(s, pcs, sfw->capacity);
+ local_unlock(&s->cpu_sheaves->lock);
}
-static void flush_all_cpus_locked(struct kmem_cache *s)
+static void flush_all_cpus_locked(struct kmem_cache *s, bool disable_sheaves)
{
struct slub_flush_work *sfw;
unsigned int cpu;
@@ -3997,16 +4068,32 @@ static void flush_all_cpus_locked(struct kmem_cache *s)
lockdep_assert_cpus_held();
mutex_lock(&flush_lock);
- for_each_online_cpu(cpu) {
+ for_each_possible_cpu(cpu) {
sfw = &per_cpu(slub_flush, cpu);
- if (!has_pcs_used(cpu, s)) {
+
+ if (cpu_online(cpu)) {
+ /* Do not skip empty sheaves when disabling them */
+ if (!disable_sheaves && !has_pcs_used(cpu, s)) {
+ sfw->skip = true;
+ continue;
+ }
+ INIT_WORK(&sfw->work, flush_cpu_sheaves);
+ sfw->skip = false;
+ sfw->s = s;
+ sfw->disable_sheaves = disable_sheaves;
+ queue_work_on(cpu, flushwq, &sfw->work);
+ } else if (disable_sheaves) {
+ struct slub_percpu_sheaves *pcs;
+
sfw->skip = true;
- continue;
+ pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
+ if (pcs->main != &bootstrap_sheaf) {
+ sheaf_flush_unused(s, pcs->main);
+ free_empty_sheaf(s, pcs->main);
+ pcs->main = &bootstrap_sheaf;
+ pcs->capacity = 0;
+ }
}
- INIT_WORK(&sfw->work, flush_cpu_sheaves);
- sfw->skip = false;
- sfw->s = s;
- queue_work_on(cpu, flushwq, &sfw->work);
}
for_each_online_cpu(cpu) {
@@ -4019,10 +4106,10 @@ static void flush_all_cpus_locked(struct kmem_cache *s)
mutex_unlock(&flush_lock);
}
-static void flush_all(struct kmem_cache *s)
+static void flush_all(struct kmem_cache *s, bool disable_sheaves)
{
cpus_read_lock();
- flush_all_cpus_locked(s);
+ flush_all_cpus_locked(s, disable_sheaves);
cpus_read_unlock();
}
@@ -4690,10 +4777,21 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
full = empty;
empty = NULL;
- if (!local_trylock(&s->cpu_sheaves->lock))
- goto barn_put;
+ if (!local_trylock(&s->cpu_sheaves->lock)) {
+ sheaf_flush_unused(s, full);
+ free_empty_sheaf(s, full);
+ return NULL;
+ }
+
pcs = this_cpu_ptr(s->cpu_sheaves);
+ if (unlikely(!pcs_capacity_match(s, pcs, full))) {
+ local_unlock(&s->cpu_sheaves->lock);
+ sheaf_flush_unused(s, full);
+ free_empty_sheaf(s, full);
+ return NULL;
+ }
+
/*
* If we put any empty or full sheaf to the barn below, it's due to
* racing or being migrated to a different cpu. Breaching the barn's
@@ -4721,7 +4819,6 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
return pcs;
}
-barn_put:
barn_put_full_sheaf(barn, full);
stat(s, BARN_PUT);
@@ -5027,29 +5124,8 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned short size)
if (unlikely(!size))
return NULL;
- if (unlikely(size > s->sheaf_capacity)) {
-
- sheaf = kzalloc_flex(*sheaf, objects, size, gfp);
- if (!sheaf)
- return NULL;
-
- stat(s, SHEAF_PREFILL_OVERSIZE);
- sheaf->capacity = size;
-
- /*
- * we do not need to care about pfmemalloc here because oversize
- * sheaves area always flushed and freed when returned
- */
- if (!__kmem_cache_alloc_bulk(s, gfp, size,
- &sheaf->objects[0])) {
- kfree(sheaf);
- return NULL;
- }
-
- sheaf->size = size;
-
- return sheaf;
- }
+ if (unlikely(size > data_race(s->sheaf_capacity)))
+ goto oversized;
local_lock(&s->cpu_sheaves->lock);
pcs = this_cpu_ptr(s->cpu_sheaves);
@@ -5072,11 +5148,16 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned short size)
local_unlock(&s->cpu_sheaves->lock);
-
if (!sheaf)
sheaf = alloc_empty_sheaf(s, gfp);
if (sheaf) {
+ if (size > sheaf->capacity) {
+ sheaf_flush_unused(s, sheaf);
+ free_empty_sheaf(s, sheaf);
+ sheaf = NULL;
+ goto oversized;
+ }
sheaf->pfmemalloc = false;
if (sheaf->size < size &&
@@ -5087,6 +5168,28 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned short size)
}
}
+ return sheaf;
+
+oversized:
+ sheaf = kzalloc_flex(*sheaf, objects, size, gfp);
+ if (!sheaf)
+ return NULL;
+
+ stat(s, SHEAF_PREFILL_OVERSIZE);
+ sheaf->capacity = size;
+
+ /*
+ * we do not need to care about pfmemalloc here because oversize
+ * sheaves area always flushed and freed when returned
+ */
+ if (!__kmem_cache_alloc_bulk(s, gfp, size,
+ &sheaf->objects[0])) {
+ kfree(sheaf);
+ return NULL;
+ }
+
+ sheaf->size = size;
+
return sheaf;
}
@@ -5106,27 +5209,25 @@ void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
struct slub_percpu_sheaves *pcs;
struct node_barn *barn;
- if (unlikely((sheaf->capacity != s->sheaf_capacity)
- || sheaf->pfmemalloc)) {
- sheaf_flush_unused(s, sheaf);
- kfree(sheaf);
- return;
- }
+ if (unlikely((sheaf->capacity != data_race(s->sheaf_capacity))
+ || sheaf->pfmemalloc))
+ goto free_sheaf;
local_lock(&s->cpu_sheaves->lock);
pcs = this_cpu_ptr(s->cpu_sheaves);
barn = get_barn(s);
+ if (!pcs_capacity_match(s, pcs, sheaf)) {
+ local_unlock(&s->cpu_sheaves->lock);
+ goto free_sheaf;
+ }
+
if (!pcs->spare) {
pcs->spare = sheaf;
- sheaf = NULL;
stat(s, SHEAF_RETURN_FAST);
- }
-
- local_unlock(&s->cpu_sheaves->lock);
-
- if (!sheaf)
+ local_unlock(&s->cpu_sheaves->lock);
return;
+ }
stat(s, SHEAF_RETURN_SLOW);
@@ -5134,15 +5235,32 @@ void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
* If the barn has too many full sheaves or we fail to refill the sheaf,
* simply flush and free it.
*/
- if (!barn || data_race(barn->nr_full) >= MAX_FULL_SHEAVES ||
- refill_sheaf(s, sheaf, gfp)) {
- sheaf_flush_unused(s, sheaf);
- free_empty_sheaf(s, sheaf);
- return;
+ if (!barn || data_race(barn->nr_full) >= MAX_FULL_SHEAVES) {
+ local_unlock(&s->cpu_sheaves->lock);
+ goto free_sheaf;
+ }
+
+ local_unlock(&s->cpu_sheaves->lock);
+
+ if (refill_sheaf(s, sheaf, gfp))
+ goto free_sheaf;
+
+ local_lock(&s->cpu_sheaves->lock);
+ pcs = this_cpu_ptr(s->cpu_sheaves);
+
+ if (!pcs_capacity_match(s, pcs, sheaf)) {
+ local_unlock(&s->cpu_sheaves->lock);
+ goto free_sheaf;
}
barn_put_full_sheaf(barn, sheaf);
+ local_unlock(&s->cpu_sheaves->lock);
stat(s, BARN_PUT);
+ return;
+
+free_sheaf:
+ sheaf_flush_unused(s, sheaf);
+ free_empty_sheaf(s, sheaf);
}
/*
@@ -5839,11 +5957,18 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
got_empty:
if (!local_trylock(&s->cpu_sheaves->lock)) {
- barn_put_empty_sheaf(barn, empty);
+ free_empty_sheaf(s, empty);
return NULL;
}
pcs = this_cpu_ptr(s->cpu_sheaves);
+
+ if (unlikely(!pcs_capacity_match(s, pcs, empty))) {
+ local_unlock(&s->cpu_sheaves->lock);
+ free_empty_sheaf(s, empty);
+ return NULL;
+ }
+
__pcs_install_empty_sheaf(s, pcs, empty, barn);
return pcs;
@@ -5884,6 +6009,7 @@ static void rcu_free_sheaf(struct rcu_head *head)
struct slab_sheaf *sheaf;
struct node_barn *barn = NULL;
struct kmem_cache *s;
+ struct slub_percpu_sheaves *pcs;
sheaf = container_of(head, struct slab_sheaf, rcu_head);
if (WARN_ON_ONCE(!sheaf->size)) {
@@ -5905,11 +6031,20 @@ static void rcu_free_sheaf(struct rcu_head *head)
* slab so simply flush everything.
*/
if (__rcu_free_sheaf_prepare(s, sheaf))
- goto flush;
+ goto flush_unlocked;
barn = get_barn_node(s, sheaf->node);
if (!barn)
- goto flush;
+ goto flush_unlocked;
+
+ if (!local_trylock(&s->cpu_sheaves->lock))
+ goto flush_unlocked;
+
+ pcs = this_cpu_ptr(s->cpu_sheaves);
+ if (!pcs_capacity_match(s, pcs, sheaf)) {
+ local_unlock(&s->cpu_sheaves->lock);
+ goto flush_unlocked;
+ }
/* due to slab_free_hook() */
if (unlikely(sheaf->size == 0))
@@ -5924,19 +6059,27 @@ static void rcu_free_sheaf(struct rcu_head *head)
if (data_race(barn->nr_full) < MAX_FULL_SHEAVES) {
stat(s, BARN_PUT);
barn_put_full_sheaf(barn, sheaf);
+ local_unlock(&s->cpu_sheaves->lock);
return;
}
-flush:
stat(s, BARN_PUT_FAIL);
sheaf_flush_unused(s, sheaf);
empty:
if (barn && data_race(barn->nr_empty) < MAX_EMPTY_SHEAVES) {
barn_put_empty_sheaf(barn, sheaf);
+ local_unlock(&s->cpu_sheaves->lock);
return;
}
+ local_unlock(&s->cpu_sheaves->lock);
+ free_empty_sheaf(s, sheaf);
+ return;
+
+flush_unlocked:
+ stat(s, BARN_PUT_FAIL);
+ sheaf_flush_unused(s, sheaf);
free_empty_sheaf(s, sheaf);
}
@@ -6006,12 +6149,18 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
goto fail;
if (!local_trylock(&s->cpu_sheaves->lock)) {
- barn_put_empty_sheaf(barn, empty);
+ free_empty_sheaf(s, empty);
goto fail;
}
pcs = this_cpu_ptr(s->cpu_sheaves);
+ if (unlikely(!pcs_capacity_match(s, pcs, empty))) {
+ local_unlock(&s->cpu_sheaves->lock);
+ free_empty_sheaf(s, empty);
+ goto fail;
+ }
+
if (unlikely(pcs->rcu_free))
barn_put_empty_sheaf(barn, empty);
else
@@ -7650,6 +7799,7 @@ static int init_percpu_sheaves(struct kmem_cache *s)
if (!pcs->main)
return -ENOMEM;
+ pcs->capacity = s->sheaf_capacity;
}
return 0;
@@ -8050,7 +8200,7 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
int node;
struct kmem_cache_node *n;
- flush_all_cpus_locked(s);
+ flush_all_cpus_locked(s, /* disable_sheaves = */false);
/* we might have rcu sheaves in flight */
if (cache_supports_sheaves(s))
@@ -8334,7 +8484,7 @@ static int __kmem_cache_do_shrink(struct kmem_cache *s)
int __kmem_cache_shrink(struct kmem_cache *s)
{
- flush_all(s);
+ flush_all(s, /* disable_sheaves = */false);
return __kmem_cache_do_shrink(s);
}
@@ -8344,7 +8494,7 @@ static int slab_mem_going_offline_callback(void)
mutex_lock(&slab_mutex);
list_for_each_entry(s, &slab_caches, list) {
- flush_all_cpus_locked(s);
+ flush_all_cpus_locked(s, /* disable_sheaves = */false);
__kmem_cache_do_shrink(s);
}
mutex_unlock(&slab_mutex);
@@ -8479,7 +8629,11 @@ static int bootstrap_cache_sheaves(struct kmem_cache *s,
unsigned short capacity)
{
struct kmem_cache_args empty_args = {};
+ struct slub_flush_work *sfw;
int node, cpu, err = 0;
+ void **sheaves = NULL;
+
+ lockdep_assert_cpus_held();
if (!capacity)
capacity = calculate_sheaf_capacity(s, &empty_args);
@@ -8491,6 +8645,9 @@ static int bootstrap_cache_sheaves(struct kmem_cache *s,
for_each_node_mask(node, slab_barn_nodes) {
struct node_barn *barn;
+ if (s->per_node[node].barn)
+ continue;
+
barn = kmalloc_node(sizeof(*barn), GFP_KERNEL, node);
if (!barn) {
@@ -8502,23 +8659,62 @@ static int bootstrap_cache_sheaves(struct kmem_cache *s,
s->per_node[node].barn = barn;
}
+ sheaves = kmalloc_array(nr_cpu_ids, sizeof(*sheaves),
+ GFP_KERNEL | __GFP_ZERO);
+ if (!sheaves) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ /* Do not queue the work if any of the allocations fails */
+ for_each_possible_cpu(cpu) {
+ sheaves[cpu] = __alloc_empty_sheaf(s, GFP_KERNEL, capacity);
+
+ if (!sheaves[cpu]) {
+ err = -ENOMEM;
+ goto out_free_sheaves;
+ }
+ }
+
+ mutex_lock(&flush_lock);
for_each_possible_cpu(cpu) {
struct slub_percpu_sheaves *pcs;
pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
+ sfw = &per_cpu(slub_flush, cpu);
- pcs->main = __alloc_empty_sheaf(s, GFP_KERNEL, capacity);
-
- if (!pcs->main) {
- err = -ENOMEM;
- break;
+ if (!cpu_online(cpu) || slab_state == UP) {
+ pcs->main = sheaves[cpu];
+ pcs->capacity = capacity;
+ sfw->skip = true;
+ } else {
+ INIT_WORK(&sfw->work, enable_cpu_sheaves);
+ sfw->s = s;
+ sfw->sheaves = sheaves;
+ sfw->capacity = capacity;
+ sfw->skip = false;
+ queue_work_on(cpu, flushwq, &sfw->work);
}
}
-out:
- if (!err)
- s->sheaf_capacity = capacity;
+ for_each_online_cpu(cpu) {
+ sfw = &per_cpu(slub_flush, cpu);
+ if (sfw->skip)
+ continue;
+ flush_work(&sfw->work);
+ }
+ mutex_unlock(&flush_lock);
+ kfree(sheaves);
+ s->sheaf_capacity = capacity;
+ return 0;
+
+out_free_sheaves:
+ for_each_possible_cpu(cpu)
+ if (sheaves[cpu])
+ free_empty_sheaf(s, sheaves[cpu]);
+ kfree(sheaves);
+out:
return err;
}
@@ -8533,6 +8729,7 @@ static void __init bootstrap_kmalloc_sheaves(void)
struct kmem_cache *s;
int idx;
+ cpus_read_lock();
for_each_normal_kmalloc_cache(s, type, idx) {
/*
* It's still early in boot so treat this as a failure to
@@ -8542,6 +8739,7 @@ static void __init bootstrap_kmalloc_sheaves(void)
panic("Out of memory when creating kmem_cache %s\n",
s->name);
}
+ cpus_read_unlock();
}
void __init kmem_cache_init(void)
@@ -8799,7 +8997,7 @@ long validate_slab_cache(struct kmem_cache *s)
if (!obj_map)
return -ENOMEM;
- flush_all(s);
+ flush_all(s, /* disable_sheaves = */false);
for_each_kmem_cache_node(s, node, n)
count += validate_slab_node(s, n, obj_map);
@@ -9111,7 +9309,39 @@ static ssize_t sheaf_capacity_show(struct kmem_cache *s, char *buf)
{
return sysfs_emit(buf, "%hu\n", s->sheaf_capacity);
}
-SLAB_ATTR_RO(sheaf_capacity);
+static ssize_t sheaf_capacity_store(struct kmem_cache *s,
+ const char *buf, size_t length)
+{
+ unsigned short capacity;
+ int err;
+
+ err = kstrtou16(buf, 10, &capacity);
+ if (err)
+ return err;
+
+ if (!cache_supports_sheaves(s))
+ return -EOPNOTSUPP;
+
+ cpus_read_lock();
+ mutex_lock(&slab_mutex);
+ flush_all_cpus_locked(s, /* disable_sheaves = */true);
+ rcu_barrier();
+ __kmem_cache_do_shrink(s);
+ s->sheaf_capacity = 0;
+ if (capacity) {
+ err = bootstrap_cache_sheaves(s, capacity);
+ if (err) {
+ mutex_unlock(&slab_mutex);
+ cpus_read_unlock();
+ return err;
+ }
+ }
+ mutex_unlock(&slab_mutex);
+ cpus_read_unlock();
+
+ return length;
+}
+SLAB_ATTR(sheaf_capacity);
static ssize_t min_partial_show(struct kmem_cache *s, char *buf)
{
--
2.43.0
* [PATCH RFC 7/8] mm/slab: add pcs->lock lockdep assert when accessing the barn
2026-05-15 16:24 [PATCH RFC 0/8] mm/slab: enable runtime sheaves tuning Harry Yoo (Oracle)
` (5 preceding siblings ...)
2026-05-15 16:24 ` [PATCH RFC 6/8] mm/slab: allow changing sheaf_capacity at runtime Harry Yoo (Oracle)
@ 2026-05-15 16:24 ` Harry Yoo (Oracle)
2026-05-15 16:24 ` [PATCH RFC 8/8] mm/slab: allow changing max_{full,empty}_sheaves at runtime Harry Yoo (Oracle)
7 siblings, 0 replies; 9+ messages in thread
From: Harry Yoo (Oracle) @ 2026-05-15 16:24 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
David Rientjes, Roman Gushchin
Cc: linux-mm, linux-kernel, Suren Baghdasaryan, Liam R. Howlett
If the cache's sheaf capacity changes while a CPU is getting/putting
a sheaf from/to the barn, the writer performing the capacity change
is responsible for flushing and freeing the sheaves that still carry
the stale capacity. However, the writer can rely on that only if CPUs
hold pcs->lock while accessing the barn.
Add lockdep_assert_held() on the pcs lock wherever a sheaf is moved
to/from the barn. Since struct slab_sheaf no longer stores the cache
pointer, pass the cache as an additional parameter to the barn helpers.
When lockdep is disabled, the assert is a no-op and the compiler can
optimize away the unused parameter (since these helpers are static).
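For illustration, the calling convention the new asserts document looks
roughly like the following (a minimal caller-side sketch, not taken
verbatim from the patch; error handling and statistics are omitted):

        local_lock(&s->cpu_sheaves->lock);
        pcs = this_cpu_ptr(s->cpu_sheaves);

        /*
         * pcs->lock is held here, so the lockdep_assert_held() inside
         * the barn helpers is satisfied, and a concurrent capacity
         * writer can flush/free this sheaf as part of its update.
         */
        barn_put_full_sheaf(s, barn, pcs->main);
        pcs->main = empty;

        local_unlock(&s->cpu_sheaves->lock);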
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
mm/slub.c | 70 +++++++++++++++++++++++++++++++++++++++------------------------
1 file changed, 43 insertions(+), 27 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 7def24fdfae6..856639d3d3f0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3142,12 +3142,15 @@ static void pcs_destroy(struct kmem_cache *s)
s->cpu_sheaves = NULL;
}
-static struct slab_sheaf *barn_get_empty_sheaf(struct node_barn *barn,
+static struct slab_sheaf *barn_get_empty_sheaf(struct kmem_cache *s,
+ struct node_barn *barn,
bool allow_spin)
{
struct slab_sheaf *empty = NULL;
unsigned long flags;
+ lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+
if (!data_race(barn->nr_empty))
return NULL;
@@ -3174,10 +3177,13 @@ static struct slab_sheaf *barn_get_empty_sheaf(struct node_barn *barn,
* empty or full sheaf limits for simplicity.
*/
-static void barn_put_empty_sheaf(struct node_barn *barn, struct slab_sheaf *sheaf)
+static void barn_put_empty_sheaf(struct kmem_cache *s, struct node_barn *barn,
+ struct slab_sheaf *sheaf)
{
unsigned long flags;
+ lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+
spin_lock_irqsave(&barn->lock, flags);
list_add(&sheaf->barn_list, &barn->sheaves_empty);
@@ -3186,10 +3192,13 @@ static void barn_put_empty_sheaf(struct node_barn *barn, struct slab_sheaf *shea
spin_unlock_irqrestore(&barn->lock, flags);
}
-static void barn_put_full_sheaf(struct node_barn *barn, struct slab_sheaf *sheaf)
+static void barn_put_full_sheaf(struct kmem_cache *s, struct node_barn *barn,
+ struct slab_sheaf *sheaf)
{
unsigned long flags;
+ lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+
spin_lock_irqsave(&barn->lock, flags);
list_add(&sheaf->barn_list, &barn->sheaves_full);
@@ -3198,11 +3207,14 @@ static void barn_put_full_sheaf(struct node_barn *barn, struct slab_sheaf *sheaf
spin_unlock_irqrestore(&barn->lock, flags);
}
-static struct slab_sheaf *barn_get_full_or_empty_sheaf(struct node_barn *barn)
+static struct slab_sheaf *barn_get_full_or_empty_sheaf(struct kmem_cache *s,
+ struct node_barn *barn)
{
struct slab_sheaf *sheaf = NULL;
unsigned long flags;
+ lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+
if (!data_race(barn->nr_full) && !data_race(barn->nr_empty))
return NULL;
@@ -3231,12 +3243,14 @@ static struct slab_sheaf *barn_get_full_or_empty_sheaf(struct node_barn *barn)
* change.
*/
static struct slab_sheaf *
-barn_replace_empty_sheaf(struct node_barn *barn, struct slab_sheaf *empty,
- bool allow_spin)
+barn_replace_empty_sheaf(struct kmem_cache *s, struct node_barn *barn,
+ struct slab_sheaf *empty, bool allow_spin)
{
struct slab_sheaf *full = NULL;
unsigned long flags;
+ lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+
if (!data_race(barn->nr_full))
return NULL;
@@ -3264,12 +3278,14 @@ barn_replace_empty_sheaf(struct node_barn *barn, struct slab_sheaf *empty,
* barn. But if there are too many full sheaves, reject this with -E2BIG.
*/
static struct slab_sheaf *
-barn_replace_full_sheaf(struct node_barn *barn, struct slab_sheaf *full,
- bool allow_spin)
+barn_replace_full_sheaf(struct kmem_cache *s, struct node_barn *barn,
+ struct slab_sheaf *full, bool allow_spin)
{
struct slab_sheaf *empty;
unsigned long flags;
+ lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
+
/* we don't repeat this check under barn->lock as it's not critical */
if (data_race(barn->nr_full) >= MAX_FULL_SHEAVES)
return ERR_PTR(-E2BIG);
@@ -4732,7 +4748,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
allow_spin = gfpflags_allow_spinning(gfp);
- full = barn_replace_empty_sheaf(barn, pcs->main, allow_spin);
+ full = barn_replace_empty_sheaf(s, barn, pcs->main, allow_spin);
if (full) {
stat(s, BARN_GET);
@@ -4747,7 +4763,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
empty = pcs->spare;
pcs->spare = NULL;
} else {
- empty = barn_get_empty_sheaf(barn, true);
+ empty = barn_get_empty_sheaf(s, barn, true);
}
}
@@ -4803,7 +4819,7 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
if (!pcs->spare)
pcs->spare = pcs->main;
else
- barn_put_empty_sheaf(barn, pcs->main);
+ barn_put_empty_sheaf(s, barn, pcs->main);
pcs->main = full;
return pcs;
}
@@ -4814,12 +4830,12 @@ __pcs_replace_empty_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
}
if (pcs->spare->size == 0) {
- barn_put_empty_sheaf(barn, pcs->spare);
+ barn_put_empty_sheaf(s, barn, pcs->spare);
pcs->spare = full;
return pcs;
}
- barn_put_full_sheaf(barn, full);
+ barn_put_full_sheaf(s, barn, full);
stat(s, BARN_PUT);
return pcs;
@@ -4936,7 +4952,7 @@ unsigned int alloc_from_pcs_bulk(struct kmem_cache *s, gfp_t gfp, size_t size,
return allocated;
}
- full = barn_replace_empty_sheaf(barn, pcs->main,
+ full = barn_replace_empty_sheaf(s, barn, pcs->main,
gfpflags_allow_spinning(gfp));
if (full) {
@@ -5139,7 +5155,7 @@ kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned short size)
stat(s, SHEAF_PREFILL_SLOW);
if (barn)
- sheaf = barn_get_full_or_empty_sheaf(barn);
+ sheaf = barn_get_full_or_empty_sheaf(s, barn);
if (sheaf && sheaf->size)
stat(s, BARN_GET);
else
@@ -5253,7 +5269,7 @@ void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
goto free_sheaf;
}
- barn_put_full_sheaf(barn, sheaf);
+ barn_put_full_sheaf(s, barn, sheaf);
local_unlock(&s->cpu_sheaves->lock);
stat(s, BARN_PUT);
return;
@@ -5820,14 +5836,14 @@ static void __pcs_install_empty_sheaf(struct kmem_cache *s,
* freed to it. Get rid of our empty sheaf.
*/
if (pcs->main->size < pcs->main->capacity) {
- barn_put_empty_sheaf(barn, empty);
+ barn_put_empty_sheaf(s, barn, empty);
return;
}
/* Also unlikely for the same reason */
if (pcs->spare->size < pcs->spare->capacity) {
swap(pcs->main, pcs->spare);
- barn_put_empty_sheaf(barn, empty);
+ barn_put_empty_sheaf(s, barn, empty);
return;
}
@@ -5835,7 +5851,7 @@ static void __pcs_install_empty_sheaf(struct kmem_cache *s,
* We probably failed barn_replace_full_sheaf() due to no empty sheaf
* available there, but we allocated one, so finish the job.
*/
- barn_put_full_sheaf(barn, pcs->main);
+ barn_put_full_sheaf(s, barn, pcs->main);
stat(s, BARN_PUT);
pcs->main = empty;
}
@@ -5874,7 +5890,7 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
put_fail = false;
if (!pcs->spare) {
- empty = barn_get_empty_sheaf(barn, allow_spin);
+ empty = barn_get_empty_sheaf(s, barn, allow_spin);
if (empty) {
pcs->spare = pcs->main;
pcs->main = empty;
@@ -5888,7 +5904,7 @@ __pcs_replace_full_main(struct kmem_cache *s, struct slub_percpu_sheaves *pcs,
return pcs;
}
- empty = barn_replace_full_sheaf(barn, pcs->main, allow_spin);
+ empty = barn_replace_full_sheaf(s, barn, pcs->main, allow_spin);
if (!IS_ERR(empty)) {
stat(s, BARN_PUT);
@@ -6058,7 +6074,7 @@ static void rcu_free_sheaf(struct rcu_head *head)
if (data_race(barn->nr_full) < MAX_FULL_SHEAVES) {
stat(s, BARN_PUT);
- barn_put_full_sheaf(barn, sheaf);
+ barn_put_full_sheaf(s, barn, sheaf);
local_unlock(&s->cpu_sheaves->lock);
return;
}
@@ -6068,7 +6084,7 @@ static void rcu_free_sheaf(struct rcu_head *head)
empty:
if (barn && data_race(barn->nr_empty) < MAX_EMPTY_SHEAVES) {
- barn_put_empty_sheaf(barn, sheaf);
+ barn_put_empty_sheaf(s, barn, sheaf);
local_unlock(&s->cpu_sheaves->lock);
return;
}
@@ -6134,7 +6150,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
goto fail;
}
- empty = barn_get_empty_sheaf(barn, true);
+ empty = barn_get_empty_sheaf(s, barn, true);
if (empty) {
pcs->rcu_free = empty;
@@ -6162,7 +6178,7 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
}
if (unlikely(pcs->rcu_free))
- barn_put_empty_sheaf(barn, empty);
+ barn_put_empty_sheaf(s, barn, empty);
else
pcs->rcu_free = empty;
}
@@ -6314,7 +6330,7 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
goto no_empty;
if (!pcs->spare) {
- empty = barn_get_empty_sheaf(barn, true);
+ empty = barn_get_empty_sheaf(s, barn, true);
if (!empty)
goto no_empty;
@@ -6328,7 +6344,7 @@ static void free_to_pcs_bulk(struct kmem_cache *s, size_t size, void **p)
goto do_free;
}
- empty = barn_replace_full_sheaf(barn, pcs->main, true);
+ empty = barn_replace_full_sheaf(s, barn, pcs->main, true);
if (IS_ERR(empty)) {
stat(s, BARN_PUT_FAIL);
goto no_empty;
--
2.43.0
* [PATCH RFC 8/8] mm/slab: allow changing max_{full,empty}_sheaves at runtime
2026-05-15 16:24 [PATCH RFC 0/8] mm/slab: enable runtime sheaves tuning Harry Yoo (Oracle)
` (6 preceding siblings ...)
2026-05-15 16:24 ` [PATCH RFC 7/8] mm/slab: add pcs->lock lockdep assert when accessing the barn Harry Yoo (Oracle)
@ 2026-05-15 16:24 ` Harry Yoo (Oracle)
7 siblings, 0 replies; 9+ messages in thread
From: Harry Yoo (Oracle) @ 2026-05-15 16:24 UTC (permalink / raw)
To: Vlastimil Babka, Andrew Morton, Hao Li, Christoph Lameter,
David Rientjes, Roman Gushchin
Cc: linux-mm, linux-kernel, Suren Baghdasaryan, Liam R. Howlett
Replace MAX_FULL_SHEAVES and MAX_EMPTY_SHEAVES with per-cache tunables,
and expose them via sysfs attributes as max_{full,empty}_sheaves.
Keep the default value of 10 to preserve the existing behavior.
Let us measure the impact of these parameters and discuss whether
they are actually needed before landing this in mainline.
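As a usage sketch (the cache name and the value below are hypothetical
examples, not recommendations), the new attributes live next to the
existing ones under /sys/kernel/slab/<cache>/ and can be written like
any other writable slab attribute, e.g. from a small userspace helper:

        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        int main(void)
        {
                /* hypothetical cache and limit, for illustration only */
                const char *path =
                        "/sys/kernel/slab/kmalloc-64/max_full_sheaves";
                const char *val = "20";
                int fd = open(path, O_WRONLY);

                if (fd < 0) {
                        perror("open");
                        return 1;
                }
                if (write(fd, val, strlen(val)) < 0)
                        perror("write");
                close(fd);
                return 0;
        }

The new limits apply to subsequent barn operations, since the checks
read s->max_{full,empty}_sheaves directly.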
Signed-off-by: Harry Yoo (Oracle) <harry@kernel.org>
---
mm/slab.h | 2 ++
mm/slub.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++-------
2 files changed, 50 insertions(+), 7 deletions(-)
diff --git a/mm/slab.h b/mm/slab.h
index 907a8207809c..22df364a2ef7 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -205,6 +205,8 @@ struct kmem_cache {
struct reciprocal_value reciprocal_size;
unsigned int offset; /* Free pointer offset */
unsigned short sheaf_capacity;
+ unsigned short max_full_sheaves;
+ unsigned short max_empty_sheaves;
struct kmem_cache_order_objects oo;
/* Allocation and freeing of slabs */
diff --git a/mm/slub.c b/mm/slub.c
index 856639d3d3f0..e9b33567d98c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -396,9 +396,6 @@ void stat_add(const struct kmem_cache *s, enum stat_item si, int v)
#endif
}
-#define MAX_FULL_SHEAVES 10
-#define MAX_EMPTY_SHEAVES 10
-
struct node_barn {
spinlock_t lock;
struct list_head sheaves_full;
@@ -3287,7 +3284,7 @@ barn_replace_full_sheaf(struct kmem_cache *s, struct node_barn *barn,
lockdep_assert_held(this_cpu_ptr(&s->cpu_sheaves->lock));
/* we don't repeat this check under barn->lock as it's not critical */
- if (data_race(barn->nr_full) >= MAX_FULL_SHEAVES)
+ if (data_race(barn->nr_full) >= s->max_full_sheaves)
return ERR_PTR(-E2BIG);
if (!data_race(barn->nr_empty))
return ERR_PTR(-ENOMEM);
@@ -5251,7 +5248,7 @@ void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
* If the barn has too many full sheaves or we fail to refill the sheaf,
* simply flush and free it.
*/
- if (!barn || data_race(barn->nr_full) >= MAX_FULL_SHEAVES) {
+ if (!barn || data_race(barn->nr_full) >= s->max_full_sheaves) {
local_unlock(&s->cpu_sheaves->lock);
goto free_sheaf;
}
@@ -6072,7 +6069,7 @@ static void rcu_free_sheaf(struct rcu_head *head)
* limit but that should be rare and harmless.
*/
- if (data_race(barn->nr_full) < MAX_FULL_SHEAVES) {
+ if (data_race(barn->nr_full) < s->max_full_sheaves) {
stat(s, BARN_PUT);
barn_put_full_sheaf(s, barn, sheaf);
local_unlock(&s->cpu_sheaves->lock);
@@ -6083,7 +6080,7 @@ static void rcu_free_sheaf(struct rcu_head *head)
sheaf_flush_unused(s, sheaf);
empty:
- if (barn && data_race(barn->nr_empty) < MAX_EMPTY_SHEAVES) {
+ if (barn && data_race(barn->nr_empty) < s->max_empty_sheaves) {
barn_put_empty_sheaf(s, barn, sheaf);
local_unlock(&s->cpu_sheaves->lock);
return;
@@ -8843,6 +8840,8 @@ int do_kmem_cache_create(struct kmem_cache *s, const char *name,
#endif
s->align = args->align;
s->ctor = args->ctor;
+ s->max_full_sheaves = 10;
+ s->max_empty_sheaves = 10;
#ifdef CONFIG_HARDENED_USERCOPY
s->useroffset = args->useroffset;
s->usersize = args->usersize;
@@ -9359,6 +9358,46 @@ static ssize_t sheaf_capacity_store(struct kmem_cache *s,
}
SLAB_ATTR(sheaf_capacity);
+static ssize_t max_full_sheaves_show(struct kmem_cache *s, char *buf)
+{
+ return sysfs_emit(buf, "%hu\n", s->max_full_sheaves);
+}
+
+static ssize_t max_full_sheaves_store(struct kmem_cache *s, const char *buf,
+ size_t length)
+{
+ unsigned short max_full_sheaves;
+ int err;
+
+ err = kstrtou16(buf, 10, &max_full_sheaves);
+ if (err)
+ return err;
+
+ s->max_full_sheaves = max_full_sheaves;
+ return length;
+}
+SLAB_ATTR(max_full_sheaves);
+
+static ssize_t max_empty_sheaves_show(struct kmem_cache *s, char *buf)
+{
+ return sysfs_emit(buf, "%hu\n", s->max_empty_sheaves);
+}
+
+static ssize_t max_empty_sheaves_store(struct kmem_cache *s, const char *buf,
+ size_t length)
+{
+ unsigned short max_empty_sheaves;
+ int err;
+
+ err = kstrtou16(buf, 10, &max_empty_sheaves);
+ if (err)
+ return err;
+
+ s->max_empty_sheaves = max_empty_sheaves;
+ return length;
+}
+SLAB_ATTR(max_empty_sheaves);
+
static ssize_t min_partial_show(struct kmem_cache *s, char *buf)
{
return sysfs_emit(buf, "%lu\n", s->min_partial);
@@ -9721,6 +9760,8 @@ static const struct attribute *const slab_attrs[] = {
&objs_per_slab_attr.attr,
&order_attr.attr,
&sheaf_capacity_attr.attr,
+ &max_full_sheaves_attr.attr,
+ &max_empty_sheaves_attr.attr,
&min_partial_attr.attr,
&cpu_partial_attr.attr,
&objects_partial_attr.attr,
--
2.43.0