Subject: [RFC V2 PATCH 0/5] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space
From: Harry Yoo @ 2025-08-27 11:37 UTC
  To: akpm, vbabka
  Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
	muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
	surenb, vincenzo.frascino, yeoreum.yun, harry.yoo

RFC v1: https://lore.kernel.org/linux-mm/20250613063336.5833-1-harry.yoo@oracle.com

RFC v1 -> v2:
- Adopt Vlastimil's suggestion (patches 2, 3, 5) and implement case 2
  as described below. Thanks!
- Fix unaligned metadata address with SLAB_STORE_USER (patch 1)
- When memcg and mem profiling are disabled, do not allocate slabobj_ext
  metadata (patch 4)

When CONFIG_MEMCG and CONFIG_MEM_ALLOC_PROFILING are enabled,
the kernel allocates two pointers per object: one for the memory cgroup
(obj_cgroup) to which it belongs, and another for the code location
that requested the allocation.
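
For reference, the metadata in question is struct slabobj_ext, which
(depending on the two config options above) currently looks roughly
like this:

struct slabobj_ext {
#ifdef CONFIG_MEMCG
        struct obj_cgroup *objcg;
#endif
#ifdef CONFIG_MEM_ALLOC_PROFILING
        union codetag_ref ref;
#endif
};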

In two special cases, this overhead can be eliminated by allocating
slabobj_ext metadata from unused space within a slab:

  Case 1. The "leftover" space after the last slab object is larger than
          the size of an array of slabobj_ext.

  Case 2. The per-object alignment padding is larger than
          sizeof(struct slabobj_ext).

For these two cases, one or two pointers can be saved per slab object.
Examples: ext4 inode cache (case 1) and xfs inode cache (case 2).
That's approximately 0.7-0.8% (memcg) or 1.5-1.6% (memcg + mem profiling)
of the total inode cache size.
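
As a rough illustration (the helper names below are hypothetical and
the checks simplified; the actual patches also have to account for
red_left_pad, the free pointer and debug metadata), the two cases boil
down to comparing unused space in the existing layout against
sizeof(struct slabobj_ext):

/* Case 1: leftover space after the last object fits the whole array. */
static bool obj_exts_fit_in_leftover(struct kmem_cache *s, struct slab *slab)
{
        unsigned long used = slab->objects * s->size;

        return slab_size(slab) - used >= slab->objects * sizeof(struct slabobj_ext);
}

/* Case 2: per-object alignment padding fits one slabobj_ext. */
static bool obj_ext_fits_in_padding(unsigned int padding_per_object)
{
        return padding_per_object >= sizeof(struct slabobj_ext);
}

As a sanity check on the numbers above: assuming an inode object of
roughly 1 KiB, saving one 8-byte pointer per object is about 0.8% of
the cache, and saving two pointers is about 1.6%.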

Implementing case 2 is not straightforward, because the existing code
assumes that slab->obj_exts points to an array of slabobj_ext, and
case 2 breaks that assumption.

As suggested by Vlastimil, access to individual slabobj_ext metadata is
abstracted via a new helper, slab_obj_ext():

/*
 * Return the slabobj_ext metadata for the object at @index in @slab,
 * regardless of how the metadata is laid out.
 */
static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
                                               unsigned long obj_exts,
                                               unsigned int index)
{
        return (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
}

In the normal case (including case 1), slab->obj_exts points to an array
of slabobj_ext, and the stride is sizeof(struct slabobj_ext).

In case 2, the stride is s->size, and
slab->obj_exts = slab_address(slab) + s->red_left_pad
                 + (offset of slabobj_ext within each object slot).

With this approach, the memcg charging fastpath doesn't need to care
about how slabobj_ext metadata is stored.
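
For illustration, a plausible shape of the stride selection and of a
caller walking the metadata is sketched below. This is not the code
from the patches: SLAB_OBJ_EXT_IN_OBJ is the cache flag mentioned in
the TODO section, slab_obj_exts() is the existing helper that returns
the decoded slab->obj_exts pointer, and the example assumes CONFIG_MEMCG.

static inline unsigned int slab_get_stride(struct slab *slab)
{
        struct kmem_cache *s = slab->slab_cache;

        /* Case 2: one slabobj_ext embedded in each s->size slot. */
        if (s->flags & SLAB_OBJ_EXT_IN_OBJ)
                return s->size;
        /* Normal case and case 1: a dense array of slabobj_ext. */
        return sizeof(struct slabobj_ext);
}

/* Example caller: count objects currently charged to a memcg. */
static unsigned int count_charged_objects(struct slab *slab)
{
        unsigned long obj_exts = (unsigned long)slab_obj_exts(slab);
        unsigned int i, charged = 0;

        if (!obj_exts)
                return 0;

        for (i = 0; i < slab->objects; i++) {
                struct slabobj_ext *ext = slab_obj_ext(slab, obj_exts, i);

                if (ext->objcg)
                        charged++;
        }
        return charged;
}

Either layout works with the same loop; only the stride and the value
stored in slab->obj_exts differ between the two cases.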

# Microbenchmark Results

To measure the performance impact of this series, Vlastimil's
microbenchmark [1] (modified to add more sizes) is used.
Because performance is measured in cycles, lower is better.

The baseline is the slab tree without the series, and the compared kernel
includes patches 2 through 5. (Performance was measured before patch 1
was written.)

[1] https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/commit/?h=slub-percpu-sheaves-v5-benchmarking&id=3def9df24be06fd609395f178d0cecec2c02154e

SET                                                     |  no_memcg Δ% |     memcg Δ%
------------------------------------------------------------------------------------
BATCH SIZE: 1 SHUFFLED: NO SIZE: 64                     |        -0.43 |        -0.00
BATCH SIZE: 1 SHUFFLED: NO SIZE: 1120                   |         0.28 |         0.29
BATCH SIZE: 1 SHUFFLED: NO SIZE: 96 (HWCACHE ALIGN)     |        -0.24 |        -0.19
BATCH SIZE: 10 SHUFFLED: NO SIZE: 64                    |        -1.09 |        -0.74
BATCH SIZE: 10 SHUFFLED: NO SIZE: 1120                  |         0.92 |         0.36
BATCH SIZE: 10 SHUFFLED: NO SIZE: 96 (HWCACHE ALIGN)    |        -0.44 |        -0.72
BATCH SIZE: 100 SHUFFLED: NO SIZE: 64                   |        -1.66 |        -2.38
BATCH SIZE: 100 SHUFFLED: NO SIZE: 1120                 |        -1.36 |        -1.36
BATCH SIZE: 100 SHUFFLED: NO SIZE: 96 (HWCACHE ALIGN)   |        -1.88 |        -2.54
BATCH SIZE: 1000 SHUFFLED: NO SIZE: 64                  |        -2.98 |        -2.79
BATCH SIZE: 1000 SHUFFLED: NO SIZE: 1120                |        -1.55 |        -1.48
BATCH SIZE: 1000 SHUFFLED: NO SIZE: 96 (HWCACHE ALIGN)  |        -3.39 |        -4.05
BATCH SIZE: 10 SHUFFLED: YES SIZE: 64                   |        -0.64 |        -1.22
BATCH SIZE: 10 SHUFFLED: YES SIZE: 1120                 |        -0.74 |        -0.42
BATCH SIZE: 10 SHUFFLED: YES SIZE: 96 (HWCACHE ALIGN)   |        -0.59 |        -1.11
BATCH SIZE: 100 SHUFFLED: YES SIZE: 64                  |        -1.99 |        -2.80
BATCH SIZE: 100 SHUFFLED: YES SIZE: 1120                |        -2.56 |        -2.74
BATCH SIZE: 100 SHUFFLED: YES SIZE: 96 (HWCACHE ALIGN)  |        -3.43 |        -3.21
BATCH SIZE: 1000 SHUFFLED: YES SIZE: 64                 |        -3.40 |        -3.13
BATCH SIZE: 1000 SHUFFLED: YES SIZE: 1120               |        -1.89 |        -1.99
BATCH SIZE: 1000 SHUFFLED: YES SIZE: 96 (HWCACHE ALIGN) |        -4.41 |        -4.53

No red flags in the microbenchmark results.

Note 1: I suspect the reduction in cycles is due to changes in code
layout rather than a real performance benefit of the series: with only
patches 2 and 3 applied, the delta is about 0.3%~2.5%, and it then
drops to the values shown in the table once patches 4 and 5 are
applied, which doesn't make much sense otherwise.

Note 2: When the kernel was modified to allocate slabobj_ext even when
SLAB_ACCOUNT was not set and memory profiling was disabled,
the "no_memcg Δ%" regressed by 10%. This is the main reason slabobj_ext
is allocated only when either memcg or memory profiling requires
the metadata at the time slabs are created.
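
For clarity, the gating at slab creation time amounts to something like
the sketch below (the helper name is hypothetical and the exact
condition in the patches may differ):

/* Only reserve slabobj_ext space if some consumer can use it now. */
static bool want_obj_exts_at_slab_creation(struct kmem_cache *s)
{
        if (IS_ENABLED(CONFIG_MEMCG) && (s->flags & SLAB_ACCOUNT))
                return true;
        if (mem_alloc_profiling_enabled())
                return true;
        return false;
}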

# TODO

- Do not unpoison slabobj_ext in case 2. Instead, disable KASAN
  while accessing slabobj_ext when s->flags & SLAB_OBJ_EXT_IN_OBJ
  is set (see the sketch below).
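
A minimal sketch of that TODO item, reusing the existing
kasan_disable_current()/kasan_enable_current() helpers (these only
cover generic and SW-tags KASAN; this is an illustration, not the
planned implementation):

static inline void obj_ext_access_begin(struct kmem_cache *s)
{
        /* The metadata lives inside the (possibly poisoned) object slot. */
        if (s->flags & SLAB_OBJ_EXT_IN_OBJ)
                kasan_disable_current();
}

static inline void obj_ext_access_end(struct kmem_cache *s)
{
        if (s->flags & SLAB_OBJ_EXT_IN_OBJ)
                kasan_enable_current();
}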

Harry Yoo (5):
  mm/slab: ensure all metadata in slab object is word-aligned
  mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
  mm/slab: use stride to access slabobj_ext
  mm/slab: save memory by allocating slabobj_ext array from leftover
  mm/slab: place slabobj_ext metadata in unused space within s->size

 include/linux/slab.h |   3 +
 mm/kasan/kasan.h     |   4 +-
 mm/memcontrol.c      |  23 ++--
 mm/slab.h            |  53 ++++++++-
 mm/slab_common.c     |   6 +-
 mm/slub.c            | 276 ++++++++++++++++++++++++++++++++++++++-----
 6 files changed, 317 insertions(+), 48 deletions(-)

-- 
2.43.0


