* [RFC V2 PATCH 1/5] mm/slab: ensure all metadata in slab object is word-aligned
From: Harry Yoo @ 2025-08-27 11:37 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo
When the SLAB_STORE_USER debug flag is used, any metadata placed after
the original kmalloc request size (orig_size) is not properly aligned
on 64-bit architectures, because orig_size is an unsigned int. When both
KASAN and SLAB_STORE_USER are enabled, kasan_alloc_meta is therefore not
properly aligned.
Because not all architectures can handle unaligned memory accesses,
ensure that all metadata (track, orig_size, kasan_{alloc,free}_meta)
in a slab object is word-aligned. struct track and kasan_{alloc,free}_meta
are aligned by adding __aligned(sizeof(unsigned long)).
For orig_size, to avoid giving the impression that orig_size is an
unsigned long, use ALIGN(sizeof(unsigned int), sizeof(unsigned long)) to
make clear that its type remains unsigned int while its storage is
aligned to the word boundary.
On 64-bit architectures this reserves 8 bytes for orig_size, which is
acceptable since this is for debugging purposes and not for production
use.
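For reference, here is a minimal userspace sketch (not part of the
patch) of the rounding this relies on, assuming a 64-bit build where
unsigned int is 4 bytes and unsigned long is 8 bytes:
#include <stdio.h>

/* Simplified version of the kernel's ALIGN(): round x up to a multiple
 * of a (a must be a power of two, as sizeof(unsigned long) is). */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

int main(void)
{
        /* On 64-bit: sizeof(unsigned int) == 4, sizeof(unsigned long) == 8 */
        unsigned long reserved = ALIGN(sizeof(unsigned int), sizeof(unsigned long));

        printf("orig_size slot: %zu -> %lu bytes, so the next field stays word-aligned\n",
               sizeof(unsigned int), reserved);
        return 0;
}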
Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
The combination of:
- architectures that cannot handle unaligned memory access
- KASAN
- SLAB_STORE_USER
sounds quite niche; does anyone think it needs to be backported to
-stable?
mm/kasan/kasan.h | 4 ++--
mm/slub.c | 16 +++++++++++-----
2 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 129178be5e64..d4ea7ecc20c3 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -265,7 +265,7 @@ struct kasan_alloc_meta {
struct kasan_track alloc_track;
/* Free track is stored in kasan_free_meta. */
depot_stack_handle_t aux_stack[2];
-};
+} __aligned(sizeof(unsigned long));
struct qlist_node {
struct qlist_node *next;
@@ -289,7 +289,7 @@ struct qlist_node {
struct kasan_free_meta {
struct qlist_node quarantine_link;
struct kasan_track free_track;
-};
+} __aligned(sizeof(unsigned long));
#endif /* CONFIG_KASAN_GENERIC */
diff --git a/mm/slub.c b/mm/slub.c
index 0ef2ba459ef9..9e91f2016697 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -343,7 +343,7 @@ struct track {
int cpu; /* Was running on cpu */
int pid; /* Pid context */
unsigned long when; /* When did the operation occur */
-};
+} __aligned(sizeof(unsigned long));
enum track_item { TRACK_ALLOC, TRACK_FREE };
@@ -1180,7 +1180,7 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
off += 2 * sizeof(struct track);
if (slub_debug_orig_size(s))
- off += sizeof(unsigned int);
+ off += ALIGN(sizeof(unsigned int), sizeof(unsigned long));
off += kasan_metadata_size(s, false);
@@ -1376,7 +1376,8 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
off += 2 * sizeof(struct track);
if (s->flags & SLAB_KMALLOC)
- off += sizeof(unsigned int);
+ off += ALIGN(sizeof(unsigned int),
+ sizeof(unsigned long));
}
off += kasan_metadata_size(s, false);
@@ -7286,9 +7287,14 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
*/
size += 2 * sizeof(struct track);
- /* Save the original kmalloc request size */
+ /*
+ * Save the original kmalloc request size.
+ * Although the request size is an unsigned int,
+ * make sure that it is aligned to a word boundary.
+ */
if (flags & SLAB_KMALLOC)
- size += sizeof(unsigned int);
+ size += ALIGN(sizeof(unsigned int),
+ sizeof(unsigned long));
}
#endif
--
2.43.0
* [RFC V2 PATCH 2/5] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
From: Harry Yoo @ 2025-08-27 11:37 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo
Currently, the slab allocator assumes that slab->obj_exts is a pointer
to an array of struct slabobj_ext objects. However, to support storage
methods where struct slabobj_ext is embedded within objects, the slab
allocator should not make this assumption. Instead of directly
dereferencing the slabobj_exts array, abstract access to
struct slabobj_ext via helper functions.
Introduce a new API for slabobj_ext metadata access:
slab_obj_ext(slab, obj_exts, index) - returns a pointer to the
struct slabobj_ext element at the given index.
Directly dereferencing the return value of slab_obj_exts() is no longer
allowed. Instead, slab_obj_ext() must always be used to access
individual struct slabobj_ext objects.
Convert all users to use these APIs.
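For illustration only (paraphrasing the memcg hook conversion in the
diff below; kernel context, not a standalone program), the access
pattern changes roughly like this:
        /* Before: the vector returned by slab_obj_exts() was indexed directly. */
        struct slabobj_ext *vec = slab_obj_exts(slab);
        vec[obj_to_index(s, slab, p)].objcg = objcg;

        /* After: slab_obj_exts() returns an opaque unsigned long and every
         * element access goes through slab_obj_ext(). */
        unsigned long obj_exts = slab_obj_exts(slab);
        struct slabobj_ext *obj_ext = slab_obj_ext(slab, obj_exts,
                                                   obj_to_index(s, slab, p));
        obj_ext->objcg = objcg;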
No functional changes intended.
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
mm/memcontrol.c | 23 ++++++++++++++++-------
mm/slab.h | 43 +++++++++++++++++++++++++++++++++++++------
mm/slub.c | 46 +++++++++++++++++++++++++++-------------------
3 files changed, 80 insertions(+), 32 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8dd7fbed5a94..2a9dc246e802 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2566,7 +2566,8 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
* slab->obj_exts.
*/
if (folio_test_slab(folio)) {
- struct slabobj_ext *obj_exts;
+ unsigned long obj_exts;
+ struct slabobj_ext *obj_ext;
struct slab *slab;
unsigned int off;
@@ -2576,8 +2577,9 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
return NULL;
off = obj_to_index(slab->slab_cache, slab, p);
- if (obj_exts[off].objcg)
- return obj_cgroup_memcg(obj_exts[off].objcg);
+ obj_ext = slab_obj_ext(slab, obj_exts, off);
+ if (obj_ext->objcg)
+ return obj_cgroup_memcg(obj_ext->objcg);
return NULL;
}
@@ -3168,6 +3170,9 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
}
for (i = 0; i < size; i++) {
+ unsigned long obj_exts;
+ struct slabobj_ext *obj_ext;
+
slab = virt_to_slab(p[i]);
if (!slab_obj_exts(slab) &&
@@ -3190,29 +3195,33 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
slab_pgdat(slab), cache_vmstat_idx(s)))
return false;
+ obj_exts = slab_obj_exts(slab);
off = obj_to_index(s, slab, p[i]);
+ obj_ext = slab_obj_ext(slab, obj_exts, off);
obj_cgroup_get(objcg);
- slab_obj_exts(slab)[off].objcg = objcg;
+ obj_ext->objcg = objcg;
}
return true;
}
void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
- void **p, int objects, struct slabobj_ext *obj_exts)
+ void **p, int objects, unsigned long obj_exts)
{
size_t obj_size = obj_full_size(s);
for (int i = 0; i < objects; i++) {
struct obj_cgroup *objcg;
+ struct slabobj_ext *obj_ext;
unsigned int off;
off = obj_to_index(s, slab, p[i]);
- objcg = obj_exts[off].objcg;
+ obj_ext = slab_obj_ext(slab, obj_exts, off);
+ objcg = obj_ext->objcg;
if (!objcg)
continue;
- obj_exts[off].objcg = NULL;
+ obj_ext->objcg = NULL;
refill_obj_stock(objcg, obj_size, true, -obj_size,
slab_pgdat(slab), cache_vmstat_idx(s));
obj_cgroup_put(objcg);
diff --git a/mm/slab.h b/mm/slab.h
index f1866f2d9b21..4eb63360fdb5 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -522,10 +522,12 @@ static inline bool slab_in_kunit_test(void) { return false; }
* associated with a slab.
* @slab: a pointer to the slab struct
*
- * Returns a pointer to the object extension vector associated with the slab,
- * or NULL if no such vector has been associated yet.
+ * Returns the address of the object extension vector associated with the slab,
+ * or zero if no such vector has been associated yet.
+ * Do not dereference the return value directly; use slab_obj_ext() to access
+ * its elements.
*/
-static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
+static inline unsigned long slab_obj_exts(struct slab *slab)
{
unsigned long obj_exts = READ_ONCE(slab->obj_exts);
@@ -534,7 +536,30 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
slab_page(slab));
VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));
#endif
- return (struct slabobj_ext *)(obj_exts & ~OBJEXTS_FLAGS_MASK);
+
+ return obj_exts & ~OBJEXTS_FLAGS_MASK;
+}
+
+/*
+ * slab_obj_ext - get the pointer to the slab object extension metadata
+ * associated with an object in a slab.
+ * @slab: a pointer to the slab struct
+ * @obj_exts: the address of the object extension vector (from slab_obj_exts())
+ * @index: an index of the object
+ *
+ * Returns a pointer to the object extension associated with the object.
+ */
+static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
+ unsigned long obj_exts,
+ unsigned int index)
+{
+ struct slabobj_ext *obj_ext;
+
+ VM_WARN_ON_ONCE(!slab_obj_exts(slab));
+ VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
+
+ obj_ext = (struct slabobj_ext *)obj_exts;
+ return &obj_ext[index];
}
int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
@@ -542,7 +567,13 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
#else /* CONFIG_SLAB_OBJ_EXT */
-static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
+static inline unsigned long slab_obj_exts(struct slab *slab)
+{
+ return 0;
+}
+
+static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
+ unsigned long obj_exts,
+ unsigned int index)
{
return NULL;
}
@@ -559,7 +590,7 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
gfp_t flags, size_t size, void **p);
void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
- void **p, int objects, struct slabobj_ext *obj_exts);
+ void **p, int objects, unsigned long obj_exts);
#endif
void kvfree_rcu_cb(struct rcu_head *head);
diff --git a/mm/slub.c b/mm/slub.c
index 9e91f2016697..33e2692ca618 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2028,9 +2028,12 @@ static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
if (slab_exts) {
unsigned int offs = obj_to_index(obj_exts_slab->slab_cache,
obj_exts_slab, obj_exts);
+ struct slabobj_ext *ext = slab_obj_ext(obj_exts_slab,
+ slab_exts, offs);
+
/* codetag should be NULL */
- WARN_ON(slab_exts[offs].ref.ct);
- set_codetag_empty(&slab_exts[offs].ref);
+ WARN_ON(ext->ref.ct);
+ set_codetag_empty(&ext->ref);
}
}
@@ -2129,7 +2132,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
static inline void free_slab_obj_exts(struct slab *slab)
{
- struct slabobj_ext *obj_exts;
+ unsigned long obj_exts;
obj_exts = slab_obj_exts(slab);
if (!obj_exts)
@@ -2142,8 +2145,8 @@ static inline void free_slab_obj_exts(struct slab *slab)
* NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
* the extension for obj_exts is expected to be NULL.
*/
- mark_objexts_empty(obj_exts);
- kfree(obj_exts);
+ mark_objexts_empty((struct slabobj_ext *)obj_exts);
+ kfree((void *)obj_exts);
slab->obj_exts = 0;
}
@@ -2168,9 +2171,10 @@ static inline void free_slab_obj_exts(struct slab *slab)
#ifdef CONFIG_MEM_ALLOC_PROFILING
static inline struct slabobj_ext *
-prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
+prepare_slab_obj_ext_hook(struct kmem_cache *s, gfp_t flags, void *p)
{
struct slab *slab;
+ unsigned long obj_exts;
if (!p)
return NULL;
@@ -2182,30 +2186,32 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
return NULL;
slab = virt_to_slab(p);
- if (!slab_obj_exts(slab) &&
+ obj_exts = slab_obj_exts(slab);
+ if (!obj_exts &&
alloc_slab_obj_exts(slab, s, flags, false)) {
pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
__func__, s->name);
return NULL;
}
- return slab_obj_exts(slab) + obj_to_index(s, slab, p);
+ obj_exts = slab_obj_exts(slab);
+ return slab_obj_ext(slab, obj_exts, obj_to_index(s, slab, p));
}
/* Should be called only if mem_alloc_profiling_enabled() */
static noinline void
__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
{
- struct slabobj_ext *obj_exts;
+ struct slabobj_ext *obj_ext;
- obj_exts = prepare_slab_obj_exts_hook(s, flags, object);
+ obj_ext = prepare_slab_obj_ext_hook(s, flags, object);
/*
* Currently obj_exts is used only for allocation profiling.
* If other users appear then mem_alloc_profiling_enabled()
* check should be added before alloc_tag_add().
*/
- if (likely(obj_exts))
- alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
+ if (likely(obj_ext))
+ alloc_tag_add(&obj_ext->ref, current->alloc_tag, s->size);
}
static inline void
@@ -2220,8 +2226,8 @@ static noinline void
__alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
int objects)
{
- struct slabobj_ext *obj_exts;
int i;
+ unsigned long obj_exts;
/* slab->obj_exts might not be NULL if it was created for MEMCG accounting. */
if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
@@ -2234,7 +2240,7 @@ __alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p
for (i = 0; i < objects; i++) {
unsigned int off = obj_to_index(s, slab, p[i]);
- alloc_tag_sub(&obj_exts[off].ref, s->size);
+ alloc_tag_sub(&slab_obj_ext(slab, obj_exts, off)->ref, s->size);
}
}
@@ -2293,7 +2299,7 @@ static __fastpath_inline
void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
int objects)
{
- struct slabobj_ext *obj_exts;
+ unsigned long obj_exts;
if (!memcg_kmem_online())
return;
@@ -2308,7 +2314,8 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
static __fastpath_inline
bool memcg_slab_post_charge(void *p, gfp_t flags)
{
- struct slabobj_ext *slab_exts;
+ unsigned long obj_exts;
+ struct slabobj_ext *obj_ext;
struct kmem_cache *s;
struct folio *folio;
struct slab *slab;
@@ -2348,10 +2355,11 @@ bool memcg_slab_post_charge(void *p, gfp_t flags)
return true;
/* Ignore already charged objects. */
- slab_exts = slab_obj_exts(slab);
- if (slab_exts) {
+ obj_exts = slab_obj_exts(slab);
+ if (obj_exts) {
off = obj_to_index(s, slab, p);
- if (unlikely(slab_exts[off].objcg))
+ obj_ext = slab_obj_ext(slab, obj_exts, off);
+ if (unlikely(obj_ext->objcg))
return true;
}
--
2.43.0
* [RFC V2 PATCH 3/5] mm/slab: use stride to access slabobj_ext
From: Harry Yoo @ 2025-08-27 11:37 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo
Use a configurable stride value when accessing slab object extension
metadata instead of assuming a fixed sizeof(struct slabobj_ext).
Store the stride value in the lower 16 bits of struct slab's __page_type field.
This allows for flexibility in cases where the extension is embedded
within slab objects.
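A minimal userspace sketch (not part of the patch) of the stride
arithmetic, with obj_ext_at() standing in for slab_obj_ext() and a
made-up struct and stride values:
#include <stdio.h>

struct slabobj_ext { unsigned long dummy; };    /* stand-in for the kernel struct */

/* Same idea as the patched slab_obj_ext(): element i lives at base + stride * i. */
static struct slabobj_ext *obj_ext_at(unsigned long base, unsigned int stride,
                                      unsigned int index)
{
        return (struct slabobj_ext *)(base + (unsigned long)stride * index);
}

int main(void)
{
        unsigned char space[4096] = { 0 };
        unsigned long base = (unsigned long)space;

        /* stride == sizeof(struct slabobj_ext): plain array indexing (out-of-line vector) */
        printf("dense:    element 3 at offset %lu\n",
               (unsigned long)obj_ext_at(base, sizeof(struct slabobj_ext), 3) - base);
        /* stride == s->size (say 1024): one slabobj_ext embedded in each object */
        printf("embedded: element 3 at offset %lu\n",
               (unsigned long)obj_ext_at(base, 1024, 3) - base);
        return 0;
}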
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
mm/slab.h | 18 ++++++++++++++----
mm/slub.c | 2 ++
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/mm/slab.h b/mm/slab.h
index 4eb63360fdb5..3bded6bd0152 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -540,6 +540,15 @@ static inline unsigned long slab_obj_exts(struct slab *slab)
return obj_exts & ~OBJEXTS_FLAGS_MASK;
}
+static inline void slab_set_stride(struct slab *slab, unsigned int stride)
+{
+ slab->__page_type |= stride;
+}
+static inline unsigned int slab_get_stride(struct slab *slab)
+{
+ return slab->__page_type & USHRT_MAX;
+}
+
/*
* slab_obj_ext - get the pointer to the slab object extension metadata
* associated with an object in a slab.
@@ -553,13 +562,10 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
unsigned long obj_exts,
unsigned int index)
{
- struct slabobj_ext *obj_ext;
-
VM_WARN_ON_ONCE(!slab_obj_exts(slab));
VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
- obj_ext = (struct slabobj_ext *)obj_exts;
- return &obj_ext[index];
+ return (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
}
int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
@@ -578,6 +584,10 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
return NULL;
}
+static inline void slab_set_stride(struct slab *slab, unsigned int stride) { }
+static inline unsigned int slab_get_stride(struct slab *slab) { return 0; }
+
+
#endif /* CONFIG_SLAB_OBJ_EXT */
static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
diff --git a/mm/slub.c b/mm/slub.c
index 33e2692ca618..b80bf3a24ab9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2107,6 +2107,8 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
#endif
old_exts = READ_ONCE(slab->obj_exts);
handle_failed_objexts_alloc(old_exts, vec, objects);
+ slab_set_stride(slab, sizeof(struct slabobj_ext));
+
if (new_slab) {
/*
* If the slab is brand new and nobody can yet access its
--
2.43.0
* [RFC V2 PATCH 4/5] mm/slab: save memory by allocating slabobj_ext array from leftover
From: Harry Yoo @ 2025-08-27 11:37 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo
The leftover space in a slab is always smaller than s->size, and
kmem caches for large objects that are not power-of-two sizes tend to have
a greater amount of leftover space per slab. In some cases, the leftover
space is larger than the size of the slabobj_ext array for the slab.
An excellent example of such a cache is ext4_inode_cache. On my system,
the object size is 1144, with a preferred order of 3, 28 objects per slab,
and 736 bytes of leftover space per slab.
Since the size of the slabobj_ext array is only 224 bytes (w/o mem
profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
fits within the leftover space.
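The fit check is simple arithmetic; here is a minimal userspace sketch
for the ext4_inode_cache numbers above, assuming 4 KiB pages and a
struct slabobj_ext of 8 bytes without (16 bytes with) memory allocation
profiling. The real check in the patch also aligns the array's start
offset:
#include <stdio.h>

int main(void)
{
        unsigned long slab_bytes = 8 * 4096;    /* order-3 slab, 4 KiB pages */
        unsigned long obj_size = 1144;          /* s->size for ext4_inode_cache here */
        unsigned long objects = 28;             /* objects per slab */
        unsigned long ext_sizes[] = { 8, 16 };  /* w/o and w/ mem profiling (assumed) */

        unsigned long leftover = slab_bytes - objects * obj_size;       /* 736 */

        for (int i = 0; i < 2; i++) {
                unsigned long vec = objects * ext_sizes[i];             /* 224 or 448 */
                printf("array %4lu bytes vs leftover %lu -> %s\n",
                       vec, leftover, vec <= leftover ? "fits" : "does not fit");
        }
        return 0;
}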
When the leftover space is large enough, allocate the slabobj_ext array
from it instead of using kcalloc(). The array is always allocated when
creating new slabs, because implementing lazy allocation correctly is
difficult without expensive synchronization.
To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
MEM_ALLOC_PROFILING are not used for the cache, allocate the
slabobj_ext array only if either of them is enabled at the time slabs
are created.
[ MEMCG=y, MEM_ALLOC_PROFILING=y ]
Before patch (run updatedb):
Slab: 5815196 kB
SReclaimable: 5042824 kB
SUnreclaim: 772372 kB
After patch (run updatedb):
Slab: 5748664 kB
SReclaimable: 5041608 kB
SUnreclaim: 707084 kB (-63.75 MiB)
[ MEMCG=y, MEM_ALLOC_PROFILING=n ]
Before patch (run updatedb):
Slab: 5637764 kB
SReclaimable: 5042428 kB
SUnreclaim: 595284 kB
After patch (run updatedb):
Slab: 5598992 kB
SReclaimable: 5042248 kB
SUnreclaim: 560396 kB (-34.07 MiB)
Enjoy the memory savings!
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
mm/slub.c | 145 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 140 insertions(+), 5 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index b80bf3a24ab9..ad9a1cae48b2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -878,6 +878,94 @@ static inline unsigned int get_orig_size(struct kmem_cache *s, void *object)
return *(unsigned int *)p;
}
+#ifdef CONFIG_SLAB_OBJ_EXT
+
+/*
+ * Check if memory cgroup or memory allocation profiling is enabled.
+ * If enabled, SLUB tries to reduce memory overhead of accounting
+ * slab objects. If neither is enabled when this function is called,
+ * the optimization is simply skipped to avoid affecting caches that do not
+ * need slabobj_ext metadata.
+ *
+ * However, this may disable the optimization when memory cgroup or
+ * memory allocation profiling is in use but slabs are created too
+ * early, before those subsystems are initialized.
+ */
+static inline bool need_slab_obj_exts(struct kmem_cache *s)
+{
+ if (!mem_cgroup_disabled() && (s->flags & SLAB_ACCOUNT))
+ return true;
+
+ if (mem_alloc_profiling_enabled())
+ return true;
+
+ return false;
+}
+
+static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
+{
+ return sizeof(struct slabobj_ext) * slab->objects;
+}
+
+static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
+ struct slab *slab)
+{
+ unsigned long objext_offset;
+
+ objext_offset = s->red_left_pad + s->size * slab->objects;
+ objext_offset = ALIGN(objext_offset, sizeof(struct slabobj_ext));
+ return objext_offset;
+}
+
+static inline bool obj_exts_fit_within_slab_leftover(struct kmem_cache *s,
+ struct slab *slab)
+{
+ unsigned long objext_offset = obj_exts_offset_in_slab(s, slab);
+ unsigned long objext_size = obj_exts_size_in_slab(slab);
+
+ return objext_offset + objext_size <= slab_size(slab);
+}
+
+static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
+{
+ unsigned long obj_exts;
+
+ if (!obj_exts_fit_within_slab_leftover(s, slab))
+ return false;
+
+ obj_exts = (unsigned long)slab_address(slab);
+ obj_exts += obj_exts_offset_in_slab(s, slab);
+ return obj_exts == slab_obj_exts(slab);
+}
+#else
+static inline bool need_slab_obj_exts(struct kmem_cache *s)
+{
+ return false;
+}
+
+static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
+{
+ return 0;
+}
+
+static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
+ struct slab *slab)
+{
+ return 0;
+}
+
+static inline bool obj_exts_fit_within_slab_leftover(struct kmem_cache *s,
+ struct slab *slab)
+{
+ return false;
+}
+
+static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
+{
+ return false;
+}
+#endif
+
#ifdef CONFIG_SLUB_DEBUG
static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
static DEFINE_SPINLOCK(object_map_lock);
@@ -1406,7 +1494,15 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab)
start = slab_address(slab);
length = slab_size(slab);
end = start + length;
- remainder = length % s->size;
+
+ if (obj_exts_in_slab(s, slab)) {
+ remainder = length;
+ remainder -= obj_exts_offset_in_slab(s, slab);
+ remainder -= obj_exts_size_in_slab(slab);
+ } else {
+ remainder = length % s->size;
+ }
+
if (!remainder)
return;
@@ -2140,6 +2236,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
if (!obj_exts)
return;
+ if (obj_exts_in_slab(slab->slab_cache, slab)) {
+ slab->obj_exts = 0;
+ return;
+ }
+
/*
* obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
* corresponding extension will be NULL. alloc_tag_sub() will throw a
@@ -2152,6 +2253,29 @@ static inline void free_slab_obj_exts(struct slab *slab)
slab->obj_exts = 0;
}
+/*
+ * Try to allocate slabobj_ext array from unused space.
+ * This function must be called on a freshly allocated slab to prevent
+ * concurrency problems.
+ */
+static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
+{
+ void *addr;
+
+ if (!need_slab_obj_exts(s))
+ return;
+
+ if (obj_exts_fit_within_slab_leftover(s, slab)) {
+ addr = slab_address(slab) + obj_exts_offset_in_slab(s, slab);
+ kasan_unpoison_range(addr, obj_exts_size_in_slab(slab));
+ memset(addr, 0, obj_exts_size_in_slab(slab));
+ slab->obj_exts = (unsigned long)addr;
+ if (IS_ENABLED(CONFIG_MEMCG))
+ slab->obj_exts |= MEMCG_DATA_OBJEXTS;
+ slab_set_stride(slab, sizeof(struct slabobj_ext));
+ }
+}
+
#else /* CONFIG_SLAB_OBJ_EXT */
static inline void init_slab_obj_exts(struct slab *slab)
@@ -2168,6 +2292,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
{
}
+static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,
+ struct slab *slab)
+{
+}
+
#endif /* CONFIG_SLAB_OBJ_EXT */
#ifdef CONFIG_MEM_ALLOC_PROFILING
@@ -3125,7 +3254,9 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
static __always_inline void account_slab(struct slab *slab, int order,
struct kmem_cache *s, gfp_t gfp)
{
- if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
+ if (memcg_kmem_online() &&
+ (s->flags & SLAB_ACCOUNT) &&
+ !slab_obj_exts(slab))
alloc_slab_obj_exts(slab, s, gfp, true);
mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
@@ -3184,9 +3315,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
slab->objects = oo_objects(oo);
slab->inuse = 0;
slab->frozen = 0;
- init_slab_obj_exts(slab);
-
- account_slab(slab, oo_order(oo), s, flags);
slab->slab_cache = s;
@@ -3195,6 +3323,13 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
start = slab_address(slab);
setup_slab_debug(s, slab, start);
+ init_slab_obj_exts(slab);
+ /*
+ * Poison the slab before initializing the slabobj_ext array
+ * to prevent the array from being overwritten.
+ */
+ alloc_slab_obj_exts_early(s, slab);
+ account_slab(slab, oo_order(oo), s, flags);
shuffle = shuffle_freelist(s, slab);
--
2.43.0
* [RFC V2 PATCH 5/5] mm/slab: place slabobj_ext metadata in unused space within s->size
From: Harry Yoo @ 2025-08-27 11:37 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo
When a cache has a high s->align value and s->object_size is not aligned
to it, each object ends up with some unused space because of alignment.
If this wasted space is big enough, we can use it to store the
slabobj_ext metadata instead of wasting it.
On my system, this happens with caches like kmem_cache, mm_struct, pid,
task_struct, sighand_cache, xfs_inode, and others.
To place the slabobj_ext metadata within each object, the existing
slab_obj_ext() logic can still be used by setting:
- slab->obj_exts = slab_address(slab) + s->red_left_pad +
(slabobj_ext offset)
- stride = s->size
slab_obj_ext() doesn't need to know where the metadata is stored,
so this method works without adding extra overhead to slab_obj_ext().
A good example benefiting from this optimization is xfs_inode
(object_size: 992, align: 64). To measure memory savings, an XFS
filesystem with 6 million files was created, and updatedb was run
within that filesystem.
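The per-object room for this xfs_inode geometry, as a minimal userspace
sketch (not part of the patch; struct slabobj_ext assumed to be 8 bytes
without memory allocation profiling and 16 bytes with it):
#include <stdio.h>

/* Simplified ALIGN(): a must be a power of two. */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

int main(void)
{
        unsigned long object_size = 992, align = 64;    /* xfs_inode, production config */
        unsigned long size = ALIGN(object_size, align); /* s->size == 1024 */
        unsigned long padding = size - object_size;     /* 32 bytes of alignment padding */

        printf("padding per object: %lu bytes (>= 8 and >= 16, so slabobj_ext fits)\n",
               padding);
        return 0;
}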
[ MEMCG=y, MEM_ALLOC_PROFILING=y ]
Before patch (run updatedb):
Slab: 8409212 kB
SReclaimable: 7314864 kB
SUnreclaim: 1094348 kB
After patch (run updatedb):
Slab: 8313324 kB
SReclaimable: 7318176 kB
SUnreclaim: 995148 kB (-96.87 MiB)
[ MEMCG=y, MEM_ALLOC_PROFILING=n ]
Before patch (run updatedb):
Slab: 8081708 kB
SReclaimable: 7314400 kB
SUnreclaim: 767308 kB
After patch (run updatedb):
Slab: 8034676 kB
SReclaimable: 7314532 kB
SUnreclaim: 720144 kB (-46.06 MiB)
Enjoy the memory savings!
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
include/linux/slab.h | 3 ++
mm/slab_common.c | 6 ++--
mm/slub.c | 69 ++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 73 insertions(+), 5 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 680193356ac7..279d35b40e8e 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -59,6 +59,7 @@ enum _slab_flag_bits {
_SLAB_CMPXCHG_DOUBLE,
#ifdef CONFIG_SLAB_OBJ_EXT
_SLAB_NO_OBJ_EXT,
+ _SLAB_OBJ_EXT_IN_OBJ,
#endif
_SLAB_FLAGS_LAST_BIT
};
@@ -240,8 +241,10 @@ enum _slab_flag_bits {
/* Slab created using create_boot_cache */
#ifdef CONFIG_SLAB_OBJ_EXT
#define SLAB_NO_OBJ_EXT __SLAB_FLAG_BIT(_SLAB_NO_OBJ_EXT)
+#define SLAB_OBJ_EXT_IN_OBJ __SLAB_FLAG_BIT(_SLAB_OBJ_EXT_IN_OBJ)
#else
#define SLAB_NO_OBJ_EXT __SLAB_FLAG_UNUSED
+#define SLAB_OBJ_EXT_IN_OBJ __SLAB_FLAG_UNUSED
#endif
/*
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 08f5baee1309..cbd85eecd430 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -43,11 +43,13 @@ DEFINE_MUTEX(slab_mutex);
struct kmem_cache *kmem_cache;
/*
- * Set of flags that will prevent slab merging
+ * Set of flags that will prevent slab merging.
+ * Any flag that adds per-object metadata should be included,
+ * since slab merging can update s->inuse, which affects the metadata layout.
*/
#define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
- SLAB_FAILSLAB | SLAB_NO_MERGE)
+ SLAB_FAILSLAB | SLAB_NO_MERGE | SLAB_OBJ_EXT_IN_OBJ)
#define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
diff --git a/mm/slub.c b/mm/slub.c
index ad9a1cae48b2..6689131761c5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -937,6 +937,26 @@ static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
obj_exts += obj_exts_offset_in_slab(s, slab);
return obj_exts == slab_obj_exts(slab);
}
+
+static bool obj_exts_in_object(struct kmem_cache *s)
+{
+ return s->flags & SLAB_OBJ_EXT_IN_OBJ;
+}
+
+static unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
+{
+ unsigned int offset = get_info_end(s);
+
+ if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
+ offset += sizeof(struct track) * 2;
+
+ if (slub_debug_orig_size(s))
+ offset += ALIGN(sizeof(unsigned int), sizeof(unsigned long));
+
+ offset += kasan_metadata_size(s, false);
+
+ return offset;
+}
#else
static inline bool need_slab_obj_exts(struct kmem_cache *s)
{
@@ -964,6 +984,17 @@ static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
{
return false;
}
+
+static inline bool obj_exts_in_object(struct kmem_cache *s)
+{
+ return false;
+}
+
+static inline unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
+{
+ return 0;
+}
+
#endif
#ifdef CONFIG_SLUB_DEBUG
@@ -1272,6 +1303,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
off += kasan_metadata_size(s, false);
+ if (obj_exts_in_object(s))
+ off += sizeof(struct slabobj_ext);
+
if (off != size_from_object(s))
/* Beginning of the filler is the free pointer */
print_section(KERN_ERR, "Padding ", p + off,
@@ -1441,7 +1475,10 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
* A. Free pointer (if we cannot overwrite object on free)
* B. Tracking data for SLAB_STORE_USER
* C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
- * D. Padding to reach required alignment boundary or at minimum
+ * D. KASAN alloc metadata (KASAN enabled)
+ * E. struct slabobj_ext to store accounting metadata
+ * (SLAB_OBJ_EXT_IN_OBJ enabled)
+ * F. Padding to reach required alignment boundary or at minimum
* one word if debugging is on to be able to detect writes
* before the word boundary.
*
@@ -1470,6 +1507,9 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
off += kasan_metadata_size(s, false);
+ if (obj_exts_in_object(s))
+ off += sizeof(struct slabobj_ext);
+
if (size_from_object(s) == off)
return 1;
@@ -2236,7 +2276,8 @@ static inline void free_slab_obj_exts(struct slab *slab)
if (!obj_exts)
return;
- if (obj_exts_in_slab(slab->slab_cache, slab)) {
+ if (obj_exts_in_slab(slab->slab_cache, slab) ||
+ obj_exts_in_object(slab->slab_cache)) {
slab->obj_exts = 0;
return;
}
@@ -2273,6 +2314,21 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
if (IS_ENABLED(CONFIG_MEMCG))
slab->obj_exts |= MEMCG_DATA_OBJEXTS;
slab_set_stride(slab, sizeof(struct slabobj_ext));
+ } else if (obj_exts_in_object(s)) {
+ unsigned int offset = obj_exts_offset_in_object(s);
+
+ slab->obj_exts = (unsigned long)slab_address(slab);
+ slab->obj_exts += s->red_left_pad;
+ slab->obj_exts += offset;
+ if (IS_ENABLED(CONFIG_MEMCG))
+ slab->obj_exts |= MEMCG_DATA_OBJEXTS;
+ slab_set_stride(slab, s->size);
+
+ for_each_object(addr, s, slab_address(slab), slab->objects) {
+ kasan_unpoison_range(addr + offset,
+ sizeof(struct slabobj_ext));
+ memset(addr + offset, 0, sizeof(struct slabobj_ext));
+ }
}
}
@@ -7354,6 +7410,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
{
slab_flags_t flags = s->flags;
unsigned int size = s->object_size;
+ unsigned int aligned_size;
unsigned int order;
/*
@@ -7466,7 +7523,13 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
* offset 0. In order to align the objects we have to simply size
* each object to conform to the alignment.
*/
- size = ALIGN(size, s->align);
+ aligned_size = ALIGN(size, s->align);
+#ifdef CONFIG_SLAB_OBJ_EXT
+ if (aligned_size - size >= sizeof(struct slabobj_ext))
+ s->flags |= SLAB_OBJ_EXT_IN_OBJ;
+#endif
+ size = aligned_size;
+
s->size = size;
s->reciprocal_size = reciprocal_value(size);
order = calculate_order(size);
--
2.43.0