From: Harry Yoo <harry.yoo@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>,
David Rientjes <rientjes@google.com>,
Christoph Lameter <cl@linux.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Suren Baghdasaryan <surenb@google.com>,
Kent Overstreet <kent.overstreet@linux.dev>,
Andrey Ryabinin <ryabinin.a.a@gmail.com>,
Alexander Potapenko <glider@google.com>,
Andrey Konovalov <andreyknvl@gmail.com>,
Dmitry Vyukov <dvyukov@google.com>,
Vincenzo Frascino <vincenzo.frascino@arm.com>,
linux-mm@kvack.org
Subject: Re: [RFC PATCH] mm/slab: save memory by allocating slabobj_ext array from leftover
Date: Fri, 13 Jun 2025 16:11:08 +0900
Message-ID: <aEvPDA-EbCB6bBk0@hyeyoo>
In-Reply-To: <20250613063336.5833-1-harry.yoo@oracle.com>
On Fri, Jun 13, 2025 at 03:33:36PM +0900, Harry Yoo wrote:
> The leftover space in a slab is always smaller than s->size, and
> kmem caches for large objects that are not power-of-two sizes tend to have
> a greater amount of leftover space per slab. In some cases, the leftover
> space is larger than the size of the slabobj_ext array for the slab.
>
> An excellent example of such a cache is ext4_inode_cache. On my system,
> the object size is 1144, with a preferred order of 3, 28 objects per slab,
> and 736 bytes of leftover space per slab.
>
> Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> fits within the leftover space.
>
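For reference, the numbers above can be reproduced with a little userspace
arithmetic. This is only a sketch, not kernel code: it assumes a 4096-byte
page, that s->size equals the quoted object size of 1144, no left red zone
(red_left_pad == 0), and slabobj_ext entry sizes of 8 and 16 bytes, which are
inferred from 224 / 28 and 448 / 28 rather than read from the headers. The
helper names are hypothetical; the rounding mirrors obj_exts_offset() in the
patch below.

```c
#include <stdbool.h>

/* Assumed page size; the real value depends on the architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* Round x up to a multiple of a (a must be a power of two). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Total bytes in a slab of the given page order. */
static unsigned long slab_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/* Bytes left unused after the last object (red_left_pad assumed 0). */
static unsigned long leftover_bytes(unsigned int order, unsigned long obj_size,
				    unsigned long objects)
{
	return slab_bytes(order) - obj_size * objects;
}

/* Start of the array: end of the objects, aligned to the entry size. */
static unsigned long ext_array_offset(unsigned long obj_size,
				      unsigned long objects,
				      unsigned long ext_size)
{
	return align_up(obj_size * objects, ext_size);
}

/* True if the whole array fits between the last object and the slab end. */
static bool ext_array_fits(unsigned int order, unsigned long obj_size,
			   unsigned long objects, unsigned long ext_size)
{
	return ext_array_offset(obj_size, objects, ext_size) +
	       ext_size * objects <= slab_bytes(order);
}
```

With order 3, size 1144 and 28 objects, leftover_bytes() gives
32768 - 32032 = 736, and ext_array_fits() holds for both assumed entry sizes
(224 and 448 bytes are both below 736).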
> Allocate slabobj_exts array from this unused space instead of using
> kcalloc(), when it is large enough.
>
> Enjoy the memory savings!
Oops, I put this sentence twice in the changelog ;)
There's also a build error I missed when neither MEMCG nor
MEM_ALLOC_PROFILING is configured:
hyeyoo@hyeyoo ~/slab-misc (slab/for-next)> make -j24 mm/slub.o
  DESCEND objtool
  CALL    scripts/checksyscalls.sh
  INSTALL libsubcmd_headers
  CC      mm/slub.o
mm/slub.c: In function ‘unaccount_slab’:
mm/slub.c:2682:23: error: ‘struct slab’ has no member named ‘obj_exts’; did you mean ‘objects’?
 2682 |                 slab->obj_exts = 0;
      |                       ^~~~~~~~
      |                       objects
make[3]: *** [scripts/Makefile.build:203: mm/slub.o] Error 1
make[2]: *** [scripts/Makefile.build:461: mm] Error 2
make[1]: *** [/home/hyeyoo/slab-misc/Makefile:2011: .] Error 2
make: *** [Makefile:248: __sub-make] Error 2
Will fix it in the next revision, but let me wait a bit for some
feedback.
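
One way to avoid this kind of breakage is to keep every access to the
conditional field behind a helper that has an empty stub when the option is
off, which is the pattern free_slab_obj_exts() already follows. A minimal
userspace sketch with mock types is below. SKETCH_SLAB_OBJ_EXT,
struct sketch_slab and the sketch_* helpers are hypothetical stand-ins, and
this is not necessarily how the next revision will fix it.

```c
/*
 * Mock of a struct whose field only exists under a config option.
 * Build with -DSKETCH_SLAB_OBJ_EXT to model CONFIG_SLAB_OBJ_EXT=y.
 */
struct sketch_slab {
	unsigned int objects;
#ifdef SKETCH_SLAB_OBJ_EXT
	unsigned long obj_exts;
#endif
};

#ifdef SKETCH_SLAB_OBJ_EXT
/* Field exists: really reset it. */
static inline void sketch_clear_obj_exts(struct sketch_slab *slab)
{
	slab->obj_exts = 0;
}
#else
/* Field does not exist: empty stub, so callers need no #ifdef. */
static inline void sketch_clear_obj_exts(struct sketch_slab *slab)
{
	(void)slab;
}
#endif

/* The caller compiles in both configurations. */
static void sketch_unaccount(struct sketch_slab *slab)
{
	sketch_clear_obj_exts(slab);
}
```

The point of the design is that the caller never names the field directly, so
it builds the same way with and without the option.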
> [ MEMCG=y, MEM_ALLOC_PROFILING=y ]
>
> Before patch (run updatedb):
> Slab: 5815196 kB
> SReclaimable: 5042824 kB
> SUnreclaim: 772372 kB
>
> After patch (run updatedb):
> Slab: 5748664 kB
> SReclaimable: 5041608 kB
> SUnreclaim: 707084 kB (-63.75 MiB)
>
> [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
>
> Before patch (run updatedb):
> Slab: 5637764 kB
> SReclaimable: 5042428 kB
> SUnreclaim: 595284 kB
>
> After patch (run updatedb):
> Slab: 5598992 kB
> SReclaimable: 5042248 kB
> SUnreclaim: 560396 kB (-34.07 MiB)
>
> This saves from hundreds of KiBs up to several tens of MiBs of memory
> on my machine, depending on the config and slab memory usage.
>
> Enjoy the memory savings!
>
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
> KASAN folks: Should we also poison the array before freeing the slab?
> If so, which API would be appropriate to use?
>
> mm/slub.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 87 insertions(+), 8 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index cf3637324243..20f0f76f0c65 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -785,6 +785,49 @@ static inline unsigned int get_orig_size(struct kmem_cache *s, void *object)
>  	return *(unsigned int *)p;
>  }
>
> +#ifdef CONFIG_SLAB_OBJ_EXT
> +static inline unsigned int obj_exts_size(struct slab *slab)
> +{
> +	return sizeof(struct slabobj_ext) * slab->objects;
> +}
> +
> +static unsigned long obj_exts_offset(struct kmem_cache *s,
> +				     struct slab *slab)
> +{
> +	unsigned long objext_offset;
> +
> +	objext_offset = s->red_left_pad + s->size * slab->objects;
> +	objext_offset = ALIGN(objext_offset, sizeof(struct slabobj_ext));
> +	return objext_offset;
> +}
> +
> +static bool can_alloc_obj_exts_from_leftover(struct kmem_cache *s,
> +					     struct slab *slab)
> +{
> +	unsigned long objext_offset = obj_exts_offset(s, slab);
> +	unsigned long objext_size = obj_exts_size(slab);
> +
> +	return objext_offset + objext_size <= slab_size(slab);
> +}
> +#else
> +static inline unsigned int obj_exts_size(struct slab *slab)
> +{
> +	return 0;
> +}
> +
> +static unsigned long obj_exts_offset(struct kmem_cache *s,
> +				     struct slab *slab)
> +{
> +	return 0;
> +}
> +
> +static inline bool can_alloc_obj_exts_from_leftover(struct kmem_cache *s,
> +						    struct slab *slab)
> +{
> +	return false;
> +}
> +#endif
> +
>  #ifdef CONFIG_SLUB_DEBUG
>  static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
>  static DEFINE_SPINLOCK(object_map_lock);
> @@ -1307,7 +1350,15 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab)
>  	start = slab_address(slab);
>  	length = slab_size(slab);
>  	end = start + length;
> -	remainder = length % s->size;
> +
> +	if (can_alloc_obj_exts_from_leftover(s, slab)) {
> +		remainder = length;
> +		remainder -= obj_exts_offset(s, slab);
> +		remainder -= obj_exts_size(slab);
> +	} else {
> +		remainder = length % s->size;
> +	}
> +
>  	if (!remainder)
>  		return;
>
> @@ -2049,6 +2100,21 @@ static noinline void free_slab_obj_exts(struct slab *slab)
>  	slab->obj_exts = 0;
>  }
>
> +static void try_to_alloc_obj_exts_from_leftover(struct kmem_cache *s,
> +						struct slab *slab)
> +{
> +	if (can_alloc_obj_exts_from_leftover(s, slab)) {
> +		void *addr = slab_address(slab) + obj_exts_offset(s, slab);
> +
> +		slab->obj_exts = (unsigned long)addr;
> +		kasan_unpoison_range(addr, obj_exts_size(slab));
> +		memset(addr, 0, obj_exts_size(slab));
> +#ifdef CONFIG_MEMCG
> +		slab->obj_exts |= MEMCG_DATA_OBJEXTS;
> +#endif
> +	}
> +}
> +
>  static inline bool need_slab_obj_ext(void)
>  {
>  	if (mem_alloc_profiling_enabled())
> @@ -2077,6 +2143,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
>  {
>  }
>
> +static inline void try_to_alloc_obj_exts_from_leftover(struct kmem_cache *s,
> +						       struct slab *slab)
> +{
> +}
> +
>  static inline bool need_slab_obj_ext(void)
>  {
>  	return false;
> @@ -2592,7 +2663,9 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
>  static __always_inline void account_slab(struct slab *slab, int order,
>  					 struct kmem_cache *s, gfp_t gfp)
>  {
> -	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> +	if (memcg_kmem_online() &&
> +	    (s->flags & SLAB_ACCOUNT) &&
> +	    !slab_obj_exts(slab))
>  		alloc_slab_obj_exts(slab, s, gfp, true);
>
>  	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> @@ -2602,11 +2675,16 @@ static __always_inline void account_slab(struct slab *slab, int order,
>  static __always_inline void unaccount_slab(struct slab *slab, int order,
>  					   struct kmem_cache *s)
>  {
> -	if (memcg_kmem_online() || need_slab_obj_ext())
> -		free_slab_obj_exts(slab);
> -
>  	mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
>  			    -(PAGE_SIZE << order));
> +
> +	if (can_alloc_obj_exts_from_leftover(s, slab)) {
> +		slab->obj_exts = 0;
> +		return;
> +	}
> +
> +	if (memcg_kmem_online() || need_slab_obj_ext())
> +		free_slab_obj_exts(slab);
>  }
>
>  static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> @@ -2647,9 +2725,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  	slab->objects = oo_objects(oo);
>  	slab->inuse = 0;
>  	slab->frozen = 0;
> -	init_slab_obj_exts(slab);
> -
> -	account_slab(slab, oo_order(oo), s, flags);
>
>  	slab->slab_cache = s;
>
> @@ -2658,6 +2733,10 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  	start = slab_address(slab);
>
>  	setup_slab_debug(s, slab, start);
> +	init_slab_obj_exts(slab);
> +	/* Initialize the slabobj_ext array after poisoning the slab */
> +	try_to_alloc_obj_exts_from_leftover(s, slab);
> +	account_slab(slab, oo_order(oo), s, flags);
>
>  	shuffle = shuffle_freelist(s, slab);
>
> --
> 2.43.0
>
--
Cheers,
Harry / Hyeonggon