From: Harry Yoo <harry.yoo@oracle.com>
To: akpm@linux-foundation.org, vbabka@suse.cz
Cc: andreyknvl@gmail.com, cl@linux.com, dvyukov@google.com,
glider@google.com, hannes@cmpxchg.org, linux-mm@kvack.org,
mhocko@kernel.org, muchun.song@linux.dev, rientjes@google.com,
roman.gushchin@linux.dev, ryabinin.a.a@gmail.com,
shakeel.butt@linux.dev, surenb@google.com,
vincenzo.frascino@arm.com, yeoreum.yun@arm.com,
harry.yoo@oracle.com
Subject: [RFC V2 PATCH 0/5] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space
Date: Wed, 27 Aug 2025 20:37:13 +0900
Message-ID: <20250827113726.707801-1-harry.yoo@oracle.com>
RFC v1: https://lore.kernel.org/linux-mm/20250613063336.5833-1-harry.yoo@oracle.com
RFC v1 -> v2:
- Adopt Vlastimil's suggestion (patches 2, 3, 5) and implement case 2
  described below. Thanks!
- Fix unaligned metadata address with SLAB_STORE_USER (patch 1)
- When memcg and mem profiling are disabled, do not allocate slabobj_ext
  metadata (patch 4)

When CONFIG_MEMCG and CONFIG_MEM_ALLOC_PROFILING are enabled,
the kernel allocates two pointers per object: one for the memory cgroup
(obj_cgroup) to which it belongs, and another for the code location
that requested the allocation.
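For reference, the per-object metadata is roughly the following struct
(simplified here; each field exists only when the corresponding config
option is enabled, so the cost is one or two pointers per object):

/* Simplified view of the per-object metadata. */
struct slabobj_ext {
#ifdef CONFIG_MEMCG
	struct obj_cgroup *objcg;	/* memory cgroup the object is charged to */
#endif
#ifdef CONFIG_MEM_ALLOC_PROFILING
	union codetag_ref ref;		/* code location that allocated the object */
#endif
};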
In two special cases, this overhead can be eliminated by allocating
slabobj_ext metadata from unused space within a slab:
Case 1. The "leftover" space after the last slab object is larger than
the size of an array of slabobj_ext.
Case 2. The per-object alignment padding is larger than
sizeof(struct slabobj_ext).
For these two cases, one or two pointers can be saved per slab object.
Examples: ext4 inode cache (case 1) and xfs inode cache (case 2).
That's approximately 0.7-0.8% (memcg) or 1.5-1.6% (memcg + mem profiling)
of the total inode cache size.
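A rough sketch of the two conditions (illustrative only; the variable
names below are assumptions, not the series' actual code):

/*
 * Illustrative only, not the actual patch code. 'unused' is the leftover
 * space after the last object in the slab, 'objects' the number of
 * objects per slab, and 'padding' the per-object alignment padding.
 */

/* Case 1: the leftover tail can hold a whole slabobj_ext array. */
bool case1 = unused >= objects * sizeof(struct slabobj_ext);

/* Case 2: each object's alignment padding can hold its own slabobj_ext. */
bool case2 = padding >= sizeof(struct slabobj_ext);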
Implementing case 2 is not straightforward, because the existing code
assumes that slab->obj_exts is an array of slabobj_ext, while case 2
breaks that assumption.
As suggested by Vlastimil, access to individual slabobj_ext metadata is
abstracted via a new helper named slab_obj_ext():
static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
					       unsigned long obj_exts,
					       unsigned int index)
{
	return (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
}
In the normal case (including case 1), slab->obj_exts points to an array
of slabobj_ext, and the stride is sizeof(struct slabobj_ext).
In case 2, the stride is s->size and
slab->obj_exts = slab_address(slab) + s->red_left_pad + (offset of slabobj_ext within s->size)
With this approach, the memcg charging fastpath doesn't need to care
about how slabobj_ext is stored.
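A rough sketch of how the two layouts feed into slab_obj_ext()
(illustrative; names other than slab_obj_ext() and slab_get_stride() are
assumptions, not necessarily what the patches use):

/*
 * Illustrative sketch, not the actual patch code. The base and stride
 * differ between the two layouts, but the accessor stays the same.
 */

/* Normal case / case 1: a dense array, one slabobj_ext per object. */
obj_exts = (unsigned long)vec;			/* separately allocated array */
stride   = sizeof(struct slabobj_ext);

/* Case 2: metadata embedded in each object's unused space. */
obj_exts = (unsigned long)slab_address(slab) + s->red_left_pad + obj_ext_offset;
stride   = s->size;

/* Either way, the metadata for object 'i' is reached the same way: */
ext = slab_obj_ext(slab, obj_exts, i);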
# Microbenchmark Results
To measure the performance impact of this series, Vlastimil's
microbenchmark [1] (modified to add more sizes) was used.
Because performance is measured in cycles, lower is better.
The baseline is the slab tree without the series, and the compared kernel
includes patches 2 through 5. (Performance was measured before patch 1
was written.)
[1] https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/commit/?h=slub-percpu-sheaves-v5-benchmarking&id=3def9df24be06fd609395f178d0cecec2c02154e
SET                                                     | no_memcg Δ% | memcg Δ%
--------------------------------------------------------+-------------+----------
BATCH SIZE: 1    SHUFFLED: NO  SIZE: 64                 |       -0.43 |    -0.00
BATCH SIZE: 1    SHUFFLED: NO  SIZE: 1120               |        0.28 |     0.29
BATCH SIZE: 1    SHUFFLED: NO  SIZE: 96 (HWCACHE ALIGN) |       -0.24 |    -0.19
BATCH SIZE: 10   SHUFFLED: NO  SIZE: 64                 |       -1.09 |    -0.74
BATCH SIZE: 10   SHUFFLED: NO  SIZE: 1120               |        0.92 |     0.36
BATCH SIZE: 10   SHUFFLED: NO  SIZE: 96 (HWCACHE ALIGN) |       -0.44 |    -0.72
BATCH SIZE: 100  SHUFFLED: NO  SIZE: 64                 |       -1.66 |    -2.38
BATCH SIZE: 100  SHUFFLED: NO  SIZE: 1120               |       -1.36 |    -1.36
BATCH SIZE: 100  SHUFFLED: NO  SIZE: 96 (HWCACHE ALIGN) |       -1.88 |    -2.54
BATCH SIZE: 1000 SHUFFLED: NO  SIZE: 64                 |       -2.98 |    -2.79
BATCH SIZE: 1000 SHUFFLED: NO  SIZE: 1120               |       -1.55 |    -1.48
BATCH SIZE: 1000 SHUFFLED: NO  SIZE: 96 (HWCACHE ALIGN) |       -3.39 |    -4.05
BATCH SIZE: 10   SHUFFLED: YES SIZE: 64                 |       -0.64 |    -1.22
BATCH SIZE: 10   SHUFFLED: YES SIZE: 1120               |       -0.74 |    -0.42
BATCH SIZE: 10   SHUFFLED: YES SIZE: 96 (HWCACHE ALIGN) |       -0.59 |    -1.11
BATCH SIZE: 100  SHUFFLED: YES SIZE: 64                 |       -1.99 |    -2.80
BATCH SIZE: 100  SHUFFLED: YES SIZE: 1120               |       -2.56 |    -2.74
BATCH SIZE: 100  SHUFFLED: YES SIZE: 96 (HWCACHE ALIGN) |       -3.43 |    -3.21
BATCH SIZE: 1000 SHUFFLED: YES SIZE: 64                 |       -3.40 |    -3.13
BATCH SIZE: 1000 SHUFFLED: YES SIZE: 1120               |       -1.89 |    -1.99
BATCH SIZE: 1000 SHUFFLED: YES SIZE: 96 (HWCACHE ALIGN) |       -4.41 |    -4.53
No red flags in the microbenchmark results.
Note 1: I suspect that the reduction in cycles is due to changes in code
layout rather than a performance benefit of the series: with only patches
2 and 3 applied, the delta is about 0.3%~2.5%, and it then drops (as
shown in the table) after applying patches 4 and 5, which doesn't make
much sense as a genuine improvement.
Note 2: When the kernel was modified to allocate slabobj_ext even when
SLAB_ACCOUNT was not set and memory profiling was disabled,
the "no_memcg Δ%" regressed by 10%. This is the main reason slabobj_ext
is allocated only when either memcg or memory profiling requires
the metadata at the time slabs are created.
# TODO
- Do not unpoison slabobj_ext in case 2. Instead, disable KASAN
while accessing slabobj_ext if s->flags & SLAB_OBJ_EXT_IN_OBJ
is not zero.
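A rough sketch of what that could look like (illustrative; whether the
series ends up using kasan_disable_current()/kasan_enable_current() or
some other mechanism is an assumption):

/* Illustrative sketch of the TODO item, not actual patch code. */
if (s->flags & SLAB_OBJ_EXT_IN_OBJ)
	kasan_disable_current();

ext = slab_obj_ext(slab, obj_exts, index);
/* ... access ext->objcg / ext->ref ... */

if (s->flags & SLAB_OBJ_EXT_IN_OBJ)
	kasan_enable_current();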
Harry Yoo (5):
mm/slab: ensure all metadata in slab object is word-aligned
mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
mm/slab: use stride to access slabobj_ext
mm/slab: save memory by allocating slabobj_ext array from leftover
mm/slab: place slabobj_ext metadata in unused space within s->size
include/linux/slab.h | 3 +
mm/kasan/kasan.h | 4 +-
mm/memcontrol.c | 23 ++--
mm/slab.h | 53 ++++++++-
mm/slab_common.c | 6 +-
mm/slub.c | 276 ++++++++++++++++++++++++++++++++++++++-----
6 files changed, 317 insertions(+), 48 deletions(-)
--
2.43.0