* [RFC PATCH V3 1/7] mm/slab: allow specifying freepointer offset when using constructor
2025-10-27 12:28 [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
@ 2025-10-27 12:28 ` Harry Yoo
2025-10-28 17:43 ` Suren Baghdasaryan
2025-10-27 12:28 ` [RFC PATCH V3 2/7] ext4: specify the free pointer offset for ext4_inode_cache Harry Yoo
` (6 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-27 12:28 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo, tytso,
adilger.kernel, linux-ext4, linux-kernel
When a slab cache has a constructor, the free pointer is placed after the
object because certain fields must not be overwritten even after the
object is freed.
However, some fields that the constructor does not care can safely be
overwritten. Allow specifying the free pointer offset within the object,
reducing the overall object size when some fields can be reused for the
free pointer.
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
mm/slab_common.c | 2 +-
mm/slub.c | 6 ++++--
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 932d13ada36c..2c2ed2452271 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -231,7 +231,7 @@ static struct kmem_cache *create_cache(const char *name,
err = -EINVAL;
if (args->use_freeptr_offset &&
(args->freeptr_offset >= object_size ||
- !(flags & SLAB_TYPESAFE_BY_RCU) ||
+ (!(flags & SLAB_TYPESAFE_BY_RCU) && !args->ctor) ||
!IS_ALIGNED(args->freeptr_offset, __alignof__(freeptr_t))))
goto out;
diff --git a/mm/slub.c b/mm/slub.c
index 462a39d57b3a..64705cb3734f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -7781,7 +7781,8 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
s->inuse = size;
if (((flags & SLAB_TYPESAFE_BY_RCU) && !args->use_freeptr_offset) ||
- (flags & SLAB_POISON) || s->ctor ||
+ (flags & SLAB_POISON) ||
+ (s->ctor && !args->use_freeptr_offset) ||
((flags & SLAB_RED_ZONE) &&
(s->object_size < sizeof(void *) || slub_debug_orig_size(s)))) {
/*
@@ -7802,7 +7803,8 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
*/
s->offset = size;
size += sizeof(void *);
- } else if ((flags & SLAB_TYPESAFE_BY_RCU) && args->use_freeptr_offset) {
+ } else if (((flags & SLAB_TYPESAFE_BY_RCU) || s->ctor) &&
+ args->use_freeptr_offset) {
s->offset = args->freeptr_offset;
} else {
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 1/7] mm/slab: allow specifying freepointer offset when using constructor
2025-10-27 12:28 ` [RFC PATCH V3 1/7] mm/slab: allow specifying freepointer offset when using constructor Harry Yoo
@ 2025-10-28 17:43 ` Suren Baghdasaryan
2025-10-29 7:10 ` Harry Yoo
0 siblings, 1 reply; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-28 17:43 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> When a slab cache has a constructor, the free pointer is placed after the
> object because certain fields must not be overwritten even after the
> object is freed.
>
> However, some fields that the constructor does not care can safely be
> overwritten. Allow specifying the free pointer offset within the object,
> reducing the overall object size when some fields can be reused for the
> free pointer.
Documentation explicitly says that ctor currently isn't supported with
custom free pointers:
https://elixir.bootlin.com/linux/v6.18-rc3/source/include/linux/slab.h#L318
It obviously needs to be updated but I suspect there was a reason for
this limitation. Have you investigated why it's not supported? I
remember looking into it when I was converting vm_area_struct cache to
use SLAB_TYPESAFE_BY_RCU but I can't recall the details now...
>
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
> mm/slab_common.c | 2 +-
> mm/slub.c | 6 ++++--
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 932d13ada36c..2c2ed2452271 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -231,7 +231,7 @@ static struct kmem_cache *create_cache(const char *name,
> err = -EINVAL;
> if (args->use_freeptr_offset &&
> (args->freeptr_offset >= object_size ||
> - !(flags & SLAB_TYPESAFE_BY_RCU) ||
> + (!(flags & SLAB_TYPESAFE_BY_RCU) && !args->ctor) ||
> !IS_ALIGNED(args->freeptr_offset, __alignof__(freeptr_t))))
> goto out;
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 462a39d57b3a..64705cb3734f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -7781,7 +7781,8 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> s->inuse = size;
>
> if (((flags & SLAB_TYPESAFE_BY_RCU) && !args->use_freeptr_offset) ||
> - (flags & SLAB_POISON) || s->ctor ||
> + (flags & SLAB_POISON) ||
> + (s->ctor && !args->use_freeptr_offset) ||
> ((flags & SLAB_RED_ZONE) &&
> (s->object_size < sizeof(void *) || slub_debug_orig_size(s)))) {
> /*
> @@ -7802,7 +7803,8 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> */
> s->offset = size;
> size += sizeof(void *);
> - } else if ((flags & SLAB_TYPESAFE_BY_RCU) && args->use_freeptr_offset) {
> + } else if (((flags & SLAB_TYPESAFE_BY_RCU) || s->ctor) &&
> + args->use_freeptr_offset) {
> s->offset = args->freeptr_offset;
> } else {
> /*
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 1/7] mm/slab: allow specifying freepointer offset when using constructor
2025-10-28 17:43 ` Suren Baghdasaryan
@ 2025-10-29 7:10 ` Harry Yoo
2025-10-30 14:35 ` Vlastimil Babka
0 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-29 7:10 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Tue, Oct 28, 2025 at 10:43:16AM -0700, Suren Baghdasaryan wrote:
> On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> >
> > When a slab cache has a constructor, the free pointer is placed after the
> > object because certain fields must not be overwritten even after the
> > object is freed.
> >
> > However, some fields that the constructor does not care can safely be
> > overwritten. Allow specifying the free pointer offset within the object,
> > reducing the overall object size when some fields can be reused for the
> > free pointer.
Hi Suren, really appreciate you looking into it!
> Documentation explicitly says that ctor currently isn't supported with
> custom free pointers:
> https://elixir.bootlin.com/linux/v6.18-rc3/source/include/linux/slab.h*L318
> It obviously needs to be updated but I suspect there was a reason for
> this limitation. Have you investigated why it's not supported?
commit 879fb3c274c12 ("mm: add kmem_cache_create_rcu()") says:
> When a kmem cache is created with SLAB_TYPESAFE_BY_RCU the free pointer
> must be located outside of the object because we don't know what part of
> the memory can safely be overwritten as it may be needed to prevent
> object recycling.
The reason the slab allocator requires the free pointer to be
outside the object is the same: we don't know which fields
should not be overwritten, since users may assume a certain state
for specific fields in newly allocated objects.
If users don't initialize certain fields in the constructor, they
should not assume any particular state for those fields, and they may
therefore be overwritten.
> That has the consequence that SLAB_TYPESAFE_BY_RCU may end up adding a
> new cacheline. This is the case for e.g., struct file. After having it
> shrunk down by 40 bytes and having it fit in three cachelines we still
> have SLAB_TYPESAFE_BY_RCU adding a fourth cacheline because it needs to
> accommodate the free pointer.
>
> Add a new kmem_cache_create_rcu() function that allows the caller to
> specify an offset where the free pointer is supposed to be placed.
I'm not sure why Christian added support only for SLAB_TYPESAFE_BY_RCU
and not for constructors, but I don't see anything that would prevent
extending it to support constructors as well.
> I remember looking into it when I was converting vm_area_struct cache to
> use SLAB_TYPESAFE_BY_RCU but I can't recall the details now...
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 1/7] mm/slab: allow specifying freepointer offset when using constructor
2025-10-29 7:10 ` Harry Yoo
@ 2025-10-30 14:35 ` Vlastimil Babka
0 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2025-10-30 14:35 UTC (permalink / raw)
To: Harry Yoo, Suren Baghdasaryan, Christian Brauner
Cc: akpm, andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
vincenzo.frascino, yeoreum.yun, tytso, adilger.kernel, linux-ext4,
linux-kernel
On 10/29/25 08:10, Harry Yoo wrote:
> On Tue, Oct 28, 2025 at 10:43:16AM -0700, Suren Baghdasaryan wrote:
>> On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>> >
>> > When a slab cache has a constructor, the free pointer is placed after the
>> > object because certain fields must not be overwritten even after the
>> > object is freed.
>> >
>> > However, some fields that the constructor does not care can safely be
>> > overwritten. Allow specifying the free pointer offset within the object,
>> > reducing the overall object size when some fields can be reused for the
>> > free pointer.
>
> Hi Suren, really appreciate you looking into it!
>
>> Documentation explicitly says that ctor currently isn't supported with
>> custom free pointers:
>> https://elixir.bootlin.com/linux/v6.18-rc3/source/include/linux/slab.h*L318
>> It obviously needs to be updated but I suspect there was a reason for
>> this limitation. Have you investigated why it's not supported?
>
> commit 879fb3c274c12 ("mm: add kmem_cache_create_rcu()") says:
>> When a kmem cache is created with SLAB_TYPESAFE_BY_RCU the free pointer
>> must be located outside of the object because we don't know what part of
>> the memory can safely be overwritten as it may be needed to prevent
>> object recycling.
>
> The reason the slab allocator requires the free pointer to be
> outside the object is the same: we don't know which fields
> should not be overwritten, since users may assume a certain state
> for specific fields in newly allocated objects.
>
> If users don't initialize certain fields in the constructor, they
> should not assume any particular state for those fields, and they may
> therefore be overwritten.
>
>> That has the consequence that SLAB_TYPESAFE_BY_RCU may end up adding a
>> new cacheline. This is the case for e.g., struct file. After having it
>> shrunk down by 40 bytes and having it fit in three cachelines we still
>> have SLAB_TYPESAFE_BY_RCU adding a fourth cacheline because it needs to
>> accommodate the free pointer.
>>
>> Add a new kmem_cache_create_rcu() function that allows the caller to
>> specify an offset where the free pointer is supposed to be placed.
>
> I'm not sure why Christian added support only for SLAB_TYPESAFE_BY_RCU
> and not for constructors, but I don't see anything that would prevent
> extending it to support constructors as well.
IIRC we considered it and only left it for later because there was no user
yet, so we wouldn't have a proof that it works and we're not missing
something. If you have a user now, it's legit to do it, and there are no
known theoretical obstacles to it. Obviously docs should be updated at the
same time then.
>> I remember looking into it when I was converting vm_area_struct cache to
>> use SLAB_TYPESAFE_BY_RCU but I can't recall the details now...
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC PATCH V3 2/7] ext4: specify the free pointer offset for ext4_inode_cache
2025-10-27 12:28 [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
2025-10-27 12:28 ` [RFC PATCH V3 1/7] mm/slab: allow specifying freepointer offset when using constructor Harry Yoo
@ 2025-10-27 12:28 ` Harry Yoo
2025-10-28 17:22 ` Suren Baghdasaryan
2025-10-27 12:28 ` [RFC PATCH V3 3/7] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper Harry Yoo
` (5 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-27 12:28 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo, tytso,
adilger.kernel, linux-ext4, linux-kernel
Convert ext4_inode_cache to use the kmem_cache_args interface and
specify a free pointer offset.
Since ext4_inode_cache uses a constructor, the free pointer would be
placed after the object to overwriting fields used by the constructor.
However, some fields such as ->i_flags are not used by the constructor
and can safely be repurposed for the free pointer.
Specify the free pointer offset at i_flags to reduce the object size.
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
fs/ext4/super.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 699c15db28a8..2860e0ee913f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1474,12 +1474,20 @@ static void init_once(void *foo)
static int __init init_inodecache(void)
{
- ext4_inode_cachep = kmem_cache_create_usercopy("ext4_inode_cache",
- sizeof(struct ext4_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
- offsetof(struct ext4_inode_info, i_data),
- sizeof_field(struct ext4_inode_info, i_data),
- init_once);
+ struct kmem_cache_args args = {
+ .align = 0,
+ .useroffset = offsetof(struct ext4_inode_info, i_data),
+ .usersize = sizeof_field(struct ext4_inode_info, i_data),
+ .use_freeptr_offset = true,
+ .freeptr_offset = offsetof(struct ext4_inode_info, i_flags),
+ .ctor = init_once,
+ };
+
+ ext4_inode_cachep = kmem_cache_create("ext4_inode_cache",
+ sizeof(struct ext4_inode_info),
+ &args,
+ SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT);
+
if (ext4_inode_cachep == NULL)
return -ENOMEM;
return 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 2/7] ext4: specify the free pointer offset for ext4_inode_cache
2025-10-27 12:28 ` [RFC PATCH V3 2/7] ext4: specify the free pointer offset for ext4_inode_cache Harry Yoo
@ 2025-10-28 17:22 ` Suren Baghdasaryan
2025-10-28 17:25 ` Suren Baghdasaryan
0 siblings, 1 reply; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-28 17:22 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> Convert ext4_inode_cache to use the kmem_cache_args interface and
> specify a free pointer offset.
>
> Since ext4_inode_cache uses a constructor, the free pointer would be
> placed after the object to overwriting fields used by the constructor.
> However, some fields such as ->i_flags are not used by the constructor
> and can safely be repurposed for the free pointer.
>
> Specify the free pointer offset at i_flags to reduce the object size.
>
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
> fs/ext4/super.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 699c15db28a8..2860e0ee913f 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1474,12 +1474,20 @@ static void init_once(void *foo)
>
> static int __init init_inodecache(void)
> {
> - ext4_inode_cachep = kmem_cache_create_usercopy("ext4_inode_cache",
> - sizeof(struct ext4_inode_info), 0,
> - SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
> - offsetof(struct ext4_inode_info, i_data),
> - sizeof_field(struct ext4_inode_info, i_data),
> - init_once);
> + struct kmem_cache_args args = {
> + .align = 0,
> + .useroffset = offsetof(struct ext4_inode_info, i_data),
> + .usersize = sizeof_field(struct ext4_inode_info, i_data),
> + .use_freeptr_offset = true,
> + .freeptr_offset = offsetof(struct ext4_inode_info, i_flags),
Hi Harry,
AFAIK freeptr_offset can be used only with SLAB_TYPESAFE_BY_RCU caches
(see https://elixir.bootlin.com/linux/v6.18-rc3/source/include/linux/slab.h#L302)
and check at https://elixir.bootlin.com/linux/v6.18-rc3/source/mm/slab_common.c#L234
should fail otherwise. The cache you are changing does not seem to
have this flag set.
Thanks,
Suren.
> + .ctor = init_once,
> + };
> +
> + ext4_inode_cachep = kmem_cache_create("ext4_inode_cache",
> + sizeof(struct ext4_inode_info),
> + &args,
> + SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT);
> +
> if (ext4_inode_cachep == NULL)
> return -ENOMEM;
> return 0;
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 2/7] ext4: specify the free pointer offset for ext4_inode_cache
2025-10-28 17:22 ` Suren Baghdasaryan
@ 2025-10-28 17:25 ` Suren Baghdasaryan
0 siblings, 0 replies; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-28 17:25 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Tue, Oct 28, 2025 at 10:22 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> >
> > Convert ext4_inode_cache to use the kmem_cache_args interface and
> > specify a free pointer offset.
> >
> > Since ext4_inode_cache uses a constructor, the free pointer would be
> > placed after the object to overwriting fields used by the constructor.
> > However, some fields such as ->i_flags are not used by the constructor
> > and can safely be repurposed for the free pointer.
> >
> > Specify the free pointer offset at i_flags to reduce the object size.
> >
> > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
> > fs/ext4/super.c | 20 ++++++++++++++------
> > 1 file changed, 14 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index 699c15db28a8..2860e0ee913f 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -1474,12 +1474,20 @@ static void init_once(void *foo)
> >
> > static int __init init_inodecache(void)
> > {
> > - ext4_inode_cachep = kmem_cache_create_usercopy("ext4_inode_cache",
> > - sizeof(struct ext4_inode_info), 0,
> > - SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
> > - offsetof(struct ext4_inode_info, i_data),
> > - sizeof_field(struct ext4_inode_info, i_data),
> > - init_once);
> > + struct kmem_cache_args args = {
> > + .align = 0,
> > + .useroffset = offsetof(struct ext4_inode_info, i_data),
> > + .usersize = sizeof_field(struct ext4_inode_info, i_data),
> > + .use_freeptr_offset = true,
> > + .freeptr_offset = offsetof(struct ext4_inode_info, i_flags),
>
> Hi Harry,
> AFAIK freeptr_offset can be used only with SLAB_TYPESAFE_BY_RCU caches
> (see https://elixir.bootlin.com/linux/v6.18-rc3/source/include/linux/slab.h#L302)
> and check at https://elixir.bootlin.com/linux/v6.18-rc3/source/mm/slab_common.c#L234
> should fail otherwise. The cache you are changing does not seem to
> have this flag set.
Oh, sorry, your patches got reordered in my mailbox and I missed the
first one where you are removing this limitation. Let me review that
first. Sorry for the noise.
> Thanks,
> Suren.
>
> > + .ctor = init_once,
> > + };
> > +
> > + ext4_inode_cachep = kmem_cache_create("ext4_inode_cache",
> > + sizeof(struct ext4_inode_info),
> > + &args,
> > + SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT);
> > +
> > if (ext4_inode_cachep == NULL)
> > return -ENOMEM;
> > return 0;
> > --
> > 2.43.0
> >
^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC PATCH V3 3/7] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
2025-10-27 12:28 [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
2025-10-27 12:28 ` [RFC PATCH V3 1/7] mm/slab: allow specifying freepointer offset when using constructor Harry Yoo
2025-10-27 12:28 ` [RFC PATCH V3 2/7] ext4: specify the free pointer offset for ext4_inode_cache Harry Yoo
@ 2025-10-27 12:28 ` Harry Yoo
2025-10-28 17:55 ` Suren Baghdasaryan
2025-10-27 12:28 ` [RFC PATCH V3 4/7] mm/slab: use stride to access slabobj_ext Harry Yoo
` (4 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-27 12:28 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo, tytso,
adilger.kernel, linux-ext4, linux-kernel
Currently, the slab allocator assumes that slab->obj_exts is a pointer
to an array of struct slabobj_ext objects. However, to support storage
methods where struct slabobj_ext is embedded within objects, the slab
allocator should not make this assumption. Instead of directly
dereferencing the slabobj_exts array, abstract access to
struct slabobj_ext via helper functions.
Introduce a new API slabobj_ext metadata access:
slab_obj_ext(slab, obj_exts, index) - returns the pointer to
struct slabobj_ext element at the given index.
Directly dereferencing the return value of slab_obj_exts() is no longer
allowed. Instead, slab_obj_ext() must always be used to access
individual struct slabobj_ext objects.
Convert all users to use these APIs.
No functional changes intended.
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
mm/memcontrol.c | 23 ++++++++++++++++-------
mm/slab.h | 43 ++++++++++++++++++++++++++++++++++++------
mm/slub.c | 50 ++++++++++++++++++++++++++++---------------------
3 files changed, 82 insertions(+), 34 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8dd7fbed5a94..2a9dc246e802 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2566,7 +2566,8 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
* slab->obj_exts.
*/
if (folio_test_slab(folio)) {
- struct slabobj_ext *obj_exts;
+ unsigned long obj_exts;
+ struct slabobj_ext *obj_ext;
struct slab *slab;
unsigned int off;
@@ -2576,8 +2577,9 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
return NULL;
off = obj_to_index(slab->slab_cache, slab, p);
- if (obj_exts[off].objcg)
- return obj_cgroup_memcg(obj_exts[off].objcg);
+ obj_ext = slab_obj_ext(slab, obj_exts, off);
+ if (obj_ext->objcg)
+ return obj_cgroup_memcg(obj_ext->objcg);
return NULL;
}
@@ -3168,6 +3170,9 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
}
for (i = 0; i < size; i++) {
+ unsigned long obj_exts;
+ struct slabobj_ext *obj_ext;
+
slab = virt_to_slab(p[i]);
if (!slab_obj_exts(slab) &&
@@ -3190,29 +3195,33 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
slab_pgdat(slab), cache_vmstat_idx(s)))
return false;
+ obj_exts = slab_obj_exts(slab);
off = obj_to_index(s, slab, p[i]);
+ obj_ext = slab_obj_ext(slab, obj_exts, off);
obj_cgroup_get(objcg);
- slab_obj_exts(slab)[off].objcg = objcg;
+ obj_ext->objcg = objcg;
}
return true;
}
void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
- void **p, int objects, struct slabobj_ext *obj_exts)
+ void **p, int objects, unsigned long obj_exts)
{
size_t obj_size = obj_full_size(s);
for (int i = 0; i < objects; i++) {
struct obj_cgroup *objcg;
+ struct slabobj_ext *obj_ext;
unsigned int off;
off = obj_to_index(s, slab, p[i]);
- objcg = obj_exts[off].objcg;
+ obj_ext = slab_obj_ext(slab, obj_exts, off);
+ objcg = obj_ext->objcg;
if (!objcg)
continue;
- obj_exts[off].objcg = NULL;
+ obj_ext->objcg = NULL;
refill_obj_stock(objcg, obj_size, true, -obj_size,
slab_pgdat(slab), cache_vmstat_idx(s));
obj_cgroup_put(objcg);
diff --git a/mm/slab.h b/mm/slab.h
index d63cc9b5e313..df2c987d950d 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -528,10 +528,12 @@ static inline bool slab_in_kunit_test(void) { return false; }
* associated with a slab.
* @slab: a pointer to the slab struct
*
- * Returns a pointer to the object extension vector associated with the slab,
- * or NULL if no such vector has been associated yet.
+ * Returns the address of the object extension vector associated with the slab,
+ * or zero if no such vector has been associated yet.
+ * Do not dereference the return value directly; use slab_obj_ext() to access
+ * its elements.
*/
-static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
+static inline unsigned long slab_obj_exts(struct slab *slab)
{
unsigned long obj_exts = READ_ONCE(slab->obj_exts);
@@ -544,7 +546,30 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
obj_exts != OBJEXTS_ALLOC_FAIL, slab_page(slab));
VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));
#endif
- return (struct slabobj_ext *)(obj_exts & ~OBJEXTS_FLAGS_MASK);
+
+ return obj_exts & ~OBJEXTS_FLAGS_MASK;
+}
+
+/*
+ * slab_obj_ext - get the pointer to the slab object extension metadata
+ * associated with an object in a slab.
+ * @slab: a pointer to the slab struct
+ * @obj_exts: a pointer to the object extension vector
+ * @index: an index of the object
+ *
+ * Returns a pointer to the object extension associated with the object.
+ */
+static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
+ unsigned long obj_exts,
+ unsigned int index)
+{
+ struct slabobj_ext *obj_ext;
+
+ VM_WARN_ON_ONCE(!slab_obj_exts(slab));
+ VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
+
+ obj_ext = (struct slabobj_ext *)obj_exts;
+ return &obj_ext[index];
}
int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
@@ -552,7 +577,13 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
#else /* CONFIG_SLAB_OBJ_EXT */
-static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
+static inline unsigned long slab_obj_exts(struct slab *slab)
+{
+ return false;
+}
+
+static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
+ unsigned int index)
{
return NULL;
}
@@ -569,7 +600,7 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
gfp_t flags, size_t size, void **p);
void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
- void **p, int objects, struct slabobj_ext *obj_exts);
+ void **p, int objects, unsigned long obj_exts);
#endif
void kvfree_rcu_cb(struct rcu_head *head);
diff --git a/mm/slub.c b/mm/slub.c
index 64705cb3734f..ae73403f8c29 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2031,7 +2031,7 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
{
- struct slabobj_ext *slab_exts;
+ unsigned long slab_exts;
struct slab *obj_exts_slab;
obj_exts_slab = virt_to_slab(obj_exts);
@@ -2039,9 +2039,12 @@ static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
if (slab_exts) {
unsigned int offs = obj_to_index(obj_exts_slab->slab_cache,
obj_exts_slab, obj_exts);
+ struct slabobj_ext *ext = slab_obj_ext(obj_exts_slab,
+ slab_exts, offs);
+
/* codetag should be NULL */
- WARN_ON(slab_exts[offs].ref.ct);
- set_codetag_empty(&slab_exts[offs].ref);
+ WARN_ON(ext->ref.ct);
+ set_codetag_empty(&ext->ref);
}
}
@@ -2159,7 +2162,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
static inline void free_slab_obj_exts(struct slab *slab)
{
- struct slabobj_ext *obj_exts;
+ unsigned long obj_exts;
obj_exts = slab_obj_exts(slab);
if (!obj_exts)
@@ -2172,11 +2175,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
* NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
* the extension for obj_exts is expected to be NULL.
*/
- mark_objexts_empty(obj_exts);
+ mark_objexts_empty((struct slabobj_ext *)obj_exts);
if (unlikely(READ_ONCE(slab->obj_exts) & OBJEXTS_NOSPIN_ALLOC))
- kfree_nolock(obj_exts);
+ kfree_nolock((void *)obj_exts);
else
- kfree(obj_exts);
+ kfree((void *)obj_exts);
slab->obj_exts = 0;
}
@@ -2201,9 +2204,10 @@ static inline void free_slab_obj_exts(struct slab *slab)
#ifdef CONFIG_MEM_ALLOC_PROFILING
static inline struct slabobj_ext *
-prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
+prepare_slab_obj_ext_hook(struct kmem_cache *s, gfp_t flags, void *p)
{
struct slab *slab;
+ unsigned long obj_exts;
if (!p)
return NULL;
@@ -2215,30 +2219,32 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
return NULL;
slab = virt_to_slab(p);
- if (!slab_obj_exts(slab) &&
+ obj_exts = slab_obj_exts(slab);
+ if (!obj_exts &&
alloc_slab_obj_exts(slab, s, flags, false)) {
pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
__func__, s->name);
return NULL;
}
- return slab_obj_exts(slab) + obj_to_index(s, slab, p);
+ obj_exts = slab_obj_exts(slab);
+ return slab_obj_ext(slab, obj_exts, obj_to_index(s, slab, p));
}
/* Should be called only if mem_alloc_profiling_enabled() */
static noinline void
__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
{
- struct slabobj_ext *obj_exts;
+ struct slabobj_ext *obj_ext;
- obj_exts = prepare_slab_obj_exts_hook(s, flags, object);
+ obj_ext = prepare_slab_obj_ext_hook(s, flags, object);
/*
* Currently obj_exts is used only for allocation profiling.
* If other users appear then mem_alloc_profiling_enabled()
* check should be added before alloc_tag_add().
*/
- if (likely(obj_exts))
- alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
+ if (likely(obj_ext))
+ alloc_tag_add(&obj_ext->ref, current->alloc_tag, s->size);
}
static inline void
@@ -2253,8 +2259,8 @@ static noinline void
__alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
int objects)
{
- struct slabobj_ext *obj_exts;
int i;
+ unsigned long obj_exts;
/* slab->obj_exts might not be NULL if it was created for MEMCG accounting. */
if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
@@ -2267,7 +2273,7 @@ __alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p
for (i = 0; i < objects; i++) {
unsigned int off = obj_to_index(s, slab, p[i]);
- alloc_tag_sub(&obj_exts[off].ref, s->size);
+ alloc_tag_sub(&slab_obj_ext(slab, obj_exts, off)->ref, s->size);
}
}
@@ -2326,7 +2332,7 @@ static __fastpath_inline
void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
int objects)
{
- struct slabobj_ext *obj_exts;
+ unsigned long obj_exts;
if (!memcg_kmem_online())
return;
@@ -2341,7 +2347,8 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
static __fastpath_inline
bool memcg_slab_post_charge(void *p, gfp_t flags)
{
- struct slabobj_ext *slab_exts;
+ unsigned long obj_exts;
+ struct slabobj_ext *obj_ext;
struct kmem_cache *s;
struct folio *folio;
struct slab *slab;
@@ -2381,10 +2388,11 @@ bool memcg_slab_post_charge(void *p, gfp_t flags)
return true;
/* Ignore already charged objects. */
- slab_exts = slab_obj_exts(slab);
- if (slab_exts) {
+ obj_exts = slab_obj_exts(slab);
+ if (obj_exts) {
off = obj_to_index(s, slab, p);
- if (unlikely(slab_exts[off].objcg))
+ obj_ext = slab_obj_ext(slab, obj_exts, off);
+ if (unlikely(obj_ext->objcg))
return true;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 3/7] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
2025-10-27 12:28 ` [RFC PATCH V3 3/7] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper Harry Yoo
@ 2025-10-28 17:55 ` Suren Baghdasaryan
2025-10-29 8:49 ` Harry Yoo
0 siblings, 1 reply; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-28 17:55 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> Currently, the slab allocator assumes that slab->obj_exts is a pointer
> to an array of struct slabobj_ext objects. However, to support storage
> methods where struct slabobj_ext is embedded within objects, the slab
> allocator should not make this assumption. Instead of directly
> dereferencing the slabobj_exts array, abstract access to
> struct slabobj_ext via helper functions.
>
> Introduce a new API slabobj_ext metadata access:
>
> slab_obj_ext(slab, obj_exts, index) - returns the pointer to
> struct slabobj_ext element at the given index.
>
> Directly dereferencing the return value of slab_obj_exts() is no longer
> allowed. Instead, slab_obj_ext() must always be used to access
> individual struct slabobj_ext objects.
If direct access to the vector is not allowed, it would be better to
eliminate slab_obj_exts() function completely and use the new
slab_obj_ext() instead. I think that's possible. We might need an
additional `bool is_slab_obj_exts()` helper for an early check before
we calculate the object index but that's quite easy.
>
> Convert all users to use these APIs.
> No functional changes intended.
>
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
> mm/memcontrol.c | 23 ++++++++++++++++-------
> mm/slab.h | 43 ++++++++++++++++++++++++++++++++++++------
> mm/slub.c | 50 ++++++++++++++++++++++++++++---------------------
> 3 files changed, 82 insertions(+), 34 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8dd7fbed5a94..2a9dc246e802 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2566,7 +2566,8 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
> * slab->obj_exts.
> */
> if (folio_test_slab(folio)) {
> - struct slabobj_ext *obj_exts;
> + unsigned long obj_exts;
> + struct slabobj_ext *obj_ext;
> struct slab *slab;
> unsigned int off;
>
> @@ -2576,8 +2577,9 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
> return NULL;
>
> off = obj_to_index(slab->slab_cache, slab, p);
> - if (obj_exts[off].objcg)
> - return obj_cgroup_memcg(obj_exts[off].objcg);
> + obj_ext = slab_obj_ext(slab, obj_exts, off);
> + if (obj_ext->objcg)
> + return obj_cgroup_memcg(obj_ext->objcg);
>
> return NULL;
> }
> @@ -3168,6 +3170,9 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> }
>
> for (i = 0; i < size; i++) {
> + unsigned long obj_exts;
> + struct slabobj_ext *obj_ext;
> +
> slab = virt_to_slab(p[i]);
>
> if (!slab_obj_exts(slab) &&
> @@ -3190,29 +3195,33 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> slab_pgdat(slab), cache_vmstat_idx(s)))
> return false;
>
> + obj_exts = slab_obj_exts(slab);
> off = obj_to_index(s, slab, p[i]);
> + obj_ext = slab_obj_ext(slab, obj_exts, off);
> obj_cgroup_get(objcg);
> - slab_obj_exts(slab)[off].objcg = objcg;
> + obj_ext->objcg = objcg;
> }
>
> return true;
> }
>
> void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> - void **p, int objects, struct slabobj_ext *obj_exts)
> + void **p, int objects, unsigned long obj_exts)
> {
> size_t obj_size = obj_full_size(s);
>
> for (int i = 0; i < objects; i++) {
> struct obj_cgroup *objcg;
> + struct slabobj_ext *obj_ext;
> unsigned int off;
>
> off = obj_to_index(s, slab, p[i]);
> - objcg = obj_exts[off].objcg;
> + obj_ext = slab_obj_ext(slab, obj_exts, off);
> + objcg = obj_ext->objcg;
> if (!objcg)
> continue;
>
> - obj_exts[off].objcg = NULL;
> + obj_ext->objcg = NULL;
> refill_obj_stock(objcg, obj_size, true, -obj_size,
> slab_pgdat(slab), cache_vmstat_idx(s));
> obj_cgroup_put(objcg);
> diff --git a/mm/slab.h b/mm/slab.h
> index d63cc9b5e313..df2c987d950d 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -528,10 +528,12 @@ static inline bool slab_in_kunit_test(void) { return false; }
> * associated with a slab.
> * @slab: a pointer to the slab struct
> *
> - * Returns a pointer to the object extension vector associated with the slab,
> - * or NULL if no such vector has been associated yet.
> + * Returns the address of the object extension vector associated with the slab,
> + * or zero if no such vector has been associated yet.
> + * Do not dereference the return value directly; use slab_obj_ext() to access
> + * its elements.
> */
> -static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
> +static inline unsigned long slab_obj_exts(struct slab *slab)
> {
> unsigned long obj_exts = READ_ONCE(slab->obj_exts);
>
> @@ -544,7 +546,30 @@ static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
> obj_exts != OBJEXTS_ALLOC_FAIL, slab_page(slab));
> VM_BUG_ON_PAGE(obj_exts & MEMCG_DATA_KMEM, slab_page(slab));
> #endif
> - return (struct slabobj_ext *)(obj_exts & ~OBJEXTS_FLAGS_MASK);
> +
> + return obj_exts & ~OBJEXTS_FLAGS_MASK;
> +}
> +
> +/*
> + * slab_obj_ext - get the pointer to the slab object extension metadata
> + * associated with an object in a slab.
> + * @slab: a pointer to the slab struct
> + * @obj_exts: a pointer to the object extension vector
> + * @index: an index of the object
> + *
> + * Returns a pointer to the object extension associated with the object.
> + */
> +static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
> + unsigned long obj_exts,
> + unsigned int index)
> +{
> + struct slabobj_ext *obj_ext;
> +
> + VM_WARN_ON_ONCE(!slab_obj_exts(slab));
> + VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
> +
> + obj_ext = (struct slabobj_ext *)obj_exts;
> + return &obj_ext[index];
> }
>
> int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> @@ -552,7 +577,13 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>
> #else /* CONFIG_SLAB_OBJ_EXT */
>
> -static inline struct slabobj_ext *slab_obj_exts(struct slab *slab)
> +static inline unsigned long slab_obj_exts(struct slab *slab)
> +{
> + return false;
> +}
> +
> +static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
> + unsigned int index)
> {
> return NULL;
> }
> @@ -569,7 +600,7 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
> bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> gfp_t flags, size_t size, void **p);
> void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> - void **p, int objects, struct slabobj_ext *obj_exts);
> + void **p, int objects, unsigned long obj_exts);
> #endif
>
> void kvfree_rcu_cb(struct rcu_head *head);
> diff --git a/mm/slub.c b/mm/slub.c
> index 64705cb3734f..ae73403f8c29 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2031,7 +2031,7 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
>
> static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
> {
> - struct slabobj_ext *slab_exts;
> + unsigned long slab_exts;
> struct slab *obj_exts_slab;
>
> obj_exts_slab = virt_to_slab(obj_exts);
> @@ -2039,9 +2039,12 @@ static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
> if (slab_exts) {
> unsigned int offs = obj_to_index(obj_exts_slab->slab_cache,
> obj_exts_slab, obj_exts);
> + struct slabobj_ext *ext = slab_obj_ext(obj_exts_slab,
> + slab_exts, offs);
> +
> /* codetag should be NULL */
> - WARN_ON(slab_exts[offs].ref.ct);
> - set_codetag_empty(&slab_exts[offs].ref);
> + WARN_ON(ext->ref.ct);
> + set_codetag_empty(&ext->ref);
> }
> }
>
> @@ -2159,7 +2162,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>
> static inline void free_slab_obj_exts(struct slab *slab)
> {
> - struct slabobj_ext *obj_exts;
> + unsigned long obj_exts;
>
> obj_exts = slab_obj_exts(slab);
> if (!obj_exts)
> @@ -2172,11 +2175,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
> * NULL, therefore replace NULL with CODETAG_EMPTY to indicate that
> * the extension for obj_exts is expected to be NULL.
> */
> - mark_objexts_empty(obj_exts);
> + mark_objexts_empty((struct slabobj_ext *)obj_exts);
> if (unlikely(READ_ONCE(slab->obj_exts) & OBJEXTS_NOSPIN_ALLOC))
> - kfree_nolock(obj_exts);
> + kfree_nolock((void *)obj_exts);
> else
> - kfree(obj_exts);
> + kfree((void *)obj_exts);
> slab->obj_exts = 0;
> }
>
> @@ -2201,9 +2204,10 @@ static inline void free_slab_obj_exts(struct slab *slab)
> #ifdef CONFIG_MEM_ALLOC_PROFILING
>
> static inline struct slabobj_ext *
> -prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> +prepare_slab_obj_ext_hook(struct kmem_cache *s, gfp_t flags, void *p)
> {
> struct slab *slab;
> + unsigned long obj_exts;
>
> if (!p)
> return NULL;
> @@ -2215,30 +2219,32 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> return NULL;
>
> slab = virt_to_slab(p);
> - if (!slab_obj_exts(slab) &&
> + obj_exts = slab_obj_exts(slab);
> + if (!obj_exts &&
> alloc_slab_obj_exts(slab, s, flags, false)) {
> pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
> __func__, s->name);
> return NULL;
> }
>
> - return slab_obj_exts(slab) + obj_to_index(s, slab, p);
> + obj_exts = slab_obj_exts(slab);
> + return slab_obj_ext(slab, obj_exts, obj_to_index(s, slab, p));
> }
>
> /* Should be called only if mem_alloc_profiling_enabled() */
> static noinline void
> __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> {
> - struct slabobj_ext *obj_exts;
> + struct slabobj_ext *obj_ext;
>
> - obj_exts = prepare_slab_obj_exts_hook(s, flags, object);
> + obj_ext = prepare_slab_obj_ext_hook(s, flags, object);
> /*
> * Currently obj_exts is used only for allocation profiling.
> * If other users appear then mem_alloc_profiling_enabled()
> * check should be added before alloc_tag_add().
> */
> - if (likely(obj_exts))
> - alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
> + if (likely(obj_ext))
> + alloc_tag_add(&obj_ext->ref, current->alloc_tag, s->size);
> }
>
> static inline void
> @@ -2253,8 +2259,8 @@ static noinline void
> __alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> int objects)
> {
> - struct slabobj_ext *obj_exts;
> int i;
> + unsigned long obj_exts;
>
> /* slab->obj_exts might not be NULL if it was created for MEMCG accounting. */
> if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
> @@ -2267,7 +2273,7 @@ __alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p
> for (i = 0; i < objects; i++) {
> unsigned int off = obj_to_index(s, slab, p[i]);
>
> - alloc_tag_sub(&obj_exts[off].ref, s->size);
> + alloc_tag_sub(&slab_obj_ext(slab, obj_exts, off)->ref, s->size);
> }
> }
>
> @@ -2326,7 +2332,7 @@ static __fastpath_inline
> void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> int objects)
> {
> - struct slabobj_ext *obj_exts;
> + unsigned long obj_exts;
>
> if (!memcg_kmem_online())
> return;
> @@ -2341,7 +2347,8 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> static __fastpath_inline
> bool memcg_slab_post_charge(void *p, gfp_t flags)
> {
> - struct slabobj_ext *slab_exts;
> + unsigned long obj_exts;
> + struct slabobj_ext *obj_ext;
> struct kmem_cache *s;
> struct folio *folio;
> struct slab *slab;
> @@ -2381,10 +2388,11 @@ bool memcg_slab_post_charge(void *p, gfp_t flags)
> return true;
>
> /* Ignore already charged objects. */
> - slab_exts = slab_obj_exts(slab);
> - if (slab_exts) {
> + obj_exts = slab_obj_exts(slab);
> + if (obj_exts) {
> off = obj_to_index(s, slab, p);
> - if (unlikely(slab_exts[off].objcg))
> + obj_ext = slab_obj_ext(slab, obj_exts, off);
> + if (unlikely(obj_ext->objcg))
> return true;
> }
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 3/7] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
2025-10-28 17:55 ` Suren Baghdasaryan
@ 2025-10-29 8:49 ` Harry Yoo
2025-10-29 15:24 ` Suren Baghdasaryan
0 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-29 8:49 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Tue, Oct 28, 2025 at 10:55:39AM -0700, Suren Baghdasaryan wrote:
> On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> >
> > Currently, the slab allocator assumes that slab->obj_exts is a pointer
> > to an array of struct slabobj_ext objects. However, to support storage
> > methods where struct slabobj_ext is embedded within objects, the slab
> > allocator should not make this assumption. Instead of directly
> > dereferencing the slabobj_exts array, abstract access to
> > struct slabobj_ext via helper functions.
> >
> > Introduce a new API slabobj_ext metadata access:
> >
> > slab_obj_ext(slab, obj_exts, index) - returns the pointer to
> > struct slabobj_ext element at the given index.
> >
> > Directly dereferencing the return value of slab_obj_exts() is no longer
> > allowed. Instead, slab_obj_ext() must always be used to access
> > individual struct slabobj_ext objects.
>
> If direct access to the vector is not allowed, it would be better to
> eliminate slab_obj_exts() function completely and use the new
> slab_obj_ext() instead. I think that's possible. We might need an
> additional `bool is_slab_obj_exts()` helper for an early check before
> we calculate the object index but that's quite easy.
Good point, but that way we cannot avoid reading slab->obj_exts
multiple times when we access slabobj_ext of multiple objects
as it's accessed via READ_ONCE().
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH V3 3/7] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
2025-10-29 8:49 ` Harry Yoo
@ 2025-10-29 15:24 ` Suren Baghdasaryan
2025-10-30 1:26 ` Harry Yoo
0 siblings, 1 reply; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-29 15:24 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Wed, Oct 29, 2025 at 1:49 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Tue, Oct 28, 2025 at 10:55:39AM -0700, Suren Baghdasaryan wrote:
> > On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > >
> > > Currently, the slab allocator assumes that slab->obj_exts is a pointer
> > > to an array of struct slabobj_ext objects. However, to support storage
> > > methods where struct slabobj_ext is embedded within objects, the slab
> > > allocator should not make this assumption. Instead of directly
> > > dereferencing the slabobj_exts array, abstract access to
> > > struct slabobj_ext via helper functions.
> > >
> > > Introduce a new API slabobj_ext metadata access:
> > >
> > > slab_obj_ext(slab, obj_exts, index) - returns the pointer to
> > > struct slabobj_ext element at the given index.
> > >
> > > Directly dereferencing the return value of slab_obj_exts() is no longer
> > > allowed. Instead, slab_obj_ext() must always be used to access
> > > individual struct slabobj_ext objects.
> >
> > If direct access to the vector is not allowed, it would be better to
> > eliminate slab_obj_exts() function completely and use the new
> > slab_obj_ext() instead. I think that's possible. We might need an
> > additional `bool is_slab_obj_exts()` helper for an early check before
> > we calculate the object index but that's quite easy.
>
> Good point, but that way we cannot avoid reading slab->obj_exts
> multiple times when we access slabobj_ext of multiple objects
> as it's accessed via READ_ONCE().
True. I think we use slab->obj_exts to loop over its elements only in
two places: __memcg_slab_post_alloc_hook() and
__memcg_slab_free_hook(). I guess we could implement some kind of
slab_objext_foreach() construct to loop over all elements of
slab->obj_exts?
>
> --
> Cheers,
> Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH V3 3/7] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
2025-10-29 15:24 ` Suren Baghdasaryan
@ 2025-10-30 1:26 ` Harry Yoo
2025-10-30 5:03 ` Suren Baghdasaryan
0 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-30 1:26 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Wed, Oct 29, 2025 at 08:24:35AM -0700, Suren Baghdasaryan wrote:
> On Wed, Oct 29, 2025 at 1:49 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> >
> > On Tue, Oct 28, 2025 at 10:55:39AM -0700, Suren Baghdasaryan wrote:
> > > On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > > >
> > > > Currently, the slab allocator assumes that slab->obj_exts is a pointer
> > > > to an array of struct slabobj_ext objects. However, to support storage
> > > > methods where struct slabobj_ext is embedded within objects, the slab
> > > > allocator should not make this assumption. Instead of directly
> > > > dereferencing the slabobj_exts array, abstract access to
> > > > struct slabobj_ext via helper functions.
> > > >
> > > > Introduce a new API slabobj_ext metadata access:
> > > >
> > > > slab_obj_ext(slab, obj_exts, index) - returns the pointer to
> > > > struct slabobj_ext element at the given index.
> > > >
> > > > Directly dereferencing the return value of slab_obj_exts() is no longer
> > > > allowed. Instead, slab_obj_ext() must always be used to access
> > > > individual struct slabobj_ext objects.
> > >
> > > If direct access to the vector is not allowed, it would be better to
> > > eliminate slab_obj_exts() function completely and use the new
> > > slab_obj_ext() instead. I think that's possible. We might need an
> > > additional `bool is_slab_obj_exts()` helper for an early check before
> > > we calculate the object index but that's quite easy.
> >
> > Good point, but that way we cannot avoid reading slab->obj_exts
> > multiple times when we access slabobj_ext of multiple objects
> > as it's accessed via READ_ONCE().
>
> True. I think we use slab->obj_exts to loop over its elements only in
> two places: __memcg_slab_post_alloc_hook() and
> __memcg_slab_free_hook(). I guess we could implement some kind of
> slab_objext_foreach() construct to loop over all elements of
> slab->obj_exts?
Not sure if that would help here. In __memcg_slab_free_hook() we want to
iterate only some of (not all of) elements from the same slab
(we know they're from the same slab as we build detached freelist and
sort the array) and so we read slab->obj_exts only once.
In __memcg_slab_post_alloc_hook() we don't know if the objects are from
the same slab, so we read slab->obj_exts multiple times and charge them.
I think we need to either 1) remove slab_obj_exts() and
then introduce is_slab_obj_exts() and see if it has impact on
performance, or 2) keep it as-is.
Thanks!
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH V3 3/7] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
2025-10-30 1:26 ` Harry Yoo
@ 2025-10-30 5:03 ` Suren Baghdasaryan
0 siblings, 0 replies; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-30 5:03 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Wed, Oct 29, 2025 at 6:26 PM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Wed, Oct 29, 2025 at 08:24:35AM -0700, Suren Baghdasaryan wrote:
> > On Wed, Oct 29, 2025 at 1:49 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > >
> > > On Tue, Oct 28, 2025 at 10:55:39AM -0700, Suren Baghdasaryan wrote:
> > > > On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > > > >
> > > > > Currently, the slab allocator assumes that slab->obj_exts is a pointer
> > > > > to an array of struct slabobj_ext objects. However, to support storage
> > > > > methods where struct slabobj_ext is embedded within objects, the slab
> > > > > allocator should not make this assumption. Instead of directly
> > > > > dereferencing the slabobj_exts array, abstract access to
> > > > > struct slabobj_ext via helper functions.
> > > > >
> > > > > Introduce a new API slabobj_ext metadata access:
> > > > >
> > > > > slab_obj_ext(slab, obj_exts, index) - returns the pointer to
> > > > > struct slabobj_ext element at the given index.
> > > > >
> > > > > Directly dereferencing the return value of slab_obj_exts() is no longer
> > > > > allowed. Instead, slab_obj_ext() must always be used to access
> > > > > individual struct slabobj_ext objects.
> > > >
> > > > If direct access to the vector is not allowed, it would be better to
> > > > eliminate slab_obj_exts() function completely and use the new
> > > > slab_obj_ext() instead. I think that's possible. We might need an
> > > > additional `bool is_slab_obj_exts()` helper for an early check before
> > > > we calculate the object index but that's quite easy.
> > >
> > > Good point, but that way we cannot avoid reading slab->obj_exts
> > > multiple times when we access slabobj_ext of multiple objects
> > > as it's accessed via READ_ONCE().
> >
> > True. I think we use slab->obj_exts to loop over its elements only in
> > two places: __memcg_slab_post_alloc_hook() and
> > __memcg_slab_free_hook(). I guess we could implement some kind of
> > slab_objext_foreach() construct to loop over all elements of
> > slab->obj_exts?
>
> Not sure if that would help here. In __memcg_slab_free_hook() we want to
> iterate only some of (not all of) elements from the same slab
> (we know they're from the same slab as we build detached freelist and
> sort the array) and so we read slab->obj_exts only once.
>
> In __memcg_slab_post_alloc_hook() we don't know if the objects are from
> the same slab, so we read slab->obj_exts multiple times and charge them.
>
> I think we need to either 1) remove slab_obj_exts() and
> then introduce is_slab_obj_exts() and see if it has impact on
> performance, or 2) keep it as-is.
Ok, it sounds like too much effort for avoiding a direct accessor.
Let's go with (2) for now.
>
> Thanks!
>
> --
> Cheers,
> Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC PATCH V3 4/7] mm/slab: use stride to access slabobj_ext
2025-10-27 12:28 [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
` (2 preceding siblings ...)
2025-10-27 12:28 ` [RFC PATCH V3 3/7] mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper Harry Yoo
@ 2025-10-27 12:28 ` Harry Yoo
2025-10-28 20:10 ` Suren Baghdasaryan
2025-10-27 12:28 ` [RFC PATCH V3 5/7] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison Harry Yoo
` (3 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-27 12:28 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo, tytso,
adilger.kernel, linux-ext4, linux-kernel
Use a configurable stride value when accessing slab object extension
metadata instead of assuming a fixed sizeof(struct slabobj_ext).
Store stride value in free bits of slab->counters field. This allows
for flexibility in cases where the extension is embedded within
slab objects.
Since these free bits exist only on 64-bit, any future optimizations
that need to change stride value cannot be enabled on 32-bit architectures.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
mm/slab.h | 37 +++++++++++++++++++++++++++++++++----
mm/slub.c | 2 ++
2 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/mm/slab.h b/mm/slab.h
index df2c987d950d..22ee28cb55e1 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -83,6 +83,14 @@ struct slab {
* that the slab was corrupted
*/
unsigned frozen:1;
+#ifdef CONFIG_64BIT
+ /*
+ * Some optimizations use free bits in 'counters' field
+ * to save memory. In case ->stride field is not available,
+ * such optimizations are disabled.
+ */
+ unsigned short stride;
+#endif
};
};
};
@@ -550,6 +558,26 @@ static inline unsigned long slab_obj_exts(struct slab *slab)
return obj_exts & ~OBJEXTS_FLAGS_MASK;
}
+#ifdef CONFIG_64BIT
+static inline void slab_set_stride(struct slab *slab, unsigned short stride)
+{
+ slab->stride = stride;
+}
+static inline unsigned short slab_get_stride(struct slab *slab)
+{
+ return slab->stride;
+}
+#else
+static inline void slab_set_stride(struct slab *slab, unsigned short stride)
+{
+ VM_WARN_ON_ONCE(stride != sizeof(struct slabobj_ext));
+}
+static inline unsigned short slab_get_stride(struct slab *slab)
+{
+ return sizeof(struct slabobj_ext);
+}
+#endif
+
/*
* slab_obj_ext - get the pointer to the slab object extension metadata
* associated with an object in a slab.
@@ -563,13 +591,10 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
unsigned long obj_exts,
unsigned int index)
{
- struct slabobj_ext *obj_ext;
-
VM_WARN_ON_ONCE(!slab_obj_exts(slab));
VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
- obj_ext = (struct slabobj_ext *)obj_exts;
- return &obj_ext[index];
+ return (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
}
int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
@@ -588,6 +613,10 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
return NULL;
}
+static inline void slab_set_stride(struct slab *slab, unsigned int stride) { }
+static inline unsigned int slab_get_stride(struct slab *slab) { return 0; }
+
+
#endif /* CONFIG_SLAB_OBJ_EXT */
static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
diff --git a/mm/slub.c b/mm/slub.c
index ae73403f8c29..4383740a4d34 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2134,6 +2134,8 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
#endif
old_exts = READ_ONCE(slab->obj_exts);
handle_failed_objexts_alloc(old_exts, vec, objects);
+ slab_set_stride(slab, sizeof(struct slabobj_ext));
+
if (new_slab) {
/*
* If the slab is brand new and nobody can yet access its
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 4/7] mm/slab: use stride to access slabobj_ext
2025-10-27 12:28 ` [RFC PATCH V3 4/7] mm/slab: use stride to access slabobj_ext Harry Yoo
@ 2025-10-28 20:10 ` Suren Baghdasaryan
0 siblings, 0 replies; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-28 20:10 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> Use a configurable stride value when accessing slab object extension
> metadata instead of assuming a fixed sizeof(struct slabobj_ext).
>
> Store stride value in free bits of slab->counters field. This allows
> for flexibility in cases where the extension is embedded within
> slab objects.
>
> Since these free bits exist only on 64-bit, any future optimizations
> that need to change stride value cannot be enabled on 32-bit architectures.
>
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
I hope slab_obj_exts() can be removed in the next revision, but otherwise LGTM.
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> ---
> mm/slab.h | 37 +++++++++++++++++++++++++++++++++----
> mm/slub.c | 2 ++
> 2 files changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index df2c987d950d..22ee28cb55e1 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -83,6 +83,14 @@ struct slab {
> * that the slab was corrupted
> */
> unsigned frozen:1;
> +#ifdef CONFIG_64BIT
> + /*
> + * Some optimizations use free bits in 'counters' field
> + * to save memory. In case ->stride field is not available,
> + * such optimizations are disabled.
> + */
> + unsigned short stride;
> +#endif
> };
> };
> };
> @@ -550,6 +558,26 @@ static inline unsigned long slab_obj_exts(struct slab *slab)
> return obj_exts & ~OBJEXTS_FLAGS_MASK;
> }
>
> +#ifdef CONFIG_64BIT
> +static inline void slab_set_stride(struct slab *slab, unsigned short stride)
> +{
> + slab->stride = stride;
> +}
> +static inline unsigned short slab_get_stride(struct slab *slab)
> +{
> + return slab->stride;
> +}
> +#else
> +static inline void slab_set_stride(struct slab *slab, unsigned short stride)
> +{
> + VM_WARN_ON_ONCE(stride != sizeof(struct slabobj_ext));
> +}
> +static inline unsigned short slab_get_stride(struct slab *slab)
> +{
> + return sizeof(struct slabobj_ext);
> +}
> +#endif
> +
> /*
> * slab_obj_ext - get the pointer to the slab object extension metadata
> * associated with an object in a slab.
> @@ -563,13 +591,10 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
> unsigned long obj_exts,
> unsigned int index)
> {
> - struct slabobj_ext *obj_ext;
> -
> VM_WARN_ON_ONCE(!slab_obj_exts(slab));
> VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
>
> - obj_ext = (struct slabobj_ext *)obj_exts;
> - return &obj_ext[index];
> + return (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
> }
>
> int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> @@ -588,6 +613,10 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
> return NULL;
> }
>
> +static inline void slab_set_stride(struct slab *slab, unsigned int stride) { }
> +static inline unsigned int slab_get_stride(struct slab *slab) { return 0; }
> +
> +
> #endif /* CONFIG_SLAB_OBJ_EXT */
>
> static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
> diff --git a/mm/slub.c b/mm/slub.c
> index ae73403f8c29..4383740a4d34 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2134,6 +2134,8 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> #endif
> old_exts = READ_ONCE(slab->obj_exts);
> handle_failed_objexts_alloc(old_exts, vec, objects);
> + slab_set_stride(slab, sizeof(struct slabobj_ext));
> +
> if (new_slab) {
> /*
> * If the slab is brand new and nobody can yet access its
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC PATCH V3 5/7] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison
2025-10-27 12:28 [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
` (3 preceding siblings ...)
2025-10-27 12:28 ` [RFC PATCH V3 4/7] mm/slab: use stride to access slabobj_ext Harry Yoo
@ 2025-10-27 12:28 ` Harry Yoo
2025-10-28 23:03 ` Suren Baghdasaryan
2025-10-27 12:28 ` [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover Harry Yoo
` (2 subsequent siblings)
7 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-27 12:28 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo, tytso,
adilger.kernel, linux-ext4, linux-kernel
In the near future, slabobj_ext may reside outside the allocated slab
object range within a slab, which could be reported as an out-of-bounds
access by KASAN. To prevent false positives, explicitly disable KASAN
and KMSAN checks when accessing slabobj_ext.
While an alternative approach could be to unpoison slabobj_ext,
out-of-bounds accesses outside the slab allocator are generally more
common.
Move metadata_access_enable()/disable() helpers to mm/slab.h so that
it can be used outside mm/slub.c. Wrap accesses to slabobj_ext metadata
in memcg and alloc_tag code with these helpers.
Call kasan_reset_tag() in slab_obj_ext() before returning the address to
prevent SW or HW tag-based KASAN from reporting false positives.
Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
mm/memcontrol.c | 15 ++++++++++++---
mm/slab.h | 24 +++++++++++++++++++++++-
mm/slub.c | 33 +++++++++++++--------------------
3 files changed, 48 insertions(+), 24 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2a9dc246e802..38e6e9099ff5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2570,17 +2570,22 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
struct slabobj_ext *obj_ext;
struct slab *slab;
unsigned int off;
+ struct mem_cgroup *memcg;
slab = folio_slab(folio);
obj_exts = slab_obj_exts(slab);
if (!obj_exts)
return NULL;
+ metadata_access_enable();
off = obj_to_index(slab->slab_cache, slab, p);
obj_ext = slab_obj_ext(slab, obj_exts, off);
- if (obj_ext->objcg)
- return obj_cgroup_memcg(obj_ext->objcg);
-
+ if (obj_ext->objcg) {
+ memcg = obj_cgroup_memcg(obj_ext->objcg);
+ metadata_access_disable();
+ return memcg;
+ }
+ metadata_access_disable();
return NULL;
}
@@ -3197,9 +3202,11 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
obj_exts = slab_obj_exts(slab);
off = obj_to_index(s, slab, p[i]);
+ metadata_access_enable();
obj_ext = slab_obj_ext(slab, obj_exts, off);
obj_cgroup_get(objcg);
obj_ext->objcg = objcg;
+ metadata_access_disable();
}
return true;
@@ -3210,6 +3217,7 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
{
size_t obj_size = obj_full_size(s);
+ metadata_access_enable();
for (int i = 0; i < objects; i++) {
struct obj_cgroup *objcg;
struct slabobj_ext *obj_ext;
@@ -3226,6 +3234,7 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
slab_pgdat(slab), cache_vmstat_idx(s));
obj_cgroup_put(objcg);
}
+ metadata_access_disable();
}
/*
diff --git a/mm/slab.h b/mm/slab.h
index 22ee28cb55e1..13f4ca65cb42 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -591,10 +591,14 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
unsigned long obj_exts,
unsigned int index)
{
+ struct slabobj_ext *obj_ext;
+
VM_WARN_ON_ONCE(!slab_obj_exts(slab));
VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
- return (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
+ obj_ext = (struct slabobj_ext *)(obj_exts +
+ slab_get_stride(slab) * index);
+ return kasan_reset_tag(obj_ext);
}
int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
@@ -625,6 +629,24 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
NR_SLAB_RECLAIMABLE_B : NR_SLAB_UNRECLAIMABLE_B;
}
+/*
+ * slub is about to manipulate internal object metadata. This memory lies
+ * outside the range of the allocated object, so accessing it would normally
+ * be reported by kasan as a bounds error. metadata_access_enable() is used
+ * to tell kasan that these accesses are OK.
+ */
+static inline void metadata_access_enable(void)
+{
+ kasan_disable_current();
+ kmsan_disable_current();
+}
+
+static inline void metadata_access_disable(void)
+{
+ kmsan_enable_current();
+ kasan_enable_current();
+}
+
#ifdef CONFIG_MEMCG
bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
gfp_t flags, size_t size, void **p);
diff --git a/mm/slub.c b/mm/slub.c
index 4383740a4d34..13acc9437ef5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -973,24 +973,6 @@ static slab_flags_t slub_debug;
static char *slub_debug_string;
static int disable_higher_order_debug;
-/*
- * slub is about to manipulate internal object metadata. This memory lies
- * outside the range of the allocated object, so accessing it would normally
- * be reported by kasan as a bounds error. metadata_access_enable() is used
- * to tell kasan that these accesses are OK.
- */
-static inline void metadata_access_enable(void)
-{
- kasan_disable_current();
- kmsan_disable_current();
-}
-
-static inline void metadata_access_disable(void)
-{
- kmsan_enable_current();
- kasan_enable_current();
-}
-
/*
* Object debugging
*/
@@ -2042,9 +2024,11 @@ static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
struct slabobj_ext *ext = slab_obj_ext(obj_exts_slab,
slab_exts, offs);
+ metadata_access_enable();
/* codetag should be NULL */
WARN_ON(ext->ref.ct);
set_codetag_empty(&ext->ref);
+ metadata_access_disable();
}
}
@@ -2245,8 +2229,11 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
* If other users appear then mem_alloc_profiling_enabled()
* check should be added before alloc_tag_add().
*/
- if (likely(obj_ext))
+ if (likely(obj_ext)) {
+ metadata_access_enable();
alloc_tag_add(&obj_ext->ref, current->alloc_tag, s->size);
+ metadata_access_disable();
+ }
}
static inline void
@@ -2272,11 +2259,13 @@ __alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p
if (!obj_exts)
return;
+ metadata_access_enable();
for (i = 0; i < objects; i++) {
unsigned int off = obj_to_index(s, slab, p[i]);
alloc_tag_sub(&slab_obj_ext(slab, obj_exts, off)->ref, s->size);
}
+ metadata_access_disable();
}
static inline void
@@ -2394,8 +2383,12 @@ bool memcg_slab_post_charge(void *p, gfp_t flags)
if (obj_exts) {
off = obj_to_index(s, slab, p);
obj_ext = slab_obj_ext(slab, obj_exts, off);
- if (unlikely(obj_ext->objcg))
+ metadata_access_enable();
+ if (unlikely(obj_ext->objcg)) {
+ metadata_access_disable();
return true;
+ }
+ metadata_access_disable();
}
return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 5/7] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison
2025-10-27 12:28 ` [RFC PATCH V3 5/7] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison Harry Yoo
@ 2025-10-28 23:03 ` Suren Baghdasaryan
2025-10-29 8:06 ` Harry Yoo
0 siblings, 1 reply; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-28 23:03 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> In the near future, slabobj_ext may reside outside the allocated slab
> object range within a slab, which could be reported as an out-of-bounds
> access by KASAN. To prevent false positives, explicitly disable KASAN
> and KMSAN checks when accessing slabobj_ext.
Hmm. This is fragile IMO. Every time someone accesses slabobj_ext they
should remember to call
metadata_access_enable/metadata_access_disable.
Have you considered replacing slab_obj_ext() function with
get_slab_obj_ext()/put_slab_obj_ext()? get_slab_obj_ext() can call
metadata_access_enable() and return slabobj_ext as it does today.
put_slab_obj_ext() will simple call metadata_access_disable(). WDYT?
>
> While an alternative approach could be to unpoison slabobj_ext,
> out-of-bounds accesses outside the slab allocator are generally more
> common.
>
> Move metadata_access_enable()/disable() helpers to mm/slab.h so that
> it can be used outside mm/slub.c. Wrap accesses to slabobj_ext metadata
> in memcg and alloc_tag code with these helpers.
>
> Call kasan_reset_tag() in slab_obj_ext() before returning the address to
> prevent SW or HW tag-based KASAN from reporting false positives.
>
> Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
> mm/memcontrol.c | 15 ++++++++++++---
> mm/slab.h | 24 +++++++++++++++++++++++-
> mm/slub.c | 33 +++++++++++++--------------------
> 3 files changed, 48 insertions(+), 24 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2a9dc246e802..38e6e9099ff5 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2570,17 +2570,22 @@ struct mem_cgroup *mem_cgroup_from_obj_folio(struct folio *folio, void *p)
> struct slabobj_ext *obj_ext;
> struct slab *slab;
> unsigned int off;
> + struct mem_cgroup *memcg;
>
> slab = folio_slab(folio);
> obj_exts = slab_obj_exts(slab);
> if (!obj_exts)
> return NULL;
>
> + metadata_access_enable();
> off = obj_to_index(slab->slab_cache, slab, p);
> obj_ext = slab_obj_ext(slab, obj_exts, off);
> - if (obj_ext->objcg)
> - return obj_cgroup_memcg(obj_ext->objcg);
> -
> + if (obj_ext->objcg) {
> + memcg = obj_cgroup_memcg(obj_ext->objcg);
> + metadata_access_disable();
> + return memcg;
> + }
> + metadata_access_disable();
> return NULL;
> }
>
> @@ -3197,9 +3202,11 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
>
> obj_exts = slab_obj_exts(slab);
> off = obj_to_index(s, slab, p[i]);
> + metadata_access_enable();
> obj_ext = slab_obj_ext(slab, obj_exts, off);
> obj_cgroup_get(objcg);
> obj_ext->objcg = objcg;
> + metadata_access_disable();
> }
>
> return true;
> @@ -3210,6 +3217,7 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> {
> size_t obj_size = obj_full_size(s);
>
> + metadata_access_enable();
> for (int i = 0; i < objects; i++) {
> struct obj_cgroup *objcg;
> struct slabobj_ext *obj_ext;
> @@ -3226,6 +3234,7 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> slab_pgdat(slab), cache_vmstat_idx(s));
> obj_cgroup_put(objcg);
> }
> + metadata_access_disable();
> }
>
> /*
> diff --git a/mm/slab.h b/mm/slab.h
> index 22ee28cb55e1..13f4ca65cb42 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -591,10 +591,14 @@ static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
> unsigned long obj_exts,
> unsigned int index)
> {
> + struct slabobj_ext *obj_ext;
> +
> VM_WARN_ON_ONCE(!slab_obj_exts(slab));
> VM_WARN_ON_ONCE(obj_exts != slab_obj_exts(slab));
>
> - return (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
> + obj_ext = (struct slabobj_ext *)(obj_exts +
> + slab_get_stride(slab) * index);
> + return kasan_reset_tag(obj_ext);
> }
>
> int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> @@ -625,6 +629,24 @@ static inline enum node_stat_item cache_vmstat_idx(struct kmem_cache *s)
> NR_SLAB_RECLAIMABLE_B : NR_SLAB_UNRECLAIMABLE_B;
> }
>
> +/*
> + * slub is about to manipulate internal object metadata. This memory lies
> + * outside the range of the allocated object, so accessing it would normally
> + * be reported by kasan as a bounds error. metadata_access_enable() is used
> + * to tell kasan that these accesses are OK.
> + */
> +static inline void metadata_access_enable(void)
> +{
> + kasan_disable_current();
> + kmsan_disable_current();
> +}
> +
> +static inline void metadata_access_disable(void)
> +{
> + kmsan_enable_current();
> + kasan_enable_current();
> +}
> +
> #ifdef CONFIG_MEMCG
> bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
> gfp_t flags, size_t size, void **p);
> diff --git a/mm/slub.c b/mm/slub.c
> index 4383740a4d34..13acc9437ef5 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -973,24 +973,6 @@ static slab_flags_t slub_debug;
> static char *slub_debug_string;
> static int disable_higher_order_debug;
>
> -/*
> - * slub is about to manipulate internal object metadata. This memory lies
> - * outside the range of the allocated object, so accessing it would normally
> - * be reported by kasan as a bounds error. metadata_access_enable() is used
> - * to tell kasan that these accesses are OK.
> - */
> -static inline void metadata_access_enable(void)
> -{
> - kasan_disable_current();
> - kmsan_disable_current();
> -}
> -
> -static inline void metadata_access_disable(void)
> -{
> - kmsan_enable_current();
> - kasan_enable_current();
> -}
> -
> /*
> * Object debugging
> */
> @@ -2042,9 +2024,11 @@ static inline void mark_objexts_empty(struct slabobj_ext *obj_exts)
> struct slabobj_ext *ext = slab_obj_ext(obj_exts_slab,
> slab_exts, offs);
>
> + metadata_access_enable();
> /* codetag should be NULL */
> WARN_ON(ext->ref.ct);
> set_codetag_empty(&ext->ref);
> + metadata_access_disable();
> }
> }
>
> @@ -2245,8 +2229,11 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> * If other users appear then mem_alloc_profiling_enabled()
> * check should be added before alloc_tag_add().
> */
> - if (likely(obj_ext))
> + if (likely(obj_ext)) {
> + metadata_access_enable();
> alloc_tag_add(&obj_ext->ref, current->alloc_tag, s->size);
> + metadata_access_disable();
> + }
> }
>
> static inline void
> @@ -2272,11 +2259,13 @@ __alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p
> if (!obj_exts)
> return;
>
> + metadata_access_enable();
> for (i = 0; i < objects; i++) {
> unsigned int off = obj_to_index(s, slab, p[i]);
>
> alloc_tag_sub(&slab_obj_ext(slab, obj_exts, off)->ref, s->size);
> }
> + metadata_access_disable();
> }
>
> static inline void
> @@ -2394,8 +2383,12 @@ bool memcg_slab_post_charge(void *p, gfp_t flags)
> if (obj_exts) {
> off = obj_to_index(s, slab, p);
> obj_ext = slab_obj_ext(slab, obj_exts, off);
> - if (unlikely(obj_ext->objcg))
> + metadata_access_enable();
> + if (unlikely(obj_ext->objcg)) {
> + metadata_access_disable();
> return true;
> + }
> + metadata_access_disable();
> }
>
> return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 5/7] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison
2025-10-28 23:03 ` Suren Baghdasaryan
@ 2025-10-29 8:06 ` Harry Yoo
2025-10-29 15:28 ` Suren Baghdasaryan
0 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-29 8:06 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Tue, Oct 28, 2025 at 04:03:22PM -0700, Suren Baghdasaryan wrote:
> On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> >
> > In the near future, slabobj_ext may reside outside the allocated slab
> > object range within a slab, which could be reported as an out-of-bounds
> > access by KASAN. To prevent false positives, explicitly disable KASAN
> > and KMSAN checks when accessing slabobj_ext.
>
> Hmm. This is fragile IMO. Every time someone accesses slabobj_ext they
> should remember to call
> metadata_access_enable/metadata_access_disable.
Good point!
> Have you considered replacing slab_obj_ext() function with
> get_slab_obj_ext()/put_slab_obj_ext()? get_slab_obj_ext() can call
> metadata_access_enable() and return slabobj_ext as it does today.
> put_slab_obj_ext() will simple call metadata_access_disable(). WDYT?
I did think about it, and I thought introducing get and put helpers
may be misunderstood as doing some kind of reference counting...
but yeah probably I'm being too paranoid and
I'll try this and document that
1) the user needs to use get and put pair to access slabobj_ext
metadata, and
2) calling get and put pair multiple times has no effect.
> > While an alternative approach could be to unpoison slabobj_ext,
> > out-of-bounds accesses outside the slab allocator are generally more
> > common.
> >
> > Move metadata_access_enable()/disable() helpers to mm/slab.h so that
> > it can be used outside mm/slub.c. Wrap accesses to slabobj_ext metadata
> > in memcg and alloc_tag code with these helpers.
> >
> > Call kasan_reset_tag() in slab_obj_ext() before returning the address to
> > prevent SW or HW tag-based KASAN from reporting false positives.
> >
> > Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
> > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH V3 5/7] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison
2025-10-29 8:06 ` Harry Yoo
@ 2025-10-29 15:28 ` Suren Baghdasaryan
0 siblings, 0 replies; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-29 15:28 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Wed, Oct 29, 2025 at 1:06 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Tue, Oct 28, 2025 at 04:03:22PM -0700, Suren Baghdasaryan wrote:
> > On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > >
> > > In the near future, slabobj_ext may reside outside the allocated slab
> > > object range within a slab, which could be reported as an out-of-bounds
> > > access by KASAN. To prevent false positives, explicitly disable KASAN
> > > and KMSAN checks when accessing slabobj_ext.
> >
> > Hmm. This is fragile IMO. Every time someone accesses slabobj_ext they
> > should remember to call
> > metadata_access_enable/metadata_access_disable.
>
> Good point!
>
> > Have you considered replacing slab_obj_ext() function with
> > get_slab_obj_ext()/put_slab_obj_ext()? get_slab_obj_ext() can call
> > metadata_access_enable() and return slabobj_ext as it does today.
> > put_slab_obj_ext() will simple call metadata_access_disable(). WDYT?
>
> I did think about it, and I thought introducing get and put helpers
> may be misunderstood as doing some kind of reference counting...
Maybe there are better names but get/put I think are appropriate here.
get_cpu_ptr()/put_cpu_ptr() example is very similar to this.
>
> but yeah probably I'm being too paranoid and
> I'll try this and document that
>
> 1) the user needs to use get and put pair to access slabobj_ext
> metadata, and
>
> 2) calling get and put pair multiple times has no effect.
Yes, I think this would be less error-prone.
>
> > > While an alternative approach could be to unpoison slabobj_ext,
> > > out-of-bounds accesses outside the slab allocator are generally more
> > > common.
> > >
> > > Move metadata_access_enable()/disable() helpers to mm/slab.h so that
> > > it can be used outside mm/slub.c. Wrap accesses to slabobj_ext metadata
> > > in memcg and alloc_tag code with these helpers.
> > >
> > > Call kasan_reset_tag() in slab_obj_ext() before returning the address to
> > > prevent SW or HW tag-based KASAN from reporting false positives.
> > >
> > > Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
> > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > ---
>
> --
> Cheers,
> Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover
2025-10-27 12:28 [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
` (4 preceding siblings ...)
2025-10-27 12:28 ` [RFC PATCH V3 5/7] mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison Harry Yoo
@ 2025-10-27 12:28 ` Harry Yoo
2025-10-29 3:07 ` Suren Baghdasaryan
2025-10-29 18:45 ` Andrey Ryabinin
2025-10-27 12:28 ` [RFC PATCH V3 7/7] mm/slab: place slabobj_ext metadata in unused space within s->size Harry Yoo
2025-10-30 16:39 ` [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Vlastimil Babka
7 siblings, 2 replies; 34+ messages in thread
From: Harry Yoo @ 2025-10-27 12:28 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo, tytso,
adilger.kernel, linux-ext4, linux-kernel
The leftover space in a slab is always smaller than s->size, and
kmem caches for large objects that are not power-of-two sizes tend to have
a greater amount of leftover space per slab. In some cases, the leftover
space is larger than the size of the slabobj_ext array for the slab.
An excellent example of such a cache is ext4_inode_cache. On my system,
the object size is 1144, with a preferred order of 3, 28 objects per slab,
and 736 bytes of leftover space per slab.
Since the size of the slabobj_ext array is only 224 bytes (w/o mem
profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
fits within the leftover space.
Allocate the slabobj_exts array from this unused space instead of using
kcalloc(), when it is large enough. The array is always allocated when
creating new slabs, because implementing lazy allocation correctly is
difficult without expensive synchronization.
To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
MEM_ALLOC_PROFILING are not used for the cache, only allocate the
slabobj_ext array only when either of them are enabled when slabs are
created.
[ MEMCG=y, MEM_ALLOC_PROFILING=n ]
Before patch (creating 2M directories on ext4):
Slab: 3575348 kB
SReclaimable: 3137804 kB
SUnreclaim: 437544 kB
After patch (creating 2M directories on ext4):
Slab: 3558236 kB
SReclaimable: 3139268 kB
SUnreclaim: 418968 kB (-18.14 MiB)
Enjoy the memory savings!
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
mm/slub.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 142 insertions(+), 5 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 13acc9437ef5..8101df5fdccf 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -884,6 +884,94 @@ static inline unsigned int get_orig_size(struct kmem_cache *s, void *object)
return *(unsigned int *)p;
}
+#ifdef CONFIG_SLAB_OBJ_EXT
+
+/*
+ * Check if memory cgroup or memory allocation profiling is enabled.
+ * If enabled, SLUB tries to reduce memory overhead of accounting
+ * slab objects. If neither is enabled when this function is called,
+ * the optimization is simply skipped to avoid affecting caches that do not
+ * need slabobj_ext metadata.
+ *
+ * However, this may disable optimization when memory cgroup or memory
+ * allocation profiling is used, but slabs are created too early
+ * even before those subsystems are initialized.
+ */
+static inline bool need_slab_obj_exts(struct kmem_cache *s)
+{
+ if (!mem_cgroup_disabled() && (s->flags & SLAB_ACCOUNT))
+ return true;
+
+ if (mem_alloc_profiling_enabled())
+ return true;
+
+ return false;
+}
+
+static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
+{
+ return sizeof(struct slabobj_ext) * slab->objects;
+}
+
+static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
+ struct slab *slab)
+{
+ unsigned long objext_offset;
+
+ objext_offset = s->red_left_pad + s->size * slab->objects;
+ objext_offset = ALIGN(objext_offset, sizeof(struct slabobj_ext));
+ return objext_offset;
+}
+
+static inline bool obj_exts_fit_within_slab_leftover(struct kmem_cache *s,
+ struct slab *slab)
+{
+ unsigned long objext_offset = obj_exts_offset_in_slab(s, slab);
+ unsigned long objext_size = obj_exts_size_in_slab(slab);
+
+ return objext_offset + objext_size <= slab_size(slab);
+}
+
+static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
+{
+ unsigned long obj_exts;
+
+ if (!obj_exts_fit_within_slab_leftover(s, slab))
+ return false;
+
+ obj_exts = (unsigned long)slab_address(slab);
+ obj_exts += obj_exts_offset_in_slab(s, slab);
+ return obj_exts == slab_obj_exts(slab);
+}
+#else
+static inline bool need_slab_obj_exts(struct kmem_cache *s)
+{
+ return false;
+}
+
+static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
+{
+ return 0;
+}
+
+static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
+ struct slab *slab)
+{
+ return 0;
+}
+
+static inline bool obj_exts_fit_within_slab_leftover(struct kmem_cache *s,
+ struct slab *slab)
+{
+ return false;
+}
+
+static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
+{
+ return false;
+}
+#endif
+
#ifdef CONFIG_SLUB_DEBUG
/*
@@ -1404,7 +1492,15 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab)
start = slab_address(slab);
length = slab_size(slab);
end = start + length;
- remainder = length % s->size;
+
+ if (obj_exts_in_slab(s, slab)) {
+ remainder = length;
+ remainder -= obj_exts_offset_in_slab(s, slab);
+ remainder -= obj_exts_size_in_slab(slab);
+ } else {
+ remainder = length % s->size;
+ }
+
if (!remainder)
return;
@@ -2154,6 +2250,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
if (!obj_exts)
return;
+ if (obj_exts_in_slab(slab->slab_cache, slab)) {
+ slab->obj_exts = 0;
+ return;
+ }
+
/*
* obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
* corresponding extension will be NULL. alloc_tag_sub() will throw a
@@ -2169,6 +2270,31 @@ static inline void free_slab_obj_exts(struct slab *slab)
slab->obj_exts = 0;
}
+/*
+ * Try to allocate slabobj_ext array from unused space.
+ * This function must be called on a freshly allocated slab to prevent
+ * concurrency problems.
+ */
+static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
+{
+ void *addr;
+
+ if (!need_slab_obj_exts(s))
+ return;
+
+ metadata_access_enable();
+ if (obj_exts_fit_within_slab_leftover(s, slab)) {
+ addr = slab_address(slab) + obj_exts_offset_in_slab(s, slab);
+ addr = kasan_reset_tag(addr);
+ memset(addr, 0, obj_exts_size_in_slab(slab));
+ slab->obj_exts = (unsigned long)addr;
+ if (IS_ENABLED(CONFIG_MEMCG))
+ slab->obj_exts |= MEMCG_DATA_OBJEXTS;
+ slab_set_stride(slab, sizeof(struct slabobj_ext));
+ }
+ metadata_access_disable();
+}
+
#else /* CONFIG_SLAB_OBJ_EXT */
static inline void init_slab_obj_exts(struct slab *slab)
@@ -2185,6 +2311,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
{
}
+static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,
+ struct slab *slab)
+{
+}
+
#endif /* CONFIG_SLAB_OBJ_EXT */
#ifdef CONFIG_MEM_ALLOC_PROFILING
@@ -3155,7 +3286,9 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
static __always_inline void account_slab(struct slab *slab, int order,
struct kmem_cache *s, gfp_t gfp)
{
- if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
+ if (memcg_kmem_online() &&
+ (s->flags & SLAB_ACCOUNT) &&
+ !slab_obj_exts(slab))
alloc_slab_obj_exts(slab, s, gfp, true);
mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
@@ -3219,9 +3352,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
slab->objects = oo_objects(oo);
slab->inuse = 0;
slab->frozen = 0;
- init_slab_obj_exts(slab);
-
- account_slab(slab, oo_order(oo), s, flags);
slab->slab_cache = s;
@@ -3230,6 +3360,13 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
start = slab_address(slab);
setup_slab_debug(s, slab, start);
+ init_slab_obj_exts(slab);
+ /*
+ * Poison the slab before initializing the slabobj_ext array
+ * to prevent the array from being overwritten.
+ */
+ alloc_slab_obj_exts_early(s, slab);
+ account_slab(slab, oo_order(oo), s, flags);
shuffle = shuffle_freelist(s, slab);
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover
2025-10-27 12:28 ` [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover Harry Yoo
@ 2025-10-29 3:07 ` Suren Baghdasaryan
2025-10-29 7:59 ` Harry Yoo
2025-10-29 18:45 ` Andrey Ryabinin
1 sibling, 1 reply; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-29 3:07 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> The leftover space in a slab is always smaller than s->size, and
> kmem caches for large objects that are not power-of-two sizes tend to have
> a greater amount of leftover space per slab. In some cases, the leftover
> space is larger than the size of the slabobj_ext array for the slab.
>
> An excellent example of such a cache is ext4_inode_cache. On my system,
> the object size is 1144, with a preferred order of 3, 28 objects per slab,
> and 736 bytes of leftover space per slab.
>
> Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> fits within the leftover space.
>
> Allocate the slabobj_exts array from this unused space instead of using
> kcalloc(), when it is large enough. The array is always allocated when
> creating new slabs, because implementing lazy allocation correctly is
> difficult without expensive synchronization.
>
> To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> MEM_ALLOC_PROFILING are not used for the cache, only allocate the
> slabobj_ext array only when either of them are enabled when slabs are
> created.
>
> [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
>
> Before patch (creating 2M directories on ext4):
> Slab: 3575348 kB
> SReclaimable: 3137804 kB
> SUnreclaim: 437544 kB
>
> After patch (creating 2M directories on ext4):
> Slab: 3558236 kB
> SReclaimable: 3139268 kB
> SUnreclaim: 418968 kB (-18.14 MiB)
>
> Enjoy the memory savings!
>
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
> mm/slub.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 142 insertions(+), 5 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 13acc9437ef5..8101df5fdccf 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -884,6 +884,94 @@ static inline unsigned int get_orig_size(struct kmem_cache *s, void *object)
> return *(unsigned int *)p;
> }
>
> +#ifdef CONFIG_SLAB_OBJ_EXT
> +
> +/*
> + * Check if memory cgroup or memory allocation profiling is enabled.
> + * If enabled, SLUB tries to reduce memory overhead of accounting
> + * slab objects. If neither is enabled when this function is called,
> + * the optimization is simply skipped to avoid affecting caches that do not
> + * need slabobj_ext metadata.
> + *
> + * However, this may disable optimization when memory cgroup or memory
> + * allocation profiling is used, but slabs are created too early
> + * even before those subsystems are initialized.
> + */
> +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> +{
> + if (!mem_cgroup_disabled() && (s->flags & SLAB_ACCOUNT))
> + return true;
> +
> + if (mem_alloc_profiling_enabled())
> + return true;
> +
> + return false;
> +}
> +
> +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> +{
> + return sizeof(struct slabobj_ext) * slab->objects;
> +}
> +
> +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> + struct slab *slab)
> +{
> + unsigned long objext_offset;
> +
> + objext_offset = s->red_left_pad + s->size * slab->objects;
> + objext_offset = ALIGN(objext_offset, sizeof(struct slabobj_ext));
> + return objext_offset;
> +}
> +
> +static inline bool obj_exts_fit_within_slab_leftover(struct kmem_cache *s,
> + struct slab *slab)
> +{
> + unsigned long objext_offset = obj_exts_offset_in_slab(s, slab);
> + unsigned long objext_size = obj_exts_size_in_slab(slab);
> +
> + return objext_offset + objext_size <= slab_size(slab);
> +}
> +
> +static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
> +{
> + unsigned long obj_exts;
> +
> + if (!obj_exts_fit_within_slab_leftover(s, slab))
> + return false;
> +
> + obj_exts = (unsigned long)slab_address(slab);
> + obj_exts += obj_exts_offset_in_slab(s, slab);
> + return obj_exts == slab_obj_exts(slab);
You can check that slab_obj_exts(slab) is not NULL before making the
above calculations.
> +}
> +#else
> +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> +{
> + return false;
> +}
> +
> +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> +{
> + return 0;
> +}
> +
> +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> + struct slab *slab)
> +{
> + return 0;
> +}
> +
> +static inline bool obj_exts_fit_within_slab_leftover(struct kmem_cache *s,
> + struct slab *slab)
> +{
> + return false;
> +}
> +
> +static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
> +{
> + return false;
> +}
> +#endif
> +
> #ifdef CONFIG_SLUB_DEBUG
>
> /*
> @@ -1404,7 +1492,15 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab)
> start = slab_address(slab);
> length = slab_size(slab);
> end = start + length;
> - remainder = length % s->size;
> +
> + if (obj_exts_in_slab(s, slab)) {
> + remainder = length;
> + remainder -= obj_exts_offset_in_slab(s, slab);
> + remainder -= obj_exts_size_in_slab(slab);
> + } else {
> + remainder = length % s->size;
> + }
> +
> if (!remainder)
> return;
>
> @@ -2154,6 +2250,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
> if (!obj_exts)
> return;
>
> + if (obj_exts_in_slab(slab->slab_cache, slab)) {
> + slab->obj_exts = 0;
> + return;
> + }
> +
> /*
> * obj_exts was created with __GFP_NO_OBJ_EXT flag, therefore its
> * corresponding extension will be NULL. alloc_tag_sub() will throw a
> @@ -2169,6 +2270,31 @@ static inline void free_slab_obj_exts(struct slab *slab)
> slab->obj_exts = 0;
> }
>
> +/*
> + * Try to allocate slabobj_ext array from unused space.
> + * This function must be called on a freshly allocated slab to prevent
> + * concurrency problems.
> + */
> +static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
> +{
> + void *addr;
> +
> + if (!need_slab_obj_exts(s))
> + return;
> +
> + metadata_access_enable();
> + if (obj_exts_fit_within_slab_leftover(s, slab)) {
> + addr = slab_address(slab) + obj_exts_offset_in_slab(s, slab);
> + addr = kasan_reset_tag(addr);
> + memset(addr, 0, obj_exts_size_in_slab(slab));
> + slab->obj_exts = (unsigned long)addr;
> + if (IS_ENABLED(CONFIG_MEMCG))
> + slab->obj_exts |= MEMCG_DATA_OBJEXTS;
> + slab_set_stride(slab, sizeof(struct slabobj_ext));
> + }
> + metadata_access_disable();
> +}
> +
> #else /* CONFIG_SLAB_OBJ_EXT */
>
> static inline void init_slab_obj_exts(struct slab *slab)
> @@ -2185,6 +2311,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
> {
> }
>
> +static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,
> + struct slab *slab)
> +{
> +}
> +
> #endif /* CONFIG_SLAB_OBJ_EXT */
>
> #ifdef CONFIG_MEM_ALLOC_PROFILING
> @@ -3155,7 +3286,9 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
> static __always_inline void account_slab(struct slab *slab, int order,
> struct kmem_cache *s, gfp_t gfp)
> {
> - if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> + if (memcg_kmem_online() &&
> + (s->flags & SLAB_ACCOUNT) &&
> + !slab_obj_exts(slab))
> alloc_slab_obj_exts(slab, s, gfp, true);
Don't you need to add a check for !obj_exts_in_slab() inside
alloc_slab_obj_exts() to avoid allocating slab->obj_exts?
>
> mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> @@ -3219,9 +3352,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> slab->objects = oo_objects(oo);slab_obj_exts
> slab->inuse = 0;
> slab->frozen = 0;
> - init_slab_obj_exts(slab);
> -
> - account_slab(slab, oo_order(oo), s, flags);
>
> slab->slab_cache = s;
>
> @@ -3230,6 +3360,13 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> start = slab_address(slab);
>
> setup_slab_debug(s, slab, start);
> + init_slab_obj_exts(slab);
> + /*
> + * Poison the slab before initializing the slabobj_ext array
> + * to prevent the array from being overwritten.
> + */
> + alloc_slab_obj_exts_early(s, slab);
> + account_slab(slab, oo_order(oo), s, flags);
alloc_slab_obj_exts() is called in 2 other places:
1. __memcg_slab_post_alloc_hook()
2. prepare_slab_obj_exts_hook()
Don't you need alloc_slab_obj_exts_early() there as well?
>
> shuffle = shuffle_freelist(s, slab);
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover
2025-10-29 3:07 ` Suren Baghdasaryan
@ 2025-10-29 7:59 ` Harry Yoo
2025-10-29 18:37 ` Suren Baghdasaryan
0 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-29 7:59 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Tue, Oct 28, 2025 at 08:07:42PM -0700, Suren Baghdasaryan wrote:
> On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> >
> > The leftover space in a slab is always smaller than s->size, and
> > kmem caches for large objects that are not power-of-two sizes tend to have
> > a greater amount of leftover space per slab. In some cases, the leftover
> > space is larger than the size of the slabobj_ext array for the slab.
> >
> > An excellent example of such a cache is ext4_inode_cache. On my system,
> > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > and 736 bytes of leftover space per slab.
> >
> > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > fits within the leftover space.
> >
> > Allocate the slabobj_exts array from this unused space instead of using
> > kcalloc(), when it is large enough. The array is always allocated when
> > creating new slabs, because implementing lazy allocation correctly is
> > difficult without expensive synchronization.
> >
> > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > MEM_ALLOC_PROFILING are not used for the cache, only allocate the
> > slabobj_ext array only when either of them are enabled when slabs are
> > created.
> >
> > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> >
> > Before patch (creating 2M directories on ext4):
> > Slab: 3575348 kB
> > SReclaimable: 3137804 kB
> > SUnreclaim: 437544 kB
> >
> > After patch (creating 2M directories on ext4):
> > Slab: 3558236 kB
> > SReclaimable: 3139268 kB
> > SUnreclaim: 418968 kB (-18.14 MiB)
> >
> > Enjoy the memory savings!
> >
> > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > ---
> > mm/slub.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > 1 file changed, 142 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 13acc9437ef5..8101df5fdccf 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > +static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
> > +{
> > + unsigned long obj_exts;
> > +
> > + if (!obj_exts_fit_within_slab_leftover(s, slab))
> > + return false;
> > +
> > + obj_exts = (unsigned long)slab_address(slab);
> > + obj_exts += obj_exts_offset_in_slab(s, slab);
> > + return obj_exts == slab_obj_exts(slab);
>
> You can check that slab_obj_exts(slab) is not NULL before making the
> above calculations.
Did you mean this?
if (!slab_obj_exts(slab))
return false;
If so, yes that makes sense.
> > @@ -2185,6 +2311,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
> > {
> > }
> >
> > +static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,
> > + struct slab *slab)
> > +{
> > +}
> > +
> > #endif /* CONFIG_SLAB_OBJ_EXT */
> >
> > #ifdef CONFIG_MEM_ALLOC_PROFILING
> > @@ -3155,7 +3286,9 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
> > static __always_inline void account_slab(struct slab *slab, int order,
> > struct kmem_cache *s, gfp_t gfp)
> > {
> > - if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > + if (memcg_kmem_online() &&
> > + (s->flags & SLAB_ACCOUNT) &&
> > + !slab_obj_exts(slab))
> > alloc_slab_obj_exts(slab, s, gfp, true);
>
> Don't you need to add a check for !obj_exts_in_slab() inside
> alloc_slab_obj_exts() to avoid allocating slab->obj_exts?
slab_obj_exts() should have returned a nonzero value
and then we don't call alloc_slab_obj_exts()?
> > mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> > @@ -3219,9 +3352,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> > slab->objects = oo_objects(oo);slab_obj_exts
> > slab->inuse = 0;
> > slab->frozen = 0;
> > - init_slab_obj_exts(slab);
> > -
> > - account_slab(slab, oo_order(oo), s, flags);
> >
> > slab->slab_cache = s;
> >
> > @@ -3230,6 +3360,13 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> > start = slab_address(slab);
> >
> > setup_slab_debug(s, slab, start);
> > + init_slab_obj_exts(slab);
> > + /*
> > + * Poison the slab before initializing the slabobj_ext array
> > + * to prevent the array from being overwritten.
> > + */
> > + alloc_slab_obj_exts_early(s, slab);
> > + account_slab(slab, oo_order(oo), s, flags);
>
> alloc_slab_obj_exts() is called in 2 other places:
> 1. __memcg_slab_post_alloc_hook()
> 2. prepare_slab_obj_exts_hook()
>
> Don't you need alloc_slab_obj_exts_early() there as well?
That's good point, and I thought it's difficult to address
concurrency problem without using a per-slab lock.
Thread A Thread B
- sees slab->obj_exts == 0
- sees slab->obj_exts == 0
- allocates the vector from unused space
and initializes it.
- try cmpxchg()
- allocates the vector
from unused space and
initializes it.
(the vector is already
in use and it's overwritten!)
- try cmpxchg()
But since this is slowpath, using slab_{lock,unlock}() here is probably
fine. What do you think?
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover
2025-10-29 7:59 ` Harry Yoo
@ 2025-10-29 18:37 ` Suren Baghdasaryan
2025-10-30 0:40 ` Harry Yoo
0 siblings, 1 reply; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-29 18:37 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Wed, Oct 29, 2025 at 1:00 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Tue, Oct 28, 2025 at 08:07:42PM -0700, Suren Baghdasaryan wrote:
> > On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > >
> > > The leftover space in a slab is always smaller than s->size, and
> > > kmem caches for large objects that are not power-of-two sizes tend to have
> > > a greater amount of leftover space per slab. In some cases, the leftover
> > > space is larger than the size of the slabobj_ext array for the slab.
> > >
> > > An excellent example of such a cache is ext4_inode_cache. On my system,
> > > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > > and 736 bytes of leftover space per slab.
> > >
> > > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > > fits within the leftover space.
> > >
> > > Allocate the slabobj_exts array from this unused space instead of using
> > > kcalloc(), when it is large enough. The array is always allocated when
> > > creating new slabs, because implementing lazy allocation correctly is
> > > difficult without expensive synchronization.
> > >
> > > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > > MEM_ALLOC_PROFILING are not used for the cache, only allocate the
> > > slabobj_ext array only when either of them are enabled when slabs are
> > > created.
> > >
> > > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > >
> > > Before patch (creating 2M directories on ext4):
> > > Slab: 3575348 kB
> > > SReclaimable: 3137804 kB
> > > SUnreclaim: 437544 kB
> > >
> > > After patch (creating 2M directories on ext4):
> > > Slab: 3558236 kB
> > > SReclaimable: 3139268 kB
> > > SUnreclaim: 418968 kB (-18.14 MiB)
> > >
> > > Enjoy the memory savings!
> > >
> > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > ---
> > > mm/slub.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > 1 file changed, 142 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 13acc9437ef5..8101df5fdccf 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > +static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
> > > +{
> > > + unsigned long obj_exts;
> > > +
> > > + if (!obj_exts_fit_within_slab_leftover(s, slab))
> > > + return false;
> > > +
> > > + obj_exts = (unsigned long)slab_address(slab);
> > > + obj_exts += obj_exts_offset_in_slab(s, slab);
> > > + return obj_exts == slab_obj_exts(slab);
> >
> > You can check that slab_obj_exts(slab) is not NULL before making the
> > above calculations.
>
> Did you mean this?
>
> if (!slab_obj_exts(slab))
> return false;
Yes but you can store the returned value to reuse later in the last
"return obj_exts == slab_obj_exts(slab);" expression.
>
> If so, yes that makes sense.
>
> > > @@ -2185,6 +2311,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
> > > {
> > > }
> > >
> > > +static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,
> > > + struct slab *slab)
> > > +{
> > > +}
> > > +
> > > #endif /* CONFIG_SLAB_OBJ_EXT */
> > >
> > > #ifdef CONFIG_MEM_ALLOC_PROFILING
> > > @@ -3155,7 +3286,9 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
> > > static __always_inline void account_slab(struct slab *slab, int order,
> > > struct kmem_cache *s, gfp_t gfp)
> > > {
> > > - if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > > + if (memcg_kmem_online() &&
> > > + (s->flags & SLAB_ACCOUNT) &&
> > > + !slab_obj_exts(slab))
> > > alloc_slab_obj_exts(slab, s, gfp, true);
> >
> > Don't you need to add a check for !obj_exts_in_slab() inside
> > alloc_slab_obj_exts() to avoid allocating slab->obj_exts?
>
> slab_obj_exts() should have returned a nonzero value
> and then we don't call alloc_slab_obj_exts()?
Sorry, I mean that you would need to check
obj_exts_fit_within_slab_leftover() inside alloc_slab_obj_exts() to
avoid allocating the vector when obj_exts can fit inside the slab
itself. This is because alloc_slab_obj_exts() can be called from other
places as well. However, from your next comment, I realize that your
intention might have been to keep those other callers intact and
allocate the vector separately even if the obj_exts could have been
squeezed inside the slab. Is that correct?
>
> > > mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> > > @@ -3219,9 +3352,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> > > slab->objects = oo_objects(oo);slab_obj_exts
> > > slab->inuse = 0;
> > > slab->frozen = 0;
> > > - init_slab_obj_exts(slab);
> > > -
> > > - account_slab(slab, oo_order(oo), s, flags);
> > >
> > > slab->slab_cache = s;
> > >
> > > @@ -3230,6 +3360,13 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> > > start = slab_address(slab);
> > >
> > > setup_slab_debug(s, slab, start);
> > > + init_slab_obj_exts(slab);
> > > + /*
> > > + * Poison the slab before initializing the slabobj_ext array
> > > + * to prevent the array from being overwritten.
> > > + */
> > > + alloc_slab_obj_exts_early(s, slab);
> > > + account_slab(slab, oo_order(oo), s, flags);
> >
> > alloc_slab_obj_exts() is called in 2 other places:
> > 1. __memcg_slab_post_alloc_hook()
> > 2. prepare_slab_obj_exts_hook()
> >
> > Don't you need alloc_slab_obj_exts_early() there as well?
>
> That's good point, and I thought it's difficult to address
> concurrency problem without using a per-slab lock.
>
> Thread A Thread B
> - sees slab->obj_exts == 0
> - sees slab->obj_exts == 0
> - allocates the vector from unused space
> and initializes it.
> - try cmpxchg()
> - allocates the vector
> from unused space and
> initializes it.
> (the vector is already
> in use and it's overwritten!)
>
> - try cmpxchg()
>
> But since this is slowpath, using slab_{lock,unlock}() here is probably
> fine. What do you think?
Ok, was your original intent to leave these callers as is and allocate
the vector like we do today even if obj_exts fit inside the slab?
>
> --
> Cheers,
> Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover
2025-10-29 18:37 ` Suren Baghdasaryan
@ 2025-10-30 0:40 ` Harry Yoo
2025-10-30 16:33 ` Vlastimil Babka
0 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-30 0:40 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Wed, Oct 29, 2025 at 11:37:27AM -0700, Suren Baghdasaryan wrote:
> On Wed, Oct 29, 2025 at 1:00 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> >
> > On Tue, Oct 28, 2025 at 08:07:42PM -0700, Suren Baghdasaryan wrote:
> > > On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > > >
> > > > The leftover space in a slab is always smaller than s->size, and
> > > > kmem caches for large objects that are not power-of-two sizes tend to have
> > > > a greater amount of leftover space per slab. In some cases, the leftover
> > > > space is larger than the size of the slabobj_ext array for the slab.
> > > >
> > > > An excellent example of such a cache is ext4_inode_cache. On my system,
> > > > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > > > and 736 bytes of leftover space per slab.
> > > >
> > > > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > > > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > > > fits within the leftover space.
> > > >
> > > > Allocate the slabobj_exts array from this unused space instead of using
> > > > kcalloc(), when it is large enough. The array is always allocated when
> > > > creating new slabs, because implementing lazy allocation correctly is
> > > > difficult without expensive synchronization.
> > > >
> > > > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > > > MEM_ALLOC_PROFILING are not used for the cache, only allocate the
> > > > slabobj_ext array only when either of them are enabled when slabs are
> > > > created.
> > > >
> > > > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > > >
> > > > Before patch (creating 2M directories on ext4):
> > > > Slab: 3575348 kB
> > > > SReclaimable: 3137804 kB
> > > > SUnreclaim: 437544 kB
> > > >
> > > > After patch (creating 2M directories on ext4):
> > > > Slab: 3558236 kB
> > > > SReclaimable: 3139268 kB
> > > > SUnreclaim: 418968 kB (-18.14 MiB)
> > > >
> > > > Enjoy the memory savings!
> > > >
> > > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > > ---
> > > > mm/slub.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > 1 file changed, 142 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > index 13acc9437ef5..8101df5fdccf 100644
> > > > --- a/mm/slub.c
> > > > +++ b/mm/slub.c
> > > > +static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
> > > > +{
> > > > + unsigned long obj_exts;
> > > > +
> > > > + if (!obj_exts_fit_within_slab_leftover(s, slab))
> > > > + return false;
> > > > +
> > > > + obj_exts = (unsigned long)slab_address(slab);
> > > > + obj_exts += obj_exts_offset_in_slab(s, slab);
> > > > + return obj_exts == slab_obj_exts(slab);
> > >
> > > You can check that slab_obj_exts(slab) is not NULL before making the
> > > above calculations.
> >
> > Did you mean this?
> >
> > if (!slab_obj_exts(slab))
> > return false;
>
> Yes but you can store the returned value to reuse later in the last
> "return obj_exts == slab_obj_exts(slab);" expression.
Okay, will do.
> > If so, yes that makes sense.
> >
> > > > @@ -2185,6 +2311,11 @@ static inline void free_slab_obj_exts(struct slab *slab)
> > > > {
> > > > }
> > > >
> > > > +static inline void alloc_slab_obj_exts_early(struct kmem_cache *s,
> > > > + struct slab *slab)
> > > > +{
> > > > +}
> > > > +
> > > > #endif /* CONFIG_SLAB_OBJ_EXT */
> > > >
> > > > #ifdef CONFIG_MEM_ALLOC_PROFILING
> > > > @@ -3155,7 +3286,9 @@ static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
> > > > static __always_inline void account_slab(struct slab *slab, int order,
> > > > struct kmem_cache *s, gfp_t gfp)
> > > > {
> > > > - if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > > > + if (memcg_kmem_online() &&
> > > > + (s->flags & SLAB_ACCOUNT) &&
> > > > + !slab_obj_exts(slab))
> > > > alloc_slab_obj_exts(slab, s, gfp, true);
> > >
> > > Don't you need to add a check for !obj_exts_in_slab() inside
> > > alloc_slab_obj_exts() to avoid allocating slab->obj_exts?
> >
> > slab_obj_exts() should have returned a nonzero value
> > and then we don't call alloc_slab_obj_exts()?
>
> Sorry, I mean that you would need to check
> obj_exts_fit_within_slab_leftover() inside alloc_slab_obj_exts() to
> avoid allocating the vector when obj_exts can fit inside the slab
> itself. This is because alloc_slab_obj_exts() can be called from other
> places as well. However, from your next comment, I realize that your
> intention might have been to keep those other callers intact and
> allocate the vector separately even if the obj_exts could have been
> squeezed inside the slab. Is that correct?
Yes, that's correct!
> > > > mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
> > > > @@ -3219,9 +3352,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> > > > slab->objects = oo_objects(oo);slab_obj_exts
> > > > slab->inuse = 0;
> > > > slab->frozen = 0;
> > > > - init_slab_obj_exts(slab);
> > > > -
> > > > - account_slab(slab, oo_order(oo), s, flags);
> > > >
> > > > slab->slab_cache = s;
> > > >
> > > > @@ -3230,6 +3360,13 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> > > > start = slab_address(slab);
> > > >
> > > > setup_slab_debug(s, slab, start);
> > > > + init_slab_obj_exts(slab);
> > > > + /*
> > > > + * Poison the slab before initializing the slabobj_ext array
> > > > + * to prevent the array from being overwritten.
> > > > + */
> > > > + alloc_slab_obj_exts_early(s, slab);
> > > > + account_slab(slab, oo_order(oo), s, flags);
> > >
> > > alloc_slab_obj_exts() is called in 2 other places:
> > > 1. __memcg_slab_post_alloc_hook()
> > > 2. prepare_slab_obj_exts_hook()
> > >
> > > Don't you need alloc_slab_obj_exts_early() there as well?
> >
> > That's good point, and I thought it's difficult to address
> > concurrency problem without using a per-slab lock.
> >
> > Thread A Thread B
> > - sees slab->obj_exts == 0
> > - sees slab->obj_exts == 0
> > - allocates the vector from unused space
> > and initializes it.
> > - try cmpxchg()
> > - allocates the vector
> > from unused space and
> > initializes it.
> > (the vector is already
> > in use and it's overwritten!)
> >
> > - try cmpxchg()
> >
> > But since this is slowpath, using slab_{lock,unlock}() here is probably
> > fine. What do you think?
>
> Ok, was your original intent to leave these callers as is and allocate
> the vector like we do today even if obj_exts fit inside the slab?
Yes that's what I intended, and maybe later we could allocate the vector
from the unused space even after the slab is allocated, as long as
it doesn't hurt performance.
> >
> > --
> > Cheers,
> > Harry / Hyeonggon
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover
2025-10-30 0:40 ` Harry Yoo
@ 2025-10-30 16:33 ` Vlastimil Babka
0 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2025-10-30 16:33 UTC (permalink / raw)
To: Harry Yoo, Suren Baghdasaryan
Cc: akpm, andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
vincenzo.frascino, yeoreum.yun, tytso, adilger.kernel, linux-ext4,
linux-kernel
On 10/30/25 01:40, Harry Yoo wrote:
> On Wed, Oct 29, 2025 at 11:37:27AM -0700, Suren Baghdasaryan wrote:
>> > > > mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
>> > > > @@ -3219,9 +3352,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>> > > > slab->objects = oo_objects(oo);slab_obj_exts
>> > > > slab->inuse = 0;
>> > > > slab->frozen = 0;
>> > > > - init_slab_obj_exts(slab);
>> > > > -
>> > > > - account_slab(slab, oo_order(oo), s, flags);
>> > > >
>> > > > slab->slab_cache = s;
>> > > >
>> > > > @@ -3230,6 +3360,13 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>> > > > start = slab_address(slab);
>> > > >
>> > > > setup_slab_debug(s, slab, start);
>> > > > + init_slab_obj_exts(slab);
>> > > > + /*
>> > > > + * Poison the slab before initializing the slabobj_ext array
>> > > > + * to prevent the array from being overwritten.
>> > > > + */
>> > > > + alloc_slab_obj_exts_early(s, slab);
>> > > > + account_slab(slab, oo_order(oo), s, flags);
>> > >
>> > > alloc_slab_obj_exts() is called in 2 other places:
>> > > 1. __memcg_slab_post_alloc_hook()
>> > > 2. prepare_slab_obj_exts_hook()
>> > >
>> > > Don't you need alloc_slab_obj_exts_early() there as well?
>> >
>> > That's good point, and I thought it's difficult to address
>> > concurrency problem without using a per-slab lock.
>> >
>> > Thread A Thread B
>> > - sees slab->obj_exts == 0
>> > - sees slab->obj_exts == 0
>> > - allocates the vector from unused space
>> > and initializes it.
>> > - try cmpxchg()
>> > - allocates the vector
>> > from unused space and
>> > initializes it.
>> > (the vector is already
>> > in use and it's overwritten!)
>> >
>> > - try cmpxchg()
>> >
>> > But since this is slowpath, using slab_{lock,unlock}() here is probably
>> > fine. What do you think?
>>
>> Ok, was your original intent to leave these callers as is and allocate
>> the vector like we do today even if obj_exts fit inside the slab?
>
> Yes that's what I intended, and maybe later we could allocate the vector
> from the unused space even after the slab is allocated, as long as
> it doesn't hurt performance.
It would be nice. I guess what can happen is there's a cache without
SLAB_ACCOUNT but then some allocations from that will use __GFP_ACCOUNT and
we need to allocate obj_exts on-demand, right?
>> >
>> > --
>> > Cheers,
>> > Harry / Hyeonggon
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover
2025-10-27 12:28 ` [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover Harry Yoo
2025-10-29 3:07 ` Suren Baghdasaryan
@ 2025-10-29 18:45 ` Andrey Ryabinin
2025-10-30 1:11 ` Harry Yoo
1 sibling, 1 reply; 34+ messages in thread
From: Andrey Ryabinin @ 2025-10-29 18:45 UTC (permalink / raw)
To: Harry Yoo, akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, shakeel.butt, surenb,
vincenzo.frascino, yeoreum.yun, tytso, adilger.kernel, linux-ext4,
linux-kernel
On 10/27/25 1:28 PM, Harry Yoo wrote:
>
> +#ifdef CONFIG_SLAB_OBJ_EXT
> +
> +/*
> + * Check if memory cgroup or memory allocation profiling is enabled.
> + * If enabled, SLUB tries to reduce memory overhead of accounting
> + * slab objects. If neither is enabled when this function is called,
> + * the optimization is simply skipped to avoid affecting caches that do not
> + * need slabobj_ext metadata.
> + *
> + * However, this may disable optimization when memory cgroup or memory
> + * allocation profiling is used, but slabs are created too early
> + * even before those subsystems are initialized.
> + */
> +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> +{
> + if (!mem_cgroup_disabled() && (s->flags & SLAB_ACCOUNT))
Shouldn't this be !memcg_kmem_online() check?
In case of disabled kmem accounting via 'cgroup.memory=nokmem'
> + return true;
> +
> + if (mem_alloc_profiling_enabled())
> + return true;
> +
> + return false;
> +}
> +
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover
2025-10-29 18:45 ` Andrey Ryabinin
@ 2025-10-30 1:11 ` Harry Yoo
0 siblings, 0 replies; 34+ messages in thread
From: Harry Yoo @ 2025-10-30 1:11 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, tytso, adilger.kernel,
linux-ext4, linux-kernel
On Wed, Oct 29, 2025 at 07:45:32PM +0100, Andrey Ryabinin wrote:
>
>
> On 10/27/25 1:28 PM, Harry Yoo wrote:
>
> >
> > +#ifdef CONFIG_SLAB_OBJ_EXT
> > +
> > +/*
> > + * Check if memory cgroup or memory allocation profiling is enabled.
> > + * If enabled, SLUB tries to reduce memory overhead of accounting
> > + * slab objects. If neither is enabled when this function is called,
> > + * the optimization is simply skipped to avoid affecting caches that do not
> > + * need slabobj_ext metadata.
> > + *
> > + * However, this may disable optimization when memory cgroup or memory
> > + * allocation profiling is used, but slabs are created too early
> > + * even before those subsystems are initialized.
> > + */
> > +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> > +{
> > + if (!mem_cgroup_disabled() && (s->flags & SLAB_ACCOUNT))
>
> Shouldn't this be !memcg_kmem_online() check?
> In case of disabled kmem accounting via 'cgroup.memory=nokmem'
Good catch. Will fix, thanks!
> > + return true;
> > +
> > + if (mem_alloc_profiling_enabled())
> > + return true;
> > +
> > + return false;
> > +}
> > +
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC PATCH V3 7/7] mm/slab: place slabobj_ext metadata in unused space within s->size
2025-10-27 12:28 [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
` (5 preceding siblings ...)
2025-10-27 12:28 ` [RFC PATCH V3 6/7] mm/slab: save memory by allocating slabobj_ext array from leftover Harry Yoo
@ 2025-10-27 12:28 ` Harry Yoo
2025-10-29 3:19 ` Suren Baghdasaryan
2025-10-29 18:19 ` Andrey Ryabinin
2025-10-30 16:39 ` [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Vlastimil Babka
7 siblings, 2 replies; 34+ messages in thread
From: Harry Yoo @ 2025-10-27 12:28 UTC (permalink / raw)
To: akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, harry.yoo, tytso,
adilger.kernel, linux-ext4, linux-kernel
When a cache has high s->align value and s->object_size is not aligned
to it, each object ends up with some unused space because of alignment.
If this wasted space is big enough, we can use it to store the
slabobj_ext metadata instead of wasting it.
On my system, this happens with caches like kmem_cache, mm_struct, pid,
task_struct, sighand_cache, xfs_inode, and others.
To place the slabobj_ext metadata within each object, the existing
slab_obj_ext() logic can still be used by setting:
- slab->obj_exts = slab_address(slab) + s->red_left_zone +
(slabobj_ext offset)
- stride = s->size
slab_obj_ext() doesn't need know where the metadata is stored,
so this method works without adding extra overhead to slab_obj_ext().
A good example benefiting from this optimization is xfs_inode
(object_size: 992, align: 64). To measure memory savings, 2 millions of
files were created on XFS.
[ MEMCG=y, MEM_ALLOC_PROFILING=n ]
Before patch (creating 2M directories on xfs):
Slab: 6693844 kB
SReclaimable: 6016332 kB
SUnreclaim: 677512 kB
After patch (creating 2M directories on xfs):
Slab: 6697572 kB
SReclaimable: 6034744 kB
SUnreclaim: 662828 kB (-14.3 MiB)
Enjoy the memory savings!
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
include/linux/slab.h | 9 ++++++
mm/slab_common.c | 6 ++--
mm/slub.c | 72 ++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 82 insertions(+), 5 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 561597dd2164..fd09674cc117 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -59,6 +59,9 @@ enum _slab_flag_bits {
_SLAB_CMPXCHG_DOUBLE,
#ifdef CONFIG_SLAB_OBJ_EXT
_SLAB_NO_OBJ_EXT,
+#endif
+#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
+ _SLAB_OBJ_EXT_IN_OBJ,
#endif
_SLAB_FLAGS_LAST_BIT
};
@@ -244,6 +247,12 @@ enum _slab_flag_bits {
#define SLAB_NO_OBJ_EXT __SLAB_FLAG_UNUSED
#endif
+#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
+#define SLAB_OBJ_EXT_IN_OBJ __SLAB_FLAG_BIT(_SLAB_OBJ_EXT_IN_OBJ)
+#else
+#define SLAB_OBJ_EXT_IN_OBJ __SLAB_FLAG_UNUSED
+#endif
+
/*
* ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
*
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 2c2ed2452271..bfe2f498e622 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -43,11 +43,13 @@ DEFINE_MUTEX(slab_mutex);
struct kmem_cache *kmem_cache;
/*
- * Set of flags that will prevent slab merging
+ * Set of flags that will prevent slab merging.
+ * Any flag that adds per-object metadata should be included,
+ * since slab merging can update s->inuse that affects the metadata layout.
*/
#define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
- SLAB_FAILSLAB | SLAB_NO_MERGE)
+ SLAB_FAILSLAB | SLAB_NO_MERGE | SLAB_OBJ_EXT_IN_OBJ)
#define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
diff --git a/mm/slub.c b/mm/slub.c
index 8101df5fdccf..7de6e8f8f8c2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -970,6 +970,40 @@ static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
{
return false;
}
+
+#endif
+
+#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
+static bool obj_exts_in_object(struct kmem_cache *s)
+{
+ return s->flags & SLAB_OBJ_EXT_IN_OBJ;
+}
+
+static unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
+{
+ unsigned int offset = get_info_end(s);
+
+ if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
+ offset += sizeof(struct track) * 2;
+
+ if (slub_debug_orig_size(s))
+ offset += ALIGN(sizeof(unsigned int),
+ __alignof__(unsigned long));
+
+ offset += kasan_metadata_size(s, false);
+
+ return offset;
+}
+#else
+static inline bool obj_exts_in_object(struct kmem_cache *s)
+{
+ return false;
+}
+
+static inline unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
+{
+ return 0;
+}
#endif
#ifdef CONFIG_SLUB_DEBUG
@@ -1270,6 +1304,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
off += kasan_metadata_size(s, false);
+ if (obj_exts_in_object(s))
+ off += sizeof(struct slabobj_ext);
+
if (off != size_from_object(s))
/* Beginning of the filler is the free pointer */
print_section(KERN_ERR, "Padding ", p + off,
@@ -1439,7 +1476,10 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
* A. Free pointer (if we cannot overwrite object on free)
* B. Tracking data for SLAB_STORE_USER
* C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
- * D. Padding to reach required alignment boundary or at minimum
+ * D. KASAN alloc metadata (KASAN enabled)
+ * E. struct slabobj_ext to store accounting metadata
+ * (SLAB_OBJ_EXT_IN_OBJ enabled)
+ * F. Padding to reach required alignment boundary or at minimum
* one word if debugging is on to be able to detect writes
* before the word boundary.
*
@@ -1468,6 +1508,9 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
off += kasan_metadata_size(s, false);
+ if (obj_exts_in_object(s))
+ off += sizeof(struct slabobj_ext);
+
if (size_from_object(s) == off)
return 1;
@@ -2250,7 +2293,8 @@ static inline void free_slab_obj_exts(struct slab *slab)
if (!obj_exts)
return;
- if (obj_exts_in_slab(slab->slab_cache, slab)) {
+ if (obj_exts_in_slab(slab->slab_cache, slab) ||
+ obj_exts_in_object(slab->slab_cache)) {
slab->obj_exts = 0;
return;
}
@@ -2291,6 +2335,21 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
if (IS_ENABLED(CONFIG_MEMCG))
slab->obj_exts |= MEMCG_DATA_OBJEXTS;
slab_set_stride(slab, sizeof(struct slabobj_ext));
+ } else if (obj_exts_in_object(s)) {
+ unsigned int offset = obj_exts_offset_in_object(s);
+
+ slab->obj_exts = (unsigned long)slab_address(slab);
+ slab->obj_exts += s->red_left_pad;
+ slab->obj_exts += obj_exts_offset_in_object(s);
+ if (IS_ENABLED(CONFIG_MEMCG))
+ slab->obj_exts |= MEMCG_DATA_OBJEXTS;
+ slab_set_stride(slab, s->size);
+
+ for_each_object(addr, s, slab_address(slab), slab->objects) {
+ kasan_unpoison_range(addr + offset,
+ sizeof(struct slabobj_ext));
+ memset(addr + offset, 0, sizeof(struct slabobj_ext));
+ }
}
metadata_access_disable();
}
@@ -7883,6 +7942,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
{
slab_flags_t flags = s->flags;
unsigned int size = s->object_size;
+ unsigned int aligned_size;
unsigned int order;
/*
@@ -7997,7 +8057,13 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
* offset 0. In order to align the objects we have to simply size
* each object to conform to the alignment.
*/
- size = ALIGN(size, s->align);
+ aligned_size = ALIGN(size, s->align);
+#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
+ if (aligned_size - size >= sizeof(struct slabobj_ext))
+ s->flags |= SLAB_OBJ_EXT_IN_OBJ;
+#endif
+ size = aligned_size;
+
s->size = size;
s->reciprocal_size = reciprocal_value(size);
order = calculate_order(size);
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 7/7] mm/slab: place slabobj_ext metadata in unused space within s->size
2025-10-27 12:28 ` [RFC PATCH V3 7/7] mm/slab: place slabobj_ext metadata in unused space within s->size Harry Yoo
@ 2025-10-29 3:19 ` Suren Baghdasaryan
2025-10-29 18:19 ` Andrey Ryabinin
1 sibling, 0 replies; 34+ messages in thread
From: Suren Baghdasaryan @ 2025-10-29 3:19 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, ryabinin.a.a,
shakeel.butt, vincenzo.frascino, yeoreum.yun, tytso,
adilger.kernel, linux-ext4, linux-kernel
On Mon, Oct 27, 2025 at 5:29 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> When a cache has high s->align value and s->object_size is not aligned
> to it, each object ends up with some unused space because of alignment.
> If this wasted space is big enough, we can use it to store the
> slabobj_ext metadata instead of wasting it.
>
> On my system, this happens with caches like kmem_cache, mm_struct, pid,
> task_struct, sighand_cache, xfs_inode, and others.
>
> To place the slabobj_ext metadata within each object, the existing
> slab_obj_ext() logic can still be used by setting:
>
> - slab->obj_exts = slab_address(slab) + s->red_left_zone +
> (slabobj_ext offset)
> - stride = s->size
>
> slab_obj_ext() doesn't need know where the metadata is stored,
> so this method works without adding extra overhead to slab_obj_ext().
>
> A good example benefiting from this optimization is xfs_inode
> (object_size: 992, align: 64). To measure memory savings, 2 millions of
> files were created on XFS.
>
> [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
>
> Before patch (creating 2M directories on xfs):
> Slab: 6693844 kB
> SReclaimable: 6016332 kB
> SUnreclaim: 677512 kB
>
> After patch (creating 2M directories on xfs):
> Slab: 6697572 kB
> SReclaimable: 6034744 kB
> SUnreclaim: 662828 kB (-14.3 MiB)
>
> Enjoy the memory savings!
>
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
> include/linux/slab.h | 9 ++++++
> mm/slab_common.c | 6 ++--
> mm/slub.c | 72 ++++++++++++++++++++++++++++++++++++++++++--
> 3 files changed, 82 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 561597dd2164..fd09674cc117 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -59,6 +59,9 @@ enum _slab_flag_bits {
> _SLAB_CMPXCHG_DOUBLE,
> #ifdef CONFIG_SLAB_OBJ_EXT
> _SLAB_NO_OBJ_EXT,
> +#endif
> +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> + _SLAB_OBJ_EXT_IN_OBJ,
> #endif
> _SLAB_FLAGS_LAST_BIT
> };
> @@ -244,6 +247,12 @@ enum _slab_flag_bits {
> #define SLAB_NO_OBJ_EXT __SLAB_FLAG_UNUSED
> #endif
>
> +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> +#define SLAB_OBJ_EXT_IN_OBJ __SLAB_FLAG_BIT(_SLAB_OBJ_EXT_IN_OBJ)
> +#else
> +#define SLAB_OBJ_EXT_IN_OBJ __SLAB_FLAG_UNUSED
> +#endif
> +
> /*
> * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests.
> *
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 2c2ed2452271..bfe2f498e622 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -43,11 +43,13 @@ DEFINE_MUTEX(slab_mutex);
> struct kmem_cache *kmem_cache;
>
> /*
> - * Set of flags that will prevent slab merging
> + * Set of flags that will prevent slab merging.
> + * Any flag that adds per-object metadata should be included,
> + * since slab merging can update s->inuse that affects the metadata layout.
> */
> #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
> - SLAB_FAILSLAB | SLAB_NO_MERGE)
> + SLAB_FAILSLAB | SLAB_NO_MERGE | SLAB_OBJ_EXT_IN_OBJ)
>
> #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
> SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
> diff --git a/mm/slub.c b/mm/slub.c
> index 8101df5fdccf..7de6e8f8f8c2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -970,6 +970,40 @@ static inline bool obj_exts_in_slab(struct kmem_cache *s, struct slab *slab)
> {
> return false;
> }
> +
> +#endif
> +
> +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> +static bool obj_exts_in_object(struct kmem_cache *s)
> +{
> + return s->flags & SLAB_OBJ_EXT_IN_OBJ;
> +}
> +
> +static unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
> +{
> + unsigned int offset = get_info_end(s);
> +
> + if (kmem_cache_debug_flags(s, SLAB_STORE_USER))
> + offset += sizeof(struct track) * 2;
> +
> + if (slub_debug_orig_size(s))
> + offset += ALIGN(sizeof(unsigned int),
> + __alignof__(unsigned long));
> +
> + offset += kasan_metadata_size(s, false);
> +
> + return offset;
> +}
> +#else
> +static inline bool obj_exts_in_object(struct kmem_cache *s)
> +{
> + return false;
> +}
> +
> +static inline unsigned int obj_exts_offset_in_object(struct kmem_cache *s)
> +{
> + return 0;
> +}
> #endif
>
> #ifdef CONFIG_SLUB_DEBUG
> @@ -1270,6 +1304,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
>
> off += kasan_metadata_size(s, false);
>
> + if (obj_exts_in_object(s))
> + off += sizeof(struct slabobj_ext);
> +
> if (off != size_from_object(s))
> /* Beginning of the filler is the free pointer */
> print_section(KERN_ERR, "Padding ", p + off,
> @@ -1439,7 +1476,10 @@ check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
> * A. Free pointer (if we cannot overwrite object on free)
> * B. Tracking data for SLAB_STORE_USER
> * C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
> - * D. Padding to reach required alignment boundary or at minimum
> + * D. KASAN alloc metadata (KASAN enabled)
> + * E. struct slabobj_ext to store accounting metadata
> + * (SLAB_OBJ_EXT_IN_OBJ enabled)
> + * F. Padding to reach required alignment boundary or at minimum
> * one word if debugging is on to be able to detect writes
> * before the word boundary.
> *
> @@ -1468,6 +1508,9 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
>
> off += kasan_metadata_size(s, false);
>
> + if (obj_exts_in_object(s))
> + off += sizeof(struct slabobj_ext);
> +
> if (size_from_object(s) == off)
> return 1;
>
> @@ -2250,7 +2293,8 @@ static inline void free_slab_obj_exts(struct slab *slab)
> if (!obj_exts)
> return;
>
> - if (obj_exts_in_slab(slab->slab_cache, slab)) {
> + if (obj_exts_in_slab(slab->slab_cache, slab) ||
> + obj_exts_in_object(slab->slab_cache)) {
I think you need a check for obj_exts_in_object() inside
alloc_slab_obj_exts() to avoid allocating the vector.
> slab->obj_exts = 0;
> return;
> }
> @@ -2291,6 +2335,21 @@ static void alloc_slab_obj_exts_early(struct kmem_cache *s, struct slab *slab)
> if (IS_ENABLED(CONFIG_MEMCG))
> slab->obj_exts |= MEMCG_DATA_OBJEXTS;
> slab_set_stride(slab, sizeof(struct slabobj_ext));
> + } else if (obj_exts_in_object(s)) {
> + unsigned int offset = obj_exts_offset_in_object(s);
> +
> + slab->obj_exts = (unsigned long)slab_address(slab);
> + slab->obj_exts += s->red_left_pad;
> + slab->obj_exts += obj_exts_offset_in_object(s);
> + if (IS_ENABLED(CONFIG_MEMCG))
> + slab->obj_exts |= MEMCG_DATA_OBJEXTS;
> + slab_set_stride(slab, s->size);
> +
> + for_each_object(addr, s, slab_address(slab), slab->objects) {
> + kasan_unpoison_range(addr + offset,
> + sizeof(struct slabobj_ext));
> + memset(addr + offset, 0, sizeof(struct slabobj_ext));
> + }
> }
> metadata_access_disable();
> }
> @@ -7883,6 +7942,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> {
> slab_flags_t flags = s->flags;
> unsigned int size = s->object_size;
> + unsigned int aligned_size;
> unsigned int order;
>
> /*
> @@ -7997,7 +8057,13 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> * offset 0. In order to align the objects we have to simply size
> * each object to conform to the alignment.
> */
> - size = ALIGN(size, s->align);
> + aligned_size = ALIGN(size, s->align);
> +#if defined(CONFIG_SLAB_OBJ_EXT) && defined(CONFIG_64BIT)
> + if (aligned_size - size >= sizeof(struct slabobj_ext))
> + s->flags |= SLAB_OBJ_EXT_IN_OBJ;
> +#endif
> + size = aligned_size;
> +
> s->size = size;
> s->reciprocal_size = reciprocal_value(size);
> order = calculate_order(size);
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 7/7] mm/slab: place slabobj_ext metadata in unused space within s->size
2025-10-27 12:28 ` [RFC PATCH V3 7/7] mm/slab: place slabobj_ext metadata in unused space within s->size Harry Yoo
2025-10-29 3:19 ` Suren Baghdasaryan
@ 2025-10-29 18:19 ` Andrey Ryabinin
2025-10-30 0:51 ` Harry Yoo
1 sibling, 1 reply; 34+ messages in thread
From: Andrey Ryabinin @ 2025-10-29 18:19 UTC (permalink / raw)
To: Harry Yoo, akpm, vbabka
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, shakeel.butt, surenb,
vincenzo.frascino, yeoreum.yun, tytso, adilger.kernel, linux-ext4,
linux-kernel
On 10/27/25 1:28 PM, Harry Yoo wrote:
> slab->obj_exts |= MEMCG_DATA_OBJEXTS;
> slab_set_stride(slab, sizeof(struct slabobj_ext));
> + } else if (obj_exts_in_object(s)) {
> + unsigned int offset = obj_exts_offset_in_object(s);
> +
> + slab->obj_exts = (unsigned long)slab_address(slab);
> + slab->obj_exts += s->red_left_pad;
> + slab->obj_exts += obj_exts_offset_in_object(s);
> + if (IS_ENABLED(CONFIG_MEMCG))
> + slab->obj_exts |= MEMCG_DATA_OBJEXTS;
> + slab_set_stride(slab, s->size);
> +
> + for_each_object(addr, s, slab_address(slab), slab->objects) {
> + kasan_unpoison_range(addr + offset,
> + sizeof(struct slabobj_ext));
Is this leftover from previous version? Otherwise I don't get why we unpoison this.
> + memset(addr + offset, 0, sizeof(struct slabobj_ext));
> + }
> }
> metadata_access_disable();
> }
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 7/7] mm/slab: place slabobj_ext metadata in unused space within s->size
2025-10-29 18:19 ` Andrey Ryabinin
@ 2025-10-30 0:51 ` Harry Yoo
2025-10-30 12:41 ` Yeoreum Yun
0 siblings, 1 reply; 34+ messages in thread
From: Harry Yoo @ 2025-10-30 0:51 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: akpm, vbabka, andreyknvl, cl, dvyukov, glider, hannes, linux-mm,
mhocko, muchun.song, rientjes, roman.gushchin, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, tytso, adilger.kernel,
linux-ext4, linux-kernel
On Wed, Oct 29, 2025 at 07:19:29PM +0100, Andrey Ryabinin wrote:
>
>
> On 10/27/25 1:28 PM, Harry Yoo wrote:
>
> > slab->obj_exts |= MEMCG_DATA_OBJEXTS;
> > slab_set_stride(slab, sizeof(struct slabobj_ext));
> > + } else if (obj_exts_in_object(s)) {
> > + unsigned int offset = obj_exts_offset_in_object(s);
> > +
> > + slab->obj_exts = (unsigned long)slab_address(slab);
> > + slab->obj_exts += s->red_left_pad;
> > + slab->obj_exts += obj_exts_offset_in_object(s);
> > + if (IS_ENABLED(CONFIG_MEMCG))
> > + slab->obj_exts |= MEMCG_DATA_OBJEXTS;
> > + slab_set_stride(slab, s->size);
> > +
> > + for_each_object(addr, s, slab_address(slab), slab->objects) {
> > + kasan_unpoison_range(addr + offset,
> > + sizeof(struct slabobj_ext));
>
> Is this leftover from previous version? Otherwise I don't get why we unpoison this.
Oh god, yes! Thanks for catching. Will fix in the next version.
> > + memset(addr + offset, 0, sizeof(struct slabobj_ext));
> > + }
> > }
> > metadata_access_disable();
> > }
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [RFC PATCH V3 7/7] mm/slab: place slabobj_ext metadata in unused space within s->size
2025-10-30 0:51 ` Harry Yoo
@ 2025-10-30 12:41 ` Yeoreum Yun
0 siblings, 0 replies; 34+ messages in thread
From: Yeoreum Yun @ 2025-10-30 12:41 UTC (permalink / raw)
To: Harry Yoo
Cc: Andrey Ryabinin, akpm, vbabka, andreyknvl, cl, dvyukov, glider,
hannes, linux-mm, mhocko, muchun.song, rientjes, roman.gushchin,
shakeel.butt, surenb, vincenzo.frascino, tytso, adilger.kernel,
linux-ext4, linux-kernel
Hi Harry,
> On Wed, Oct 29, 2025 at 07:19:29PM +0100, Andrey Ryabinin wrote:
> >
> >
> > On 10/27/25 1:28 PM, Harry Yoo wrote:
> >
> > > slab->obj_exts |= MEMCG_DATA_OBJEXTS;
> > > slab_set_stride(slab, sizeof(struct slabobj_ext));
> > > + } else if (obj_exts_in_object(s)) {
> > > + unsigned int offset = obj_exts_offset_in_object(s);
> > > +
> > > + slab->obj_exts = (unsigned long)slab_address(slab);
> > > + slab->obj_exts += s->red_left_pad;
> > > + slab->obj_exts += obj_exts_offset_in_object(s);
> > > + if (IS_ENABLED(CONFIG_MEMCG))
> > > + slab->obj_exts |= MEMCG_DATA_OBJEXTS;
> > > + slab_set_stride(slab, s->size);
> > > +
> > > + for_each_object(addr, s, slab_address(slab), slab->objects) {
> > > + kasan_unpoison_range(addr + offset,
> > > + sizeof(struct slabobj_ext));
> >
> > Is this leftover from previous version? Otherwise I don't get why we unpoison this.
>
> Oh god, yes! Thanks for catching. Will fix in the next version.
>
Not only this, there would be possible case for WARN_ON() in the
kasan_unpoison_range() for unaligned address with KASAN_GRANULE_SIZE
when:
- No debug information.
- object size = 24 byte.
- align = 32 bytes.
- sizeof(struct slabobj_ext) = 8 (CONFIG_MEMCG=y && CONFIG_MEM_ALLOC_PROFILING=n)
- using KASAN_HW_TAG (KASAN_GRANULE_SIZE = 16 bytes).
Thanks.
--
Sincerely,
Yeoreum Yun
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space
2025-10-27 12:28 [RFC PATCH V3 0/7] mm/slab: reduce slab accounting memory overhead by allocating slabobj_ext metadata within unused slab space Harry Yoo
` (6 preceding siblings ...)
2025-10-27 12:28 ` [RFC PATCH V3 7/7] mm/slab: place slabobj_ext metadata in unused space within s->size Harry Yoo
@ 2025-10-30 16:39 ` Vlastimil Babka
7 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2025-10-30 16:39 UTC (permalink / raw)
To: Harry Yoo, akpm
Cc: andreyknvl, cl, dvyukov, glider, hannes, linux-mm, mhocko,
muchun.song, rientjes, roman.gushchin, ryabinin.a.a, shakeel.butt,
surenb, vincenzo.frascino, yeoreum.yun, tytso, adilger.kernel,
linux-ext4, linux-kernel
On 10/27/25 13:28, Harry Yoo wrote:
> RFC v2: https://lore.kernel.org/linux-mm/20250827113726.707801-1-harry.yoo@oracle.com/
>
> RFC v2 -> v3:
> - RFC v3 now depends on the patch "[PATCH V2] mm/slab: ensure all metadata
> in slab object are word-aligned"
Looks like there's some outstanding feedback on that patch. Also on this
series already, so I'll wait for the next version before looking in detail,
but overall it looks good to me! Thanks!
> - During the merge window, the size of ext4 inode cache has shrunken
> and it couldn't benefit from the change anymore as the unused space
> became smaller. But I somehow found a way to shrink the size of
> ext4 inode object by a word...
>
> With new patch 1 and 2, now it can benefit from the optimization again.
>
> - As suggested by Andrey, SLUB now disables KASAN and KMSAN, and reset the
> kasan tag instead of unpoisoning slabobj_ext metadata (Patch 5).
>
> When CONFIG_MEMCG and CONFIG_MEM_ALLOC_PROFILING are enabled,
> the kernel allocates two pointers per object: one for the memory cgroup
> (obj_cgroup) to which it belongs, and another for the code location
> that requested the allocation.
>
> In two special cases, this overhead can be eliminated by allocating
> slabobj_ext metadata from unused space within a slab:
>
> Case 1. The "leftover" space after the last slab object is larger than
> the size of an array of slabobj_ext.
>
> Case 2. The per-object alignment padding is larger than
> sizeof(struct slabobj_ext).
>
> For these two cases, one or two pointers can be saved per slab object.
> Examples: ext4 inode cache (case 1) and xfs inode cache (case 2).
> That's approximately 0.7-0.8% (memcg) or 1.5-1.6%% (memcg + mem profiling)
> of the total inode cache size.
>
> Implementing case 2 is not straightforward, because the existing code
> assumes that slab->obj_exts is an array of slabobj_ext, while case 2
> breaks the assumption.
>
> As suggested by Vlastimil, abstract access to individual slabobj_ext
> metadata via a new helper named slab_obj_ext():
>
> static inline struct slabobj_ext *slab_obj_ext(struct slab *slab,
> unsigned long obj_exts,
> unsigned int index)
> {
> return (struct slabobj_ext *)(obj_exts + slab_get_stride(slab) * index);
> }
>
> In the normal case (including case 1), slab->obj_exts points to an array
> of slabobj_ext, and the stride is sizeof(struct slabobj_ext).
>
> In case 2, the stride is s->size and
> slab->obj_exts = slab_address(slab) + s->red_left_pad + (offset of slabobj_ext)
>
> With this approach, the memcg charging fastpath doesn't need to care the
> storage method of slabobj_ext.
>
> Harry Yoo (7):
> mm/slab: allow specifying freepointer offset when using constructor
> ext4: specify the free pointer offset for ext4_inode_cache
> mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper
> mm/slab: use stride to access slabobj_ext
> mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison
> mm/slab: save memory by allocating slabobj_ext array from leftover
> mm/slab: place slabobj_ext metadata in unused space within s->size
>
> fs/ext4/super.c | 20 ++-
> include/linux/slab.h | 9 ++
> mm/memcontrol.c | 34 +++--
> mm/slab.h | 94 ++++++++++++-
> mm/slab_common.c | 8 +-
> mm/slub.c | 304 ++++++++++++++++++++++++++++++++++++-------
> 6 files changed, 398 insertions(+), 71 deletions(-)
>
^ permalink raw reply [flat|nested] 34+ messages in thread