Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Harry Yoo <harry@kernel.org>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
	Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Hao Li <hao.li@linux.dev>, Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Usama Arif <usama.arif@linux.dev>,
	Meta kernel team <kernel-team@meta.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Danielle Costantino <dcostantino@meta.com>
Subject: Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
Date: Sun, 28 Jun 2026 18:22:22 +0900	[thread overview]
Message-ID: <68122038-e8e0-47ed-82f8-cb6a23e4658e@kernel.org> (raw)
In-Reply-To: <e6ca6ac6-92c8-4db6-a82b-cbb92ceaa4ea@kernel.org>


[-- Attachment #1.1: Type: text/plain, Size: 4115 bytes --]



On 6/28/26 4:47 PM, Vlastimil Babka (SUSE) wrote:
> On 6/28/26 5:23 AM, Shakeel Butt wrote:
>> On Sat, Jun 27, 2026 at 07:58:12PM -0700, Shakeel Butt wrote:
>>> On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
>>> [...]
>>>>>>> Fix it structurally by removing cycles of every shape: serve the array
>>>>>>> from a cache strictly larger than the one it describes whenever it would
>>>>>>> otherwise come from the same or a smaller cache.  Every reference edge
>>>>>>> then points from a smaller to a larger cache (here kmalloc-1k's array
>>>>>>> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
>>>>>>
>>>>>> This will fix the problem.
>>>>>>
>>>>>> But this will waste memory as we need smaller obj_exts array
>>>>>> as the size gets larger.
>>>>>>
>>>>>> We should probably create a new kmalloc type to avoid cycles instead?
>>>>>> (needed only when memory profiling is enabled, though)
>>>>>>
>>>>>> That would also prevent recursion even further.
>>>>>
>>>>> Yes but I assume that would add kmem caches even for users not using memory
>>>>> profiling. Anyways, I think that is a separate discussion. Am I understanding
>>>>> correctly that you don't have any concerns with this approach?
>>>>
>>>> Umm, the memory waste is a concern?
>>>>
>>>> Minimally I'd now want to only do that size bumping when allocation
>>>> profiling is enabled. Ideally that means both configured in and not booted
>>>> with "never".
>>>>
>>>> We probably should have done that already in 280ea9c3154b2. Because AFAIU
>>>> memcg-only obj_exts array don't have this issue (or maybe they do have the
>>>> [1] issue? Harry?). But if memcg-only should keep avoiding the same size
>>>> bucket, it can keep what it was doing and only memalloc profiling would do
>>>> the strictly larger thing.
>>>
>>> memcg should not have this issue as normal kmalloc caches do not serve memcg
>>> charged objects. 
>>
>> I am wrong here as I went back and see d8df600b67d7.

I was confused too :)

> (8dafa9f5900c upstream)
> 
>>>
>>> So here we can do dedicated caches as Harry suggested or make this size bumping
>>> very specialized as Vlastimil suggested. What do we want long term? Orthogonally
> 
> Maybe long term we make kmem_buckets unconditional and use that.
> 
>>> we do want this fix to be backported easily to older stable kernels. I will see
>>> how does this narrowed down size bumping looks like.
>>>
>>
>> BTW I think we need something like the following, right?
>>
>> 	if (mem_alloc_profiling_enabled()) {
>> 		if (obj_exts_cache->object_size <= s->object_size)
>> 			return s->object_size + 1;
>> 	} else {
>> 		if (obj_exts_cache->object_size == s->object_size)
>> 			return s->object_size + 1;
>> 	}

We should not add mem_alloc_profiling_enabled() check because,
then we're not fixing this issue on SLUB_TINY, when the caller specifies
__GFP_RECLAIMABLE|__GFP_ACCOUNT without memory allocation profiling.

`if (!is_kmalloc_normal(s))` check already bails out when it doesn't
need to bump the size.

So Shakeel's original code will work fine.

We're only pessimizing memory allocation profiling and
SLUB_TINY && MEMCG users, but (as Vlastimil suggests off-list)
it wouldn't make much sense to enable MEMCG on memory restricted systems
anyway. (IIRC even raspberry pis don't enable the memory controller by
default...)

I think it's okay to fix the bug first, but we need to address
the memory wastage issue sooner or later if companies (Meta and
Google I guess?) are deploying kernels with memory allocation profiling
on in production systems.

Perhaps it's worth adding a comment like this, though:

/*
 * Only bump the size when the object (not the obj_exts array) is
 * allocated from KMALLOC_NORMAL, either by memory allocation profiling
 * or memcg on SLUB_TINY with __GFP_RECLAIMABLE|__GFP_ACCOUNT.
 * Otherwise, obj_exts allocations cannot form a cycle between
 * kmalloc caches.
 */
if (!is_kmalloc_normal(s))
        return sz;

Thanks!

-- 
Cheers,
Harry / Hyeonggon

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

  reply	other threads:[~2026-06-28  9:22 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-25 23:00 [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache Shakeel Butt
2026-06-26  4:22 ` Harry Yoo
2026-06-26 16:49   ` Shakeel Butt
2026-06-26 17:11     ` Vlastimil Babka (SUSE)
2026-06-28  2:58       ` Shakeel Butt
2026-06-28  3:23         ` Shakeel Butt
2026-06-28  7:47           ` Vlastimil Babka (SUSE)
2026-06-28  9:22             ` Harry Yoo [this message]
2026-06-28  8:10       ` Harry Yoo
2026-06-28  8:36         ` Harry Yoo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=68122038-e8e0-47ed-82f8-cb6a23e4658e@kernel.org \
    --to=harry@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cl@gentwo.org \
    --cc=dcostantino@meta.com \
    --cc=hao.li@linux.dev \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=surenb@google.com \
    --cc=usama.arif@linux.dev \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox