All of lore.kernel.org
 help / color / mirror / Atom feed
From: Harry Yoo <harry@kernel.org>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>,
	Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Hao Li <hao.li@linux.dev>, Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Usama Arif <usama.arif@linux.dev>,
	Meta kernel team <kernel-team@meta.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Danielle Costantino <dcostantino@meta.com>
Subject: Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
Date: Sun, 28 Jun 2026 18:22:22 +0900	[thread overview]
Message-ID: <68122038-e8e0-47ed-82f8-cb6a23e4658e@kernel.org> (raw)
In-Reply-To: <e6ca6ac6-92c8-4db6-a82b-cbb92ceaa4ea@kernel.org>


[-- Attachment #1.1: Type: text/plain, Size: 4115 bytes --]



On 6/28/26 4:47 PM, Vlastimil Babka (SUSE) wrote:
> On 6/28/26 5:23 AM, Shakeel Butt wrote:
>> On Sat, Jun 27, 2026 at 07:58:12PM -0700, Shakeel Butt wrote:
>>> On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
>>> [...]
>>>>>>> Fix it structurally by removing cycles of every shape: serve the array
>>>>>>> from a cache strictly larger than the one it describes whenever it would
>>>>>>> otherwise come from the same or a smaller cache.  Every reference edge
>>>>>>> then points from a smaller to a larger cache (here kmalloc-1k's array
>>>>>>> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
>>>>>>
>>>>>> This will fix the problem.
>>>>>>
>>>>>> But this will waste memory as we need smaller obj_exts array
>>>>>> as the size gets larger.
>>>>>>
>>>>>> We should probably create a new kmalloc type to avoid cycles instead?
>>>>>> (needed only when memory profiling is enabled, though)
>>>>>>
>>>>>> That would also prevent recursion even further.
>>>>>
>>>>> Yes but I assume that would add kmem caches even for users not using memory
>>>>> profiling. Anyways, I think that is a separate discussion. Am I understanding
>>>>> correctly that you don't have any concerns with this approach?
>>>>
>>>> Umm, the memory waste is a concern?
>>>>
>>>> Minimally I'd now want to only do that size bumping when allocation
>>>> profiling is enabled. Ideally that means both configured in and not booted
>>>> with "never".
>>>>
>>>> We probably should have done that already in 280ea9c3154b2. Because AFAIU
>>>> memcg-only obj_exts array don't have this issue (or maybe they do have the
>>>> [1] issue? Harry?). But if memcg-only should keep avoiding the same size
>>>> bucket, it can keep what it was doing and only memalloc profiling would do
>>>> the strictly larger thing.
>>>
>>> memcg should not have this issue as normal kmalloc caches do not serve memcg
>>> charged objects. 
>>
>> I am wrong here as I went back and see d8df600b67d7.

I was confused too :)

> (8dafa9f5900c upstream)
> 
>>>
>>> So here we can do dedicated caches as Harry suggested or make this size bumping
>>> very specialized as Vlastimil suggested. What do we want long term? Orthogonally
> 
> Maybe long term we make kmem_buckets unconditional and use that.
> 
>>> we do want this fix to be backported easily to older stable kernels. I will see
>>> how does this narrowed down size bumping looks like.
>>>
>>
>> BTW I think we need something like the following, right?
>>
>> 	if (mem_alloc_profiling_enabled()) {
>> 		if (obj_exts_cache->object_size <= s->object_size)
>> 			return s->object_size + 1;
>> 	} else {
>> 		if (obj_exts_cache->object_size == s->object_size)
>> 			return s->object_size + 1;
>> 	}

We should not add mem_alloc_profiling_enabled() check because,
then we're not fixing this issue on SLUB_TINY, when the caller specifies
__GFP_RECLAIMABLE|__GFP_ACCOUNT without memory allocation profiling.

`if (!is_kmalloc_normal(s))` check already bails out when it doesn't
need to bump the size.

So Shakeel's original code will work fine.

We're only pessimizing memory allocation profiling and
SLUB_TINY && MEMCG users, but (as Vlastimil suggests off-list)
it wouldn't make much sense to enable MEMCG on memory restricted systems
anyway. (IIRC even raspberry pis don't enable the memory controller by
default...)

I think it's okay to fix the bug first, but we need to address
the memory wastage issue sooner or later if companies (Meta and
Google I guess?) are deploying kernels with memory allocation profiling
on in production systems.

Perhaps it's worth adding a comment like this, though:

/*
 * Only bump the size when the object (not the obj_exts array) is
 * allocated from KMALLOC_NORMAL, either by memory allocation profiling
 * or memcg on SLUB_TINY with __GFP_RECLAIMABLE|__GFP_ACCOUNT.
 * Otherwise, obj_exts allocations cannot form a cycle between
 * kmalloc caches.
 */
if (!is_kmalloc_normal(s))
        return sz;

Thanks!

-- 
Cheers,
Harry / Hyeonggon

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

  reply	other threads:[~2026-06-28  9:22 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-25 23:00 [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache Shakeel Butt
2026-06-26  4:22 ` Harry Yoo
2026-06-26 16:49   ` Shakeel Butt
2026-06-26 17:11     ` Vlastimil Babka (SUSE)
2026-06-28  2:58       ` Shakeel Butt
2026-06-28  3:23         ` Shakeel Butt
2026-06-28  7:47           ` Vlastimil Babka (SUSE)
2026-06-28  9:22             ` Harry Yoo [this message]
2026-06-28  8:10       ` Harry Yoo
2026-06-28  8:36         ` Harry Yoo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=68122038-e8e0-47ed-82f8-cb6a23e4658e@kernel.org \
    --to=harry@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cl@gentwo.org \
    --cc=dcostantino@meta.com \
    --cc=hao.li@linux.dev \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=surenb@google.com \
    --cc=usama.arif@linux.dev \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.