From: Muchun Song <muchun.song@linux.dev>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: lsf-pc@lists.linux-foundation.org,
"Andrew Morton" <akpm@linux-foundation.org>,
"Tejun Heo" <tj@kernel.org>, "Michal Hocko" <mhocko@suse.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Alexei Starovoitov" <ast@kernel.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Hui Zhu" <hui.zhu@linux.dev>,
"JP Kobryn" <inwardvessel@gmail.com>,
"Geliang Tang" <geliang@kernel.org>,
"Sweet Tea Dorminy" <sweettea-kernel@dorminy.me>,
"Emil Tsalapatis" <emil@etsalapatis.com>,
"David Rientjes" <rientjes@google.com>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Meta kernel team" <kernel-team@meta.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Reimagining Memory Cgroup (memcg_ext)
Date: Thu, 12 Mar 2026 10:46:10 +0800
Message-ID: <8F3593EB-9D81-4459-8675-E922426DCB1E@linux.dev>
In-Reply-To: <abHPsCypwo7ZhqIt@linux.dev>
> On Mar 12, 2026, at 04:39, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Wed, Mar 11, 2026 at 03:19:31PM +0800, Muchun Song wrote:
>>
>>
>>> On Mar 8, 2026, at 02:24, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>>>
>
> [...]
>
>>>
>>> Per-Memcg Background Reclaim
>>>
>>> In the new memcg world, with the goal of (mostly) eliminating direct synchronous
>>> reclaim for limit enforcement, provide per-memcg background reclaimers which can
>>> scale across CPUs with the allocation rate.
>>
>> Hi Shakeel,
>>
>> I'm quite interested in this. Internally, we privately maintain a set
>> of code to implement asynchronous reclamation, but we're also trying to
>> discard these private codes as much as possible. Therefore, we want to
>> implement a similar asynchronous reclamation mechanism in user space
>> through the memory.reclaim mechanism. However, currently there's a lack
>> of suitable policy notification mechanisms to trigger user threads to
>> proactively reclaim in advance.
>
> Cool, can you please share what "suitable policy notification mechanisms" you
> need for your use-case? This will give me more data on the comparison between
> memory.reclaim and the proposed approach.

To trigger proactive reclaim when the memcg's usage reaches a certain point,
we currently have to poll memory.current continuously and compare it against
our watermark before starting asynchronous reclaim. What we are missing is an
event that notifies user-space threads when usage crosses a specific,
user-defined watermark. The events currently reported in memory.events do not
support custom watermarks.

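
As a rough illustration of the polling approach described above (not the
code we actually run), a user-space reclaimer could look like the sketch
below. The function names, the watermark/target parameters and the polling
interval are hypothetical; only the memory.current and memory.reclaim
cgroup v2 files are real interfaces.

```python
import os
import time


def read_current(cgroup_dir):
    """Return the memcg's usage in bytes, as reported by memory.current."""
    with open(os.path.join(cgroup_dir, "memory.current")) as f:
        return int(f.read().strip())


def reclaim_if_above(cgroup_dir, watermark, target):
    """Request reclaim down toward `target` once usage exceeds `watermark`.

    Hypothetical policy: `target` is assumed to be <= `watermark`.
    Returns the number of bytes requested (0 if usage is at or below
    the watermark).
    """
    usage = read_current(cgroup_dir)
    if usage <= watermark:
        return 0
    nbytes = usage - target
    try:
        with open(os.path.join(cgroup_dir, "memory.reclaim"), "w") as f:
            f.write(str(nbytes))
    except OSError:
        # The write fails with EAGAIN when the kernel reclaimed less than
        # requested; a poller simply tries again on the next round.
        pass
    return nbytes


def poll(cgroup_dir, watermark, target, interval=0.1, rounds=None):
    """Poll forever (or for `rounds` iterations), reclaiming when needed."""
    n = 0
    while rounds is None or n < rounds:
        reclaim_if_above(cgroup_dir, watermark, target)
        time.sleep(interval)
        n += 1
```

The weakness is visible in poll(): the thread wakes up every interval even
when usage is nowhere near the watermark, and a burst of allocations
between two wakeups goes unnoticed until the next one.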
>
>
>>
>>>
>>> Lock-Aware Throttling
>>>
>>> The ability to avoid throttling an allocating task that is holding locks, to
>>> prevent priority inversion. In Meta's fleet, we have observed lock holders stuck
>>> in memcg reclaim, blocking all waiters regardless of their priority or
>>> criticality.
>>
>> This is a real problem we encountered, especially with the jbd handler
>> resources of the ext4 file system. Our current attempt is to defer
>> memory reclamation until returning to user space, in order to solve
>> various priority inversion issues caused by the jbd handler. Therefore,
>> I would be interested to discuss this topic.
>
> Awesome, do you use memory.max and memory.high both and defer the reclaim for
> both? Are you deferring all the reclaims or just the ones where the charging
> process has the lock? (I need to look what jbd handler is).
>
We do not use memory.high. Although it supports deferring reclaim until the
return to user space, it also throttles the allocating task, which introduces
significant latency. For our applications, we would rather accept an OOM in
that situation. We previously tried to address the priority inversion caused
by the jbd handler on its own (we hit it frequently because we use the ext4
file system); see [1]. That solution is not general, though, since it
requires calling new interfaces for each kind of lock resource. So internally
we have a more aggressive idea: defer all reclaim triggered by kernel-space
memory allocations until just before returning to user space. This should
resolve the vast majority of priority inversion problems. The only potential
issue it introduces is that kernel-space memory usage may briefly exceed
memory.max.
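
To make the trade-off concrete, here is a toy user-space model of the
deferral idea. It is not kernel code and does not reflect any existing
implementation; the ToyMemcg class and its methods are hypothetical, and
reclaim is assumed to always succeed in full.

```python
class ToyMemcg:
    """Toy model of a memcg whose limit reclaim is deferred (not kernel code)."""

    def __init__(self, limit):
        self.limit = limit    # stands in for memory.max
        self.usage = 0
        self.peak = 0         # highest usage seen, to expose the overshoot
        self.pending = False  # reclaim owed at the next return to user space

    def charge(self, nbytes):
        """Charge from kernel context: never reclaims, only flags the overage.

        Because no reclaim runs here, a task holding locks is never stalled.
        """
        self.usage += nbytes
        self.peak = max(self.peak, self.usage)
        if self.usage > self.limit:
            self.pending = True

    def return_to_user(self):
        """Hook run just before returning to user space, with no locks held.

        Returns the number of bytes reclaimed (0 if nothing was owed).
        """
        if not self.pending:
            return 0
        reclaimed = self.usage - self.limit
        self.usage = self.limit  # assumption: reclaim fully succeeds
        self.pending = False
        return reclaimed


m = ToyMemcg(limit=100)
m.charge(80)
m.charge(50)                      # usage is now above the limit
overshoot = m.peak - m.limit      # the temporary excess over the limit
reclaimed = m.return_to_user()    # the deferred reclaim runs here
```

In this model the excess exists only between the second charge() and
return_to_user(), which corresponds to the brief overshoot of memory.max
mentioned above.
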
[1] https://lore.kernel.org/linux-mm/cover.1750234270.git.hezhongkun.hzk@bytedance.com/#r
Muchun,
Thanks.