From: Shakeel Butt <shakeel.butt@linux.dev>
To: Greg Thelen <gthelen@google.com>
Cc: lsf-pc@lists.linux-foundation.org,
"Andrew Morton" <akpm@linux-foundation.org>,
"Tejun Heo" <tj@kernel.org>, "Michal Hocko" <mhocko@suse.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Alexei Starovoitov" <ast@kernel.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Hui Zhu" <hui.zhu@linux.dev>,
"JP Kobryn" <inwardvessel@gmail.com>,
"Muchun Song" <muchun.song@linux.dev>,
"Geliang Tang" <geliang@kernel.org>,
"Sweet Tea Dorminy" <sweettea-kernel@dorminy.me>,
"Emil Tsalapatis" <emil@etsalapatis.com>,
"David Rientjes" <rientjes@google.com>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Meta kernel team" <kernel-team@meta.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Reimagining Memory Cgroup (memcg_ext)
Date: Wed, 11 Mar 2026 14:35:50 -0700
Message-ID: <abHWslV0KmjF7x80@linux.dev>
In-Reply-To: <CAHH2K0ZBJV1peAZVZC9Lm=rFRzSfxsvbrxRjyB=+0xkHGRcdLA@mail.gmail.com>
Hi Greg,
On Wed, Mar 11, 2026 at 12:29:45AM -0700, Greg Thelen wrote:
> On Sat, Mar 7, 2026 at 10:24 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> >
>
> Very interesting set of topics. A few more come to mind.
Thanks.
>
> I've wondered about preallocating memory or guaranteeing access to
> physical memory for a job. Memcg has max limits and min protections,
> but no preallocation (i.e. no conceptual memcg free list). So if a job
> is configured with 1GB min workingset protection that only ensures 1GB
> won't be reclaimed, not that 1GB can be allocated in a reasonable
> amount of time. This isn't just a job startup problem: if a page is
> freed with MADV_DONTNEED a subsequent pgfault may require a lot of
> time to handle, even if usage is below min.
This is indeed correct, i.e. protection limits protect the workload from external
reclaim but do not provide any guarantee that memory can be allocated in a
reasonably cheap way (without triggering reclaim/compaction). This is one of the
challenges in implementing a userspace oom-killer in an aggressively
overcommitted environment. However, to me, providing memory allocation
guarantees is more of a system-level feature and orthogonal to memcg. And I see
your next para is about that :)
Anyways, I think if we keep system memory utilization below some value and
guarantee there is always some free memory, then we might not need a memcg free
list or similar mechanisms (most of the time, I think). This can be done either
by having a common ancestor of all workloads with a limit set on it, or by
having the node controller maintain the condition that the sum of the limits of
all top-level cgroups stays below some percentage of total memory.
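As an illustration of the second option, here is a minimal sketch of the check a
node controller might run. It is not taken from any existing controller. It
assumes cgroup v2 mounted at /sys/fs/cgroup, with per-cgroup limits in
memory.max (either a byte count or the string "max") and total memory taken from
the MemTotal line of /proc/meminfo. The function names and the 90% threshold are
hypothetical.

```python
import os


def parse_limit(text):
    """Parse the contents of a cgroup v2 memory.max file.

    Returns the limit in bytes, or None when the file says "max" (no limit).
    """
    text = text.strip()
    return None if text == "max" else int(text)


def parse_mem_total(meminfo_text):
    """Return MemTotal in bytes from /proc/meminfo-style text (values in kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def read_top_level_limits(cgroup_root="/sys/fs/cgroup"):
    """Map each top-level cgroup under cgroup_root to its memory.max value."""
    limits = {}
    for name in sorted(os.listdir(cgroup_root)):
        path = os.path.join(cgroup_root, name, "memory.max")
        if os.path.isfile(path):
            with open(path) as f:
                limits[name] = parse_limit(f.read())
    return limits


def invariant_holds(limits, total_bytes, max_fraction=0.9):
    """True if the limits sum to at most max_fraction of total memory.

    A top-level cgroup without a limit (None) breaks the invariant, since its
    usage is then bounded only by the machine itself.
    """
    if any(limit is None for limit in limits.values()):
        return False
    return sum(limits.values()) <= max_fraction * total_bytes
```

A controller could call read_top_level_limits() and parse_mem_total() on the
contents of /proc/meminfo before admitting a new job, and refuse (or shrink
other limits) when invariant_holds() would become false. The common-ancestor
option needs no such check, since a single memory.max on the ancestor enforces
the same bound in the kernel.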
>
> Initial allocation policies are controlled by mempolicy/cpuset. Should
> we continue to keep allocation policies and resource accounting
> separate? It's a little strange that memcg can (1) cap max usage of
> tier X memory, and (2) provide minimum protection for tier X usage,
> but has no influence on where memory is initially allocated?
I think I understand your point, but I think the implementation would be too
messy. This is orthogonal to the proposal, but I would say it is a good topic
for LSF/MM/BPF if you want to lead the discussion.