From: Michal Hocko <mhocko@suse.com>
To: Daniil Tatianin <d-tatianin@yandex-team.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, yc-core@yandex-team.ru
Subject: Re: [PATCH] mm: add memory.compact_unevictable_allowed cgroup attribute
Date: Wed, 18 Mar 2026 10:20:59 +0100 [thread overview]
Message-ID: <abpue_k_9mjQAP6X@tiehlicka> (raw)
In-Reply-To: <7ca9876c-f3fa-441c-9a21-ae0ee5523318@yandex-team.ru>
On Wed 18-03-26 12:04:10, Daniil Tatianin wrote:
>
> On 3/18/26 11:25 AM, Michal Hocko wrote:
> > On Tue 17-03-26 23:17:28, Daniil Tatianin wrote:
> > > On 3/17/26 10:17 PM, Andrew Morton wrote:
> > > > On Tue, 17 Mar 2026 13:00:58 +0300 Daniil Tatianin<d-tatianin@yandex-team.ru> wrote:
> > > >
> > > > > The current global sysctl compact_unevictable_allowed is too coarse.
> > > > > In environments with mixed workloads, we may want to protect specific
> > > > > important cgroups from compaction to ensure their stability and
> > > > > responsiveness, while allowing compaction for others.
> > > > >
> > > > > This patch introduces a per-memcg compact_unevictable_allowed attribute.
> > > > > This allows granular control over whether unevictable pages in a specific
> > > > > cgroup can be compacted. The global sysctl still takes precedence if set
> > > > > to disallow compaction, but this new setting allows opting out specific
> > > > > cgroups.
> > > > >
> > > > > This also adds a new ISOLATE_UNEVICTABLE_CHECK_MEMCG flag to
> > > > > isolate_migratepages_block to preserve the old behavior for the
> > > > > ISOLATE_UNEVICTABLE flag unconditionally used by
> > > > > isolage_migratepages_range.
> > > > AI review asked questions:
> > > > https://sashiko.dev/#/patchset/20260317100058.2316997-1-d-tatianin@yandex-team.ru
> > > > Should this dynamically walk up the ancestor chain during evaluation to
> > > > ensure it returns false if any ancestor has disallowed compaction?
> > > I think ultimately it's up to cgroup maintainers whether the code should do
> > > that, but as far as I understand the whole point of cgroups is that a child
> > > can override the settings of its parent. Moreover, this property doesn't
> > > have CFTYPE_NS_DELEGATABLE set, so a child cgroup cannot just toggle it at
> > > will.
> > In general any attributes should have proper hieararchical semantic. I
> > am not sure what that should be in this case. What is a desire in a
> > child cgroup can become fragmentation pressure to others.
> >
> > I think it would be really important to explain more thoroughly about
> > those usecases of mixed workloads.
> I think there are many examples of a system where one process is more
> important than
> others. For example, any sort of healthcheck or even the ssh daemon: these
> may become
> unresponsive during heavy compaction due to thousands of TLB invalidate IPIs
> or page faulting
> on pages that are being compacted. Another example is a VM that is
> responsible for routing
> traffic of all other VMs or even the entire cluster, you really want to
> prioritize its responsiveness, while
> still allowing compaction of memory for the rest of the system, for less
> important VMs or services etc.
Shouldn't those use mlock?
> > Is the memcg even a suitable level of
> > abstraction for this tunable?
>
> In my opinion it is, since it is relatively common to put all related tasks
> into one cgroup with preset memory limits etc.
>
> > Doesn't this belong to tasks if anything?
>
> I think it would be very difficult to implement as a per-task attribute
> properly since compaction works at the folio
> level. While folios have a pointer to the memcg that owns them, they may be
> mapped by multiple process in case
> of shared memory. We would have to find all the address spaces mapping this
> folio, and then check the property on
> every one of them, which may be set to different values. This may be
> problematic performance-wise to do for
> every physical page, and it also introduces unclear semantics if different
> address spaces mapping the same page
> have different opinions.
Yes, it would need to be something like an implicit mlock. I haven't
really indicated that would be a _simpler_ solution. But as this has
obvious userspace API implications the much more important question is
what is a futureproof solution. Also we need to get an answer whether
this is really needed or too niche to cast an interface maintained for
ever for.
--
Michal Hocko
SUSE Labs
next prev parent reply other threads:[~2026-03-18 9:21 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-17 10:00 [PATCH] mm: add memory.compact_unevictable_allowed cgroup attribute Daniil Tatianin
2026-03-17 19:17 ` Andrew Morton
2026-03-17 20:17 ` Daniil Tatianin
2026-03-18 8:25 ` Michal Hocko
2026-03-18 9:09 ` Daniil Tatianin
[not found] ` <7ca9876c-f3fa-441c-9a21-ae0ee5523318@yandex-team.ru>
2026-03-18 9:20 ` Michal Hocko [this message]
2026-03-18 9:25 ` Daniil Tatianin
2026-03-18 10:01 ` Michal Hocko
2026-03-18 10:08 ` Daniil Tatianin
2026-03-18 11:47 ` Michal Hocko
2026-03-18 14:03 ` Daniil Tatianin
2026-03-18 19:55 ` Shakeel Butt
2026-03-19 8:35 ` Michal Hocko
2026-03-19 8:24 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=abpue_k_9mjQAP6X@tiehlicka \
--to=mhocko@suse.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=cgroups@vger.kernel.org \
--cc=d-tatianin@yandex-team.ru \
--cc=david@kernel.org \
--cc=hannes@cmpxchg.org \
--cc=jackmanb@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=rppt@kernel.org \
--cc=shakeel.butt@linux.dev \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
--cc=weixugc@google.com \
--cc=yc-core@yandex-team.ru \
--cc=yuanchu@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.