public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Daniil Tatianin <d-tatianin@yandex-team.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, yc-core@yandex-team.ru
Subject: Re: [PATCH] mm: add memory.compact_unevictable_allowed cgroup attribute
Date: Wed, 18 Mar 2026 10:20:59 +0100	[thread overview]
Message-ID: <abpue_k_9mjQAP6X@tiehlicka> (raw)
In-Reply-To: <7ca9876c-f3fa-441c-9a21-ae0ee5523318@yandex-team.ru>

On Wed 18-03-26 12:04:10, Daniil Tatianin wrote:
> 
> On 3/18/26 11:25 AM, Michal Hocko wrote:
> > On Tue 17-03-26 23:17:28, Daniil Tatianin wrote:
> > > On 3/17/26 10:17 PM, Andrew Morton wrote:
> > > > On Tue, 17 Mar 2026 13:00:58 +0300 Daniil Tatianin<d-tatianin@yandex-team.ru> wrote:
> > > > 
> > > > > The current global sysctl compact_unevictable_allowed is too coarse.
> > > > > In environments with mixed workloads, we may want to protect specific
> > > > > important cgroups from compaction to ensure their stability and
> > > > > responsiveness, while allowing compaction for others.
> > > > > 
> > > > > This patch introduces a per-memcg compact_unevictable_allowed attribute.
> > > > > This allows granular control over whether unevictable pages in a specific
> > > > > cgroup can be compacted. The global sysctl still takes precedence if set
> > > > > to disallow compaction, but this new setting allows opting out specific
> > > > > cgroups.
> > > > > 
> > > > > This also adds a new ISOLATE_UNEVICTABLE_CHECK_MEMCG flag to
> > > > > isolate_migratepages_block to preserve the old behavior for the
> > > > > ISOLATE_UNEVICTABLE flag unconditionally used by
> > > > > isolage_migratepages_range.
> > > > AI review asked questions:
> > > > 	https://sashiko.dev/#/patchset/20260317100058.2316997-1-d-tatianin@yandex-team.ru
> > > > Should this dynamically walk up the ancestor chain during evaluation to
> > > > ensure it returns false if any ancestor has disallowed compaction?
> > > I think ultimately it's up to cgroup maintainers whether the code should do
> > > that, but as far as I understand the whole point of cgroups is that a child
> > > can override the settings of its parent. Moreover, this property doesn't
> > > have CFTYPE_NS_DELEGATABLE set, so a child cgroup cannot just toggle it at
> > > will.
> > In general any attributes should have proper hieararchical semantic. I
> > am not sure what that should be in this case. What is a desire in a
> > child cgroup can become fragmentation pressure to others.
> > 
> > I think it would be really important to explain more thoroughly about
> > those usecases of mixed workloads.
> I think there are many examples of a system where one process is more
> important than
> others. For example, any sort of healthcheck or even the ssh daemon: these
> may become
> unresponsive during heavy compaction due to thousands of TLB invalidate IPIs
> or page faulting
> on pages that are being compacted. Another example is a VM that is
> responsible for routing
> traffic of all other VMs or even the entire cluster, you really want to
> prioritize its responsiveness, while
> still allowing compaction of memory for the rest of the system, for less
> important VMs or services etc.

Shouldn't those use mlock?

> > Is the memcg even a suitable level of
> > abstraction for this tunable?
> 
> In my opinion it is, since it is relatively common to put all related tasks
> into one cgroup with preset memory limits etc.
> 
> > Doesn't this belong to tasks if anything?
> 
> I think it would be very difficult to implement as a per-task attribute
> properly since compaction works at the folio
> level. While folios have a pointer to the memcg that owns them, they may be
> mapped by multiple process in case
> of shared memory. We would have to find all the address spaces mapping this
> folio, and then check the property on
> every one of them, which may be set to different values. This may be
> problematic performance-wise to do for
> every physical page, and it also introduces unclear semantics if different
> address spaces mapping the same page
> have different opinions.

Yes, it would need to be something like an implicit mlock. I haven't
really indicated that would be a _simpler_ solution. But as this has
obvious userspace API implications the much more important question is
what is a futureproof solution. Also we need to get an answer whether
this is really needed or too niche to cast an interface maintained for
ever for.
-- 
Michal Hocko
SUSE Labs

  parent reply	other threads:[~2026-03-18  9:21 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-17 10:00 [PATCH] mm: add memory.compact_unevictable_allowed cgroup attribute Daniil Tatianin
2026-03-17 19:17 ` Andrew Morton
2026-03-17 20:17   ` Daniil Tatianin
2026-03-18  8:25     ` Michal Hocko
2026-03-18  9:09       ` Daniil Tatianin
     [not found]       ` <7ca9876c-f3fa-441c-9a21-ae0ee5523318@yandex-team.ru>
2026-03-18  9:20         ` Michal Hocko [this message]
2026-03-18  9:25           ` Daniil Tatianin
2026-03-18 10:01             ` Michal Hocko
2026-03-18 10:08               ` Daniil Tatianin
2026-03-18 11:47                 ` Michal Hocko
2026-03-18 14:03                   ` Daniil Tatianin
2026-03-18 19:55                     ` Shakeel Butt
2026-03-19  8:35                       ` Michal Hocko
2026-03-19  8:24                     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=abpue_k_9mjQAP6X@tiehlicka \
    --to=mhocko@suse.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=cgroups@vger.kernel.org \
    --cc=d-tatianin@yandex-team.ru \
    --cc=david@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=rppt@kernel.org \
    --cc=shakeel.butt@linux.dev \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=weixugc@google.com \
    --cc=yc-core@yandex-team.ru \
    --cc=yuanchu@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox