Date: Wed, 18 Mar 2026 10:20:59 +0100
From: Michal Hocko
To: Daniil Tatianin
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Shakeel Butt,
 Muchun Song, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett",
 Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Axel Rasmussen,
 Yuanchu Xie, Wei Xu, Brendan Jackman, Zi Yan, cgroups@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, yc-core@yandex-team.ru
Subject: Re: [PATCH] mm: add memory.compact_unevictable_allowed cgroup attribute
References: <20260317100058.2316997-1-d-tatianin@yandex-team.ru>
 <20260317121736.f73a828de2a989d1a07efea1@linux-foundation.org>
 <3db237d0-1ee8-44b7-a356-f3015173f7c2@yandex-team.ru>
 <7ca9876c-f3fa-441c-9a21-ae0ee5523318@yandex-team.ru>
In-Reply-To: <7ca9876c-f3fa-441c-9a21-ae0ee5523318@yandex-team.ru>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 18-03-26 12:04:10, Daniil Tatianin wrote:
>
> On 3/18/26 11:25 AM, Michal Hocko wrote:
> > On Tue 17-03-26 23:17:28, Daniil Tatianin wrote:
> > > On 3/17/26 10:17 PM, Andrew Morton wrote:
> > > > On Tue, 17 Mar 2026 13:00:58 +0300 Daniil Tatianin wrote:
> > > >
> > > > > The current global sysctl compact_unevictable_allowed is too coarse.
> > > > > In environments with mixed workloads, we may want to protect specific
> > > > > important cgroups from compaction to ensure their stability and
> > > > > responsiveness, while allowing compaction for others.
> > > > >
> > > > > This patch introduces a per-memcg compact_unevictable_allowed attribute.
> > > > > This allows granular control over whether unevictable pages in a specific
> > > > > cgroup can be compacted. The global sysctl still takes precedence if set
> > > > > to disallow compaction, but this new setting allows opting out specific
> > > > > cgroups.
> > > > >
> > > > > This also adds a new ISOLATE_UNEVICTABLE_CHECK_MEMCG flag to
> > > > > isolate_migratepages_block to preserve the old behavior for the
> > > > > ISOLATE_UNEVICTABLE flag unconditionally used by
> > > > > isolate_migratepages_range.
> > > > AI review asked questions:
> > > > https://sashiko.dev/#/patchset/20260317100058.2316997-1-d-tatianin@yandex-team.ru
> > > > Should this dynamically walk up the ancestor chain during evaluation to
> > > > ensure it returns false if any ancestor has disallowed compaction?
> > > I think ultimately it's up to cgroup maintainers whether the code should do
> > > that, but as far as I understand the whole point of cgroups is that a child
> > > can override the settings of its parent. Moreover, this property doesn't
> > > have CFTYPE_NS_DELEGATABLE set, so a child cgroup cannot just toggle it at
> > > will.
> > In general any attribute should have proper hierarchical semantics. I
> > am not sure what that should be in this case. What is a desire in a
> > child cgroup can become fragmentation pressure to others.
> >
> > I think it would be really important to explain those use cases of
> > mixed workloads more thoroughly.
> I think there are many examples of a system where one process is more
> important than others.
> For example, any sort of healthcheck or even the ssh daemon: these may
> become unresponsive during heavy compaction due to thousands of TLB
> invalidate IPIs or page faulting on pages that are being compacted.
> Another example is a VM that is responsible for routing traffic of all
> other VMs or even the entire cluster, you really want to prioritize its
> responsiveness, while still allowing compaction of memory for the rest
> of the system, for less important VMs or services etc.

Shouldn't those use mlock?

> > Is the memcg even a suitable level of
> > abstraction for this tunable?
>
> In my opinion it is, since it is relatively common to put all related tasks
> into one cgroup with preset memory limits etc.
>
> > Doesn't this belong to tasks if anything?
>
> I think it would be very difficult to implement as a per-task attribute
> properly since compaction works at the folio level. While folios have a
> pointer to the memcg that owns them, they may be mapped by multiple
> processes in case of shared memory. We would have to find all the address
> spaces mapping this folio, and then check the property on every one of
> them, which may be set to different values. This may be problematic
> performance-wise to do for every physical page, and it also introduces
> unclear semantics if different address spaces mapping the same page have
> different opinions.

Yes, it would need to be something like an implicit mlock. I haven't
claimed that would be a _simpler_ solution. But since this has obvious
userspace API implications, the much more important question is what a
future-proof solution looks like. We also need an answer on whether this
is really needed, or whether it is too niche to justify an interface
that has to be maintained forever.
-- 
Michal Hocko
SUSE Labs
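
For readers following the hierarchy question quoted above (whether
evaluation should walk up the ancestor chain and return false if any
ancestor has disallowed compaction), here is a minimal userspace sketch
in C. It is not code from the patch: struct toy_memcg, sysctl_allowed and
toy_effective_allowed() are hypothetical names invented for illustration,
and the "any ancestor can veto" rule is only one of the semantics under
discussion in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; NOT the kernel's struct mem_cgroup. */
struct toy_memcg {
	const struct toy_memcg *parent;   /* NULL for the root cgroup */
	bool compact_unevictable_allowed; /* per-cgroup setting */
};

/*
 * Hypothetical stand-in for the global sysctl. When false, compaction of
 * unevictable pages is disallowed everywhere, regardless of cgroup settings.
 */
static bool sysctl_allowed = true;

/*
 * Effective value under "any ancestor can veto" semantics: allowed only if
 * the global knob, the cgroup itself and every ancestor all allow it.
 */
static bool toy_effective_allowed(const struct toy_memcg *cg)
{
	if (!sysctl_allowed)
		return false;

	for (; cg; cg = cg->parent) {
		if (!cg->compact_unevictable_allowed)
			return false;
	}
	return true;
}
```

With a root that allows compaction, a parent that disallows it and a child
that allows it, this model reports false for the child. A "child overrides
parent" model would instead look only at the child's own flag and report
true, which is the difference the thread is debating.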