Date: Tue, 24 Feb 2026 10:58:37 -0500
From: Johannes Weiner
To: Kairui Song
Cc: Kairui Song via B4 Relay, linux-mm@kvack.org, Andrew Morton,
 David Hildenbrand, Lorenzo Stoakes, Zi Yan, Baolin Wang, Barry Song,
 Hugh Dickins, Chris Li, Kemeng Shi, Nhat Pham, Baoquan He, Yosry Ahmed,
 Youngjun Park, Chengming Zhou, Roman Gushchin, Shakeel Butt,
 Muchun Song, Qi Zheng, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org
Subject: Re: [PATCH RFC 08/15] mm, swap: store and check memcg info in the swap table
References: <20260220-swap-table-p4-v1-0-104795d19815@tencent.com>
 <20260220-swap-table-p4-v1-8-104795d19815@tencent.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 24, 2026 at 04:34:00PM +0800, Kairui Song wrote:
> On Tue, Feb 24, 2026 at 12:46 AM Johannes Weiner wrote:
> >
> > On Fri, Feb 20, 2026 at 07:42:09AM +0800, Kairui Song via B4 Relay wrote:
> > > From: Kairui Song
> > >
> > > To prepare for merging the swap_cgroup_ctrl into the swap table, store
> > > the memcg info in the swap table on swapout.
> > >
> > > This is done by using the existing shadow format.
> > >
> > > Note this also changes the refault counting at the nearest online memcg
> > > level:
> > >
> > > Unlike file folios, anon folios are mostly exclusive to one mem cgroup,
> > > and each cgroup is likely to have different characteristics.
> >
> > This is not correct.
> >
> > As much as I like the idea of storing the swap_cgroup association
> > inside the shadow entry, the refault evaluation needs to happen at the
> > level that drove eviction.
> >
> > Consider a workload that is split into cgroups purely for accounting,
> > not for setting different limits:
> >
> >   workload (limit domain)
> >   `- component A
> >   `- component B
> >
> > This means the two components must compete freely, and it must behave
> > as if there is only one LRU. When pages get reclaimed in a round-robin
> > fashion, both A and B get aged at the same pace. Likewise, when pages
> > in A refault, they must challenge the *combined* workingset of both A
> > and B, not just the local pages.
> >
> > Otherwise, you risk retaining stale workingset in one subgroup while
> > the other one is thrashing. This breaks userspace expectations.
> >
>
> Hi Johannes, thanks for pointing this out.
>
> I'm just not sure how much of a real problem this is. The refault
> challenge change was made in commit b910718a948a which was before anon
> shadow was introduced. And shadows could get reclaimed, especially
> when under pressure (and we could be doing that again by reclaiming
> full_clusters with swap tables). And MGLRU simply ignores the
> target_memcg here yet it performs surprisingly well with multiple
> memcg setups. And I did find a comment in workingset.c saying the
> kernel used to activate all pages, which is also fine. And that commit
> also mentioned the active list shrinking, but anon active list gets
> shrinked just fine without refault feedback in shrink_lruvec under
> can_age_anon_pages.

*if inactive anon is empty, as part of the second chance logic

Please try to understand *why* this code is the way it is before
throwing it all out. It was driven by real production problems. The
fact that some workloads don't care is not proof that many others
won't get hurt if you break this.

Anon refault detection was added for that reason: Once you have swap,
you facilitate anon workingsets that exceed memory capacity. At that
point, cache replacement strategies apply. Scan resistance matters.

With fast modern compression and flash swap, the anon set alone can be
larger than memory capacity. Everything that
6a3ed2123a78de22a9e2b2855068a8d89f8e14f4 says about file cache starts
applying to anonymous pages: you don't want to throw out the hot anon
workingset just because somebody is doing a one-off burst scan through
a larger set of cold, swapped-out pages.

Like I said in the LSFMM thread, there is no difference between anon
and file. Historically, there didn't use to be one. The LRU lists were
split mechanically because noswap systems became common (lots of RAM +
rotational drives = sad swap) and there was no point in scanning/aging
anonymous memory if there is no swap space. But no reasonable argument
has been put forth for why anon should be aged completely differently
from file when you DO have swap.

There is more explanation of the *why* behind the cgroup behavior in
the cover letter portion of 53138cea7f398d2cdd0fa22adeec7e16093e1ebd.