Message-ID: <367ea69a-c802-46d5-a2c7-259342cdc2ab@linux.alibaba.com>
Date: Wed, 8 Apr 2026 10:35:07 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
To: Barry Song, Kairui Song
Cc: wangzhen, Andrew Morton, Johannes Weiner, David Hildenbrand, Michal Hocko, Qi Zheng, Shakeel Butt, Lorenzo Stoakes, Axel Rasmussen, Yuanchu Xie, Wei Xu, "kasong@tencent.com", "linux-mm@kvack.org", "linux-kernel@vger.kernel.org"
References: <7829b070df1b405dbc97dd6a028d8c8a@honor.com> <4451bdc432864aebb54f401eee51ea53@honor.com>
From: Baolin Wang

On 4/8/26 7:00 AM, Barry Song wrote:
> On Tue, Apr 7, 2026 at 10:26 PM Kairui Song wrote:
>>
>> On Tue, Apr 07, 2026 at 01:37:08PM +0800, wangzhen wrote:
>>> From ac731b061f152cba05b9aa351652a04f933986e0 Mon Sep 17 00:00:00 2001
>>> From: w00021541
>>> Date: Tue, 7 Apr 2026 16:17:53 +0800
>>> Subject: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
>>>
>>> In some cases, when swappiness is set to 0 or 201, the oldest generation pages will be changed to the newest generation incorrectly.
>>>
>>> Consider the following aging scenario:
>>> MAX_NR_GENS=4, MIN_NR_GENS=2, swappiness=201, 3 anon gens, 4 file gens.
>>> 1. When swappiness = 201, should_run_aging will only check the anon type.
>>>    should_run_aging returns true.
>>> 2. In inc_max_seq, if the anon or file type has MAX_NR_GENS, inc_min_seq will move the oldest generation pages to the second oldest to prepare for increasing max_seq.
>>>    Here, the file type will enter inc_min_seq.
>>> 3. In inc_min_seq, the first goto is taken, so the page migration is skipped, resulting in the inversion of cold/hot pages.
>>>
>>> In fact, when MAX_NR_GENS=4 and MIN_NR_GENS=2, the for loop after the goto is unreachable.
>>>
>>> Consider the code in inc_max_seq:
>>>	if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
>>>		continue;
>>> This means that only a type with get_nr_gens==4 can enter inc_min_seq.
>>>
>>> Discuss the swappiness in three different scenarios:
>>> 1<=swappiness<=200:
>>> If should_run_aging returns true, both anon and file types must satisfy get_nr_gens<=3, indicating that no type satisfies get_nr_gens==MAX_NR_GENS.
>>> Therefore, neither can enter inc_min_seq.
>>>
>>> swappiness=201:
>>> If should_run_aging returns true, the anon type must satisfy get_nr_gens<=3. Only the file type can satisfy get_nr_gens==MAX_NR_GENS.
>>> After entering inc_min_seq, type && (swappiness == SWAPPINESS_ANON_ONLY) is true, so the for loop will be skipped.
>>>
>>> swappiness=0:
>>> Same as swappiness=201, with the anon and file types swapped.
>>>
>>> So the two goto statements should be removed. This ensures that when swappiness=0 or 201, the oldest generation pages are correctly promoted to the second oldest generation.
>>> (When 1<=swappiness<=200, aging only happens if both anon and file types have get_nr_gens<=3, which prevents the inversion of hot/cold pages.)
>>>
>>> Signed-off-by: w00021541

Please use your real name to sign off.

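To make the case analysis in the commit message easier to follow, here is a toy userspace model of it. This is only a sketch, not the kernel code: the toy_* functions, type_is_considered() and the TYPE_* enum are hypothetical names, and toy_should_run_aging() assumes the rule stated above (aging runs only when every type that the swappiness value considers has at most MIN_NR_GENS + 1 generations). The constants use the values quoted in the patch description.

```c
#include <stdbool.h>

/* Values as quoted in the commit message. */
#define MAX_NR_GENS 4
#define MIN_NR_GENS 2
#define SWAPPINESS_ANON_ONLY 201

/* Hypothetical names; the patch itself uses 0 for anon and 1 for file. */
enum { TYPE_ANON = 0, TYPE_FILE = 1, NR_TYPES = 2 };

/* Does this swappiness value make reclaim look at the given type at all? */
static bool type_is_considered(int type, int swappiness)
{
	if (swappiness == SWAPPINESS_ANON_ONLY)
		return type == TYPE_ANON;
	if (swappiness == 0)
		return type == TYPE_FILE;
	return true;
}

/*
 * Toy stand-in for should_run_aging(), ASSUMING the rule from the commit
 * message: every considered type must have at most MIN_NR_GENS + 1 gens.
 */
static bool toy_should_run_aging(const int nr_gens[NR_TYPES], int swappiness)
{
	for (int type = 0; type < NR_TYPES; type++) {
		if (type_is_considered(type, swappiness) &&
		    nr_gens[type] > MIN_NR_GENS + 1)
			return false;
	}
	return true;
}

/* Mirrors the two early "goto done" checks that the patch removes. */
static bool toy_skips_migration(int type, int swappiness)
{
	return !type_is_considered(type, swappiness);
}

/*
 * True when aging proceeds while some type sits at MAX_NR_GENS and its
 * oldest generation is not moved to the second oldest one first.
 */
static bool toy_inversion_possible(const int nr_gens[NR_TYPES], int swappiness)
{
	if (!toy_should_run_aging(nr_gens, swappiness))
		return false;

	for (int type = 0; type < NR_TYPES; type++) {
		if (nr_gens[type] != MAX_NR_GENS)
			continue;	/* same filter as quoted from inc_max_seq */
		if (toy_skips_migration(type, swappiness))
			return true;
	}
	return false;
}
```

With 3 anon gens and 4 file gens, toy_inversion_possible() is true for swappiness=201 and false for swappiness=100, which matches the three cases listed in the commit message.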
>>> ---
>>>  mm/vmscan.c | 14 +++-----------
>>>  1 file changed, 3 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 0fc9373e8251..54c835b07d3e 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -3843,7 +3843,7 @@ static void clear_mm_walk(void)
>>>  	kfree(walk);
>>>  }
>>>
>>> -static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
>>> +static bool inc_min_seq(struct lruvec *lruvec, int type)
>>>  {
>>>  	int zone;
>>>  	int remaining = MAX_LRU_BATCH;
>>> @@ -3851,14 +3851,6 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
>>>  	int hist = lru_hist_from_seq(lrugen->min_seq[type]);
>>>  	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
>>>
>>> -	/* For file type, skip the check if swappiness is anon only */
>>> -	if (type && (swappiness == SWAPPINESS_ANON_ONLY))
>>> -		goto done;
>>> -
>>> -	/* For anon type, skip the check if swappiness is zero (file only) */
>>> -	if (!type && !swappiness)
>>> -		goto done;
>>> -
>>
>> Hi, thanks for the patch.
>>
>> We have a very similar patch internally, and the result is kind of bad.
>>
>> Currently MGLRU forbids the gen distance between file and anon from
>> growing larger than 2, which means that with this patch, under great
>> pressure, you may have to keep rotating a long list of folios of the
>> opposite type in order to reclaim the other type.
>>
>> For example, say you have only 2 gens of file folios, swap disabled,
>> and 3 gens of anon folios. Anon folios are unevictable because
>> there is no swap. And file is also unevictable due to the force
>> protection of gens. Consider that anon folios are mostly cold (at least
>> a portion of them are), so the oldest gen of anon folios will be very
>> long (e.g. 12G, 3145728 folios).
>>
>> Now, to reclaim any file folios, you have to age first. Before this
>> patch that is usually fast.
>> But after this, it will have to rotate
>> all 3145728 folios to the second oldest anon gen, which could take a
>> very long time.

I have the same concern. In many of our scenarios, swap is disabled (swappiness=0), and we only reclaim file folios. In such cases, the workloads really don’t care about the hot/cold status of anonymous folios.

>> During that period any concurrent reclaimer will get rejected
>> due to force protection, resulting in very ugly long-tail latency or
>> unexpected OOM.
>>
>> So I agree this is a good idea in general, and I agree we should do
>> this. But better to defer it until we patch up MGLRU to remove
>> the force protection first.
>
> I suspect that once we can age file and anonymous pages
> separately, this issue will resolve itself. David already has
> some code for this [1].
>
> Not sure when he will have time to push it upstream, but I
> may carve out some time to take care of it this month.
>
> [1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/

Great. Sounds reasonable to me.
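
For reference, the 12G / 3145728 folios figure in Kairui's example can be checked with a few lines of C. This is a rough sketch under stated assumptions: 4 KiB base pages, order-0 folios only, and a per-call batch of 4096 folios, which I am assuming for MAX_LRU_BATCH (the real value depends on the kernel configuration). The TOY_* macros and both helper functions are made-up names for illustration.

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages and order-0 (single page) folios. */
#define TOY_PAGE_SIZE 4096ULL
/* Assumption: illustrative batch size, standing in for MAX_LRU_BATCH. */
#define TOY_LRU_BATCH 4096ULL

/* Number of order-0 folios needed to cover the given number of bytes. */
static uint64_t toy_folios_for_bytes(uint64_t bytes)
{
	return bytes / TOY_PAGE_SIZE;
}

/* Number of batches needed to rotate that many folios, rounding up. */
static uint64_t toy_batches_needed(uint64_t folios)
{
	return (folios + TOY_LRU_BATCH - 1) / TOY_LRU_BATCH;
}
```

Under these assumptions, toy_folios_for_bytes(12ULL << 30) gives 3145728, the number quoted above, and rotating them takes 768 batches of 4096 folios each.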