public inbox for linux-mm@kvack.org
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Barry Song <baohua@kernel.org>, Kairui Song <ryncsn@gmail.com>
Cc: wangzhen <wangzhen5@honor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	David Hildenbrand <david@kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Lorenzo Stoakes <ljs@kernel.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
	"kasong@tencent.com" <kasong@tencent.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
Date: Wed, 8 Apr 2026 10:35:07 +0800
Message-ID: <367ea69a-c802-46d5-a2c7-259342cdc2ab@linux.alibaba.com>
In-Reply-To: <CAGsJ_4yZHjSq=d1g7dJC9szwRVLuHqLpWt0Cphi7npzrQz6p3g@mail.gmail.com>



On 4/8/26 7:00 AM, Barry Song wrote:
> On Tue, Apr 7, 2026 at 10:26 PM Kairui Song <ryncsn@gmail.com> wrote:
>>
>> On Tue, Apr 07, 2026 at 01:37:08PM +0800, wangzhen wrote:
>>> From ac731b061f152cba05b9aa351652a04f933986e0 Mon Sep 17 00:00:00 2001
>>> From: w00021541 <wangzhen5@hihonor.com>
>>> Date: Tue, 7 Apr 2026 16:17:53 +0800
>>> Subject: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0  or 201
>>>
>>> In some cases, when swappiness is set to 0 or 201, the oldest generation pages will be changed to the newest generation incorrectly.
>>>
>>> Consider the following aging scenario:
>>> MAX_NR_GENS=4, MIN_NR_GENS=2, swappiness=201, 3 anon gens, 4 file gens.
>>> 1. When swappiness = 201, should_run_aging only checks the anon type.
>>> should_run_aging returns true.
>>> 2. In inc_max_seq, if the anon or file type has MAX_NR_GENS generations, inc_min_seq moves the oldest generation pages to the second oldest to prepare for increasing max_seq.
>>> Here, the file type enters inc_min_seq.
>>> 3. In inc_min_seq, the first goto is taken, so the page migration is skipped, resulting in the inversion of cold/hot pages.
>>>
>>> In fact, when MAX_NR_GENS=4 and MIN_NR_GENS=2, the for loop after the goto is unreachable.
>>>
>>> Consider the code in inc_max_seq:
>>> if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
>>>      continue;
>>> This means that only get_nr_gens==4 can enter the inc_min_seq.
>>>
>>> Discuss the swappiness in three different scenarios:
>>> 1<=swappiness<=200:
>>> If should_run_aging returns true, both anon and file types must satisfy get_nr_gens<=3, indicating that no type satisfies get_nr_gens==MAX_NR_GENS.
>>> Therefore, both cannot enter inc_min_seq.
>>>
>>> swappiness=201:
>>> If should_run_aging returns true, the anon type must satisfy get_nr_gens<=3. Only file type can satisfy get_nr_gens==MAX_NR_GENS.
>>> After entering inc_min_seq, type && (swappiness == SWAPPINESS_ANON_ONLY) is true, so the for loop is skipped.
>>>
>>> swappiness=0:
>>> Same as swappiness=201
>>>
>>> So the two goto statements should be removed. This ensures that when swappiness=0 or 201, the oldest generation pages are correctly promoted to the second oldest generation.
>>> (When 1<=swappiness<=200, aging runs only if both anon and file types satisfy get_nr_gens<=3, which prevents the inversion of hot/cold pages).
>>>
>>> Signed-off-by: w00021541 <wangzhen5@hihonor.com>

Please use your real name to sign off.
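
For readers following the three swappiness cases in the commit message above, here is a minimal userspace sketch of the two gates it describes. It is not the kernel code: TYPE_ANON, TYPE_FILE, nr_gens(), enters_inc_min_seq() and skips_rotation() are hypothetical names, the constant values are the ones quoted in the thread, and the generation count is assumed to be max_seq - min_seq + 1.

```c
#include <stdbool.h>

/* Values as quoted in the thread; treat them as assumptions here. */
#define MAX_NR_GENS		4
#define SWAPPINESS_ANON_ONLY	201

/* Hypothetical names: the quoted diff only tests a bare `type` (0 = anon). */
enum { TYPE_ANON = 0, TYPE_FILE = 1 };

/* Assumption: the number of generations of a type is max_seq - min_seq + 1. */
static inline int nr_gens(unsigned long max_seq, unsigned long min_seq)
{
	return (int)(max_seq - min_seq + 1);
}

/* Gate quoted from inc_max_seq: only a type at MAX_NR_GENS reaches inc_min_seq. */
static inline bool enters_inc_min_seq(unsigned long max_seq, unsigned long min_seq)
{
	return nr_gens(max_seq, min_seq) == MAX_NR_GENS;
}

/*
 * The two checks the patch deletes from inc_min_seq. Returns true when the
 * folios of the oldest generation are NOT moved to the second oldest one.
 */
static inline bool skips_rotation(int type, int swappiness)
{
	/* File type, anon-only reclaim. */
	if (type && swappiness == SWAPPINESS_ANON_ONLY)
		return true;
	/* Anon type, swappiness zero (file-only reclaim). */
	if (!type && !swappiness)
		return true;
	return false;
}
```

Under these assumptions, skips_rotation(TYPE_FILE, 201) and skips_rotation(TYPE_ANON, 0) are the only two combinations that take the shortcut, which matches the swappiness=201 and swappiness=0 cases in the commit message.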

>>> ---
>>>   mm/vmscan.c | 14 +++-----------
>>>   1 file changed, 3 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 0fc9373e8251..54c835b07d3e 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -3843,7 +3843,7 @@ static void clear_mm_walk(void)
>>>                kfree(walk);
>>>   }
>>>
>>> -static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
>>> +static bool inc_min_seq(struct lruvec *lruvec, int type)
>>>   {
>>>        int zone;
>>>        int remaining = MAX_LRU_BATCH;
>>> @@ -3851,14 +3851,6 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
>>>        int hist = lru_hist_from_seq(lrugen->min_seq[type]);
>>>        int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
>>>
>>> -     /* For file type, skip the check if swappiness is anon only */
>>> -     if (type && (swappiness == SWAPPINESS_ANON_ONLY))
>>> -             goto done;
>>> -
>>> -     /* For anon type, skip the check if swappiness is zero (file only) */
>>> -     if (!type && !swappiness)
>>> -             goto done;
>>> -
>>
>> Hi, thanks for the patch.
>>
>> We have a very similar patch internally, and the result is kind of bad.
>>
>> Currently MGLRU forbids the gen distance between file and anon from
>> growing larger than 2, which means that with this patch, under great
>> pressure, you may have to keep rotating a long list of folios of the
>> opposite type in order to reclaim the other type.
>>
>> For example, suppose you have only 2 gens of file folios, swap disabled,
>> and 3 gens of anon folios. Anon folios are unevictable because
>> there is no swap, and file folios are also unevictable due to the force
>> protection of gens. If the anon folios are mostly cold (or at least a
>> portion of them are), the oldest gen of anon folios will be very long
>> (e.g. 12G, 3145728 folios).
>>
>> Now, to reclaim any file folios, you have to age first. Before this
>> patch that is usually fast. But after this, it will have to rotate
>> all 3145728 folios to the second oldest anon gen, which could take a
>> very long time.

I have the same concern. In many of our scenarios, swap is disabled 
(swappiness=0), and we only reclaim file folios. In such cases, the 
workloads really don’t care about the hot/cold status of anonymous folios.
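
To put the quoted figure in perspective, the arithmetic can be sketched as below. This is only an illustration: folios_in() and batches_needed() are hypothetical helpers, the folio size of 4096 bytes assumes order-0 folios on 4 KiB pages, and the batch size is left as a parameter because the thread only shows "remaining = MAX_LRU_BATCH" without giving its value.

```c
#include <stdint.h>

/* Number of folios of folio_size bytes that fit in `bytes`. */
static inline uint64_t folios_in(uint64_t bytes, uint64_t folio_size)
{
	return bytes / folio_size;
}

/*
 * Number of rotation passes needed if each pass moves at most `batch`
 * folios (rounded up). `batch` stands in for the per-call budget.
 */
static inline uint64_t batches_needed(uint64_t nr_folios, uint64_t batch)
{
	return (nr_folios + batch - 1) / batch;
}
```

With these assumptions, folios_in(12ULL << 30, 4096) gives the 3145728 folios mentioned above, and with an illustrative batch of 4096 folios that is 768 passes over the oldest anon generation before file reclaim can proceed.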

>> During that period any concurrent reclaimer will get rejected
>> due to force protection, resulting in very ugly long tail latency or
>> unexpected OOM.
>>
>> So I agree this is a good idea in general, I agree we should do
>> this. But better defer this until we patch up MGLRU to remove
>> the force protection first.
> 
> I suspect that once we can age file and anonymous pages
> separately, this issue will resolve itself. David already has
> some code for this [1].
> 
> Not sure when he will have time to push it upstream, but I
> may carve out some time to take care of it this month.
> 
> [1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/

Great. Sounds reasonable to me.

