From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Barry Song <baohua@kernel.org>, Kairui Song <ryncsn@gmail.com>
Cc: wangzhen <wangzhen5@honor.com>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
David Hildenbrand <david@kernel.org>,
Michal Hocko <mhocko@kernel.org>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
Lorenzo Stoakes <ljs@kernel.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
"kasong@tencent.com" <kasong@tencent.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
Date: Wed, 8 Apr 2026 10:35:07 +0800
Message-ID: <367ea69a-c802-46d5-a2c7-259342cdc2ab@linux.alibaba.com>
In-Reply-To: <CAGsJ_4yZHjSq=d1g7dJC9szwRVLuHqLpWt0Cphi7npzrQz6p3g@mail.gmail.com>
On 4/8/26 7:00 AM, Barry Song wrote:
> On Tue, Apr 7, 2026 at 10:26 PM Kairui Song <ryncsn@gmail.com> wrote:
>>
>> On Tue, Apr 07, 2026 at 01:37:08PM +0800, wangzhen wrote:
>>> From ac731b061f152cba05b9aa351652a04f933986e0 Mon Sep 17 00:00:00 2001
>>> From: w00021541 <wangzhen5@hihonor.com>
>>> Date: Tue, 7 Apr 2026 16:17:53 +0800
>>> Subject: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
>>>
>>> In some cases, when swappiness is set to 0 or 201, pages in the oldest generation are incorrectly moved to the newest generation.
>>>
>>> Consider the following aging scenario:
>>> MAX_NR_GENS=4, MIN_NR_GENS=2, swappiness=201, 3 anon gens, 4 file gens.
>>> 1. When swappiness = 201, should_run_aging only checks the anon type,
>>> and it returns true.
>>> 2. In inc_max_seq, for each type (anon or file) that has MAX_NR_GENS generations, inc_min_seq moves the pages of the oldest generation to the second oldest one to prepare for incrementing max_seq.
>>> Here, the file type enters inc_min_seq.
>>> 3. In inc_min_seq, the first goto is taken, so the page migration is skipped, resulting in a hot/cold inversion.
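The inversion in step 3 can be reproduced with a toy model. This is a minimal sketch under stated assumptions, not kernel code: a single type is modeled with per-slot folio counts instead of per-zone folio lists, a sequence number is assumed to map to a slot by seq % MAX_NR_GENS (as lru_gen_from_seq does), and every model_* name is hypothetical. Retiring min_seq without moving the oldest slot's folios leaves them in the slot that max_seq + 1 reuses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one MGLRU type (anon or file):
 * folio counts per generation slot instead of per-zone folio lists. */
#define MODEL_MAX_NR_GENS 4UL

struct model_lru {
	unsigned long min_seq;                /* oldest generation */
	unsigned long max_seq;                /* newest generation */
	unsigned long nr[MODEL_MAX_NR_GENS];  /* folios per ring slot */
};

/* Assumed mapping from a sequence number to a ring slot. */
static unsigned long model_gen(unsigned long seq)
{
	return seq % MODEL_MAX_NR_GENS;
}

/* Sketch of inc_min_seq: optionally move the oldest slot's folios into
 * the next slot, then retire the oldest sequence number either way
 * (the early "goto done" corresponds to move_folios == false). */
static void model_inc_min_seq(struct model_lru *lru, bool move_folios)
{
	unsigned long old_gen = model_gen(lru->min_seq);
	unsigned long new_gen = model_gen(lru->min_seq + 1);

	if (move_folios) {
		lru->nr[new_gen] += lru->nr[old_gen];
		lru->nr[old_gen] = 0;
	}
	lru->min_seq++;
}

/* Sketch of inc_max_seq for a single type: make room first if the type
 * already holds MAX_NR_GENS generations, then open a new generation. */
static void model_inc_max_seq(struct model_lru *lru, bool move_folios)
{
	if (lru->max_seq - lru->min_seq + 1 == MODEL_MAX_NR_GENS)
		model_inc_min_seq(lru, move_folios);
	lru->max_seq++;
}

/* Number of folios that currently sit in the newest generation's slot. */
static unsigned long model_nr_newest(const struct model_lru *lru)
{
	return lru->nr[model_gen(lru->max_seq)];
}
```

For example, with min_seq = 0, max_seq = 3, 100 cold folios in slot 0 and 5 hot folios in slot 3, model_inc_max_seq(&lru, false) leaves the 100 cold folios in slot 0, which is now the newest generation (max_seq = 4). With move_folios set to true they land in slot 1, the new oldest generation, and the newest slot starts empty.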
>>>
>>> In fact, when MAX_NR_GENS=4 and MIN_NR_GENS=2, the for loop after the goto is unreachable.
>>>
>>> Consider the code in inc_max_seq:
>>> if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
>>> 	continue;
>>> This means that only a type with get_nr_gens==4 (MAX_NR_GENS) can enter inc_min_seq.
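A hypothetical sketch of the gating check quoted above, assuming get_nr_gens returns max_seq - min_seq[type] + 1 for a type; the model_* helpers are illustrative names, not kernel functions. With MIN_NR_GENS = 2 and MAX_NR_GENS = 4, a type holds 2, 3 or 4 generations, and only the last value passes the check.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_NR_GENS 4UL

/* Assumed equivalent of get_nr_gens: generations currently held by a type. */
static unsigned long model_get_nr_gens(unsigned long max_seq, unsigned long min_seq)
{
	return max_seq - min_seq + 1;
}

/* The check from inc_max_seq: a type reaches inc_min_seq only when it
 * already holds MAX_NR_GENS generations; otherwise the loop continues. */
static bool model_reaches_inc_min_seq(unsigned long max_seq, unsigned long min_seq)
{
	return model_get_nr_gens(max_seq, min_seq) == MODEL_MAX_NR_GENS;
}
```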
>>>
>>> Consider the three swappiness ranges separately:
>>> 1<=swappiness<=200:
>>> If should_run_aging returns true, both anon and file types must satisfy get_nr_gens<=3, so no type satisfies get_nr_gens==MAX_NR_GENS.
>>> Therefore, neither can enter inc_min_seq.
>>>
>>> swappiness=201:
>>> If should_run_aging returns true, the anon type must satisfy get_nr_gens<=3, so only the file type can satisfy get_nr_gens==MAX_NR_GENS.
>>> After the file type enters inc_min_seq, type && (swappiness == SWAPPINESS_ANON_ONLY) is true, so the for loop is skipped.
>>>
>>> swappiness=0:
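>>> Same as swappiness=201, with anon and file swapped: only the anon type can reach inc_min_seq, where !type && !swappiness is true, so the for loop is skipped.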
>>>
>>> So the two goto statements should be removed. This ensures that when swappiness=0 or 201, the pages of the oldest generation are correctly moved to the second oldest generation.
>>> (When 1<=swappiness<=200, aging only runs when both anon and file types have get_nr_gens<=3, which prevents the hot/cold inversion.)
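The case analysis above can be condensed into a small predicate. This sketch encodes the patch's claims rather than the real should_run_aging logic: it assumes aging only runs when every evictable type has fewer than MAX_NR_GENS generations, treats file as non-evictable at swappiness 201 and anon as non-evictable at swappiness 0 (matching the two goto conditions in the diff), and uses hypothetical model_* names throughout.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_NR_GENS          4
#define MODEL_SWAPPINESS_ANON_ONLY 201

enum { MODEL_ANON = 0, MODEL_FILE = 1 };

/* Evictability as implied by the two goto conditions in inc_min_seq. */
static bool model_evictable(int type, int swappiness)
{
	if (type == MODEL_FILE)
		return swappiness != MODEL_SWAPPINESS_ANON_ONLY;
	return swappiness != 0;
}

/* Assumption taken from the patch text, not from should_run_aging itself:
 * aging runs only if every evictable type has fewer than MAX_NR_GENS gens. */
static bool model_aging_possible(const int nr_gens[2], int swappiness)
{
	for (int type = MODEL_ANON; type <= MODEL_FILE; type++)
		if (model_evictable(type, swappiness) &&
		    nr_gens[type] >= MODEL_MAX_NR_GENS)
			return false;
	return true;
}

/* True if, under the pre-patch code, some type reaches inc_min_seq and
 * then takes an early "goto done", so its oldest folios are not moved. */
static bool model_old_code_skips_move(const int nr_gens[2], int swappiness)
{
	if (!model_aging_possible(nr_gens, swappiness))
		return false;
	for (int type = MODEL_ANON; type <= MODEL_FILE; type++)
		if (nr_gens[type] == MODEL_MAX_NR_GENS &&
		    !model_evictable(type, swappiness))
			return true;
	return false;
}
```

With 3 anon gens and 4 file gens (the scenario at the top of the patch), the predicate is true for swappiness 201 and false for swappiness 100, where the file type's 4 gens block aging under this model.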
>>>
>>> Signed-off-by: w00021541 <wangzhen5@hihonor.com>
Please use your real name to sign off.
>>> ---
>>> mm/vmscan.c | 14 +++-----------
>>> 1 file changed, 3 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 0fc9373e8251..54c835b07d3e 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -3843,7 +3843,7 @@ static void clear_mm_walk(void)
>>> kfree(walk);
>>> }
>>>
>>> -static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
>>> +static bool inc_min_seq(struct lruvec *lruvec, int type)
>>> {
>>> int zone;
>>> int remaining = MAX_LRU_BATCH;
>>> @@ -3851,14 +3851,6 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
>>> int hist = lru_hist_from_seq(lrugen->min_seq[type]);
>>> int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
>>>
>>> - /* For file type, skip the check if swappiness is anon only */
>>> - if (type && (swappiness == SWAPPINESS_ANON_ONLY))
>>> - goto done;
>>> -
>>> - /* For anon type, skip the check if swappiness is zero (file only) */
>>> - if (!type && !swappiness)
>>> - goto done;
>>> -
>>
>> Hi, thanks for the patch.
>>
>> We have a very similar patch internally, and the result is kind of bad.
>>
>> Currently MGLRU forbids the gen distance between file and anon from
>> growing larger than 2, which means that with this patch, under heavy
>> pressure, you may have to keep rotating a long list of folios of one
>> type just to reclaim the other type.
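A short derivation of that bound, assuming both types share a single max_seq (as the inc_max_seq discussion in the patch implies) and that each type t always keeps between MIN_NR_GENS = 2 and MAX_NR_GENS = 4 generations:

```latex
\begin{aligned}
\mathrm{nr\_gens}(t) &= \mathrm{max\_seq} - \mathrm{min\_seq}[t] + 1,
  \qquad \mathrm{MIN\_NR\_GENS} \le \mathrm{nr\_gens}(t) \le \mathrm{MAX\_NR\_GENS} \\
\bigl|\mathrm{min\_seq}[\mathrm{anon}] - \mathrm{min\_seq}[\mathrm{file}]\bigr|
  &= \bigl|\mathrm{nr\_gens}(\mathrm{file}) - \mathrm{nr\_gens}(\mathrm{anon})\bigr| \\
  &\le \mathrm{MAX\_NR\_GENS} - \mathrm{MIN\_NR\_GENS} = 4 - 2 = 2
\end{aligned}
```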
>>
>> For example, suppose swap is disabled, there are only 2 gens of file
>> folios, and there are 3 gens of anon folios. Anon folios are unevictable
>> because there is no swap, and file folios are also unevictable due to
>> the force protection of gens. If the anon folios are mostly cold (or at
>> least a portion of them are), the oldest anon gen will be very long
>> (e.g. 12G, 3145728 folios).
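The folio count in this example is consistent with 12G of 4 KiB (order-0) folios; that page size is an assumption here, and larger folios would give a proportionally smaller count:

```latex
\frac{12\,\mathrm{GiB}}{4\,\mathrm{KiB}}
  = \frac{12 \cdot 2^{30}}{2^{12}}
  = 12 \cdot 2^{18}
  = 12 \cdot 262144
  = 3145728 \ \text{folios}
```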
>>
>> Now, to reclaim any file folios, you have to age first. Before this
>> patch that is usually fast, but after it, aging will have to rotate
>> all 3145728 folios to the second oldest anon gen, which could take a
>> very long time.
I have the same concern. In many of our scenarios, swap is disabled
(swappiness=0), and we only reclaim file folios. In such cases, the
workloads really don’t care about the hot/cold status of anonymous folios.
>> During that period, any concurrent reclaimer will be rejected due to
>> the force protection, resulting in very ugly long-tail latency or
>> unexpected OOM.
>>
>> So I agree this is a good idea in general, and we should do this,
>> but it is better to defer it until we patch up MGLRU to remove
>> the force protection first.
>
> I suspect that once we can age file and anonymous pages
> separately, this issue will resolve itself. David already has
> some code for this [1].
>
> Not sure when he will have time to push it upstream, but I
> may carve out some time to take care of it this month.
>
> [1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/
Great. Sounds reasonable to me.
Thread overview: 5+ messages
[not found] <7829b070df1b405dbc97dd6a028d8c8a@honor.com>
2026-04-07 13:37 ` [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201 wangzhen
2026-04-07 14:25 ` Kairui Song
2026-04-07 23:00 ` Barry Song
2026-04-08 2:35 ` Baolin Wang [this message]
2026-04-08 3:15 ` Kairui Song