public inbox for linux-mm@kvack.org
* [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
@ 2026-04-07 13:37 ` wangzhen
  2026-04-07 14:25   ` Kairui Song
  0 siblings, 1 reply; 7+ messages in thread
From: wangzhen @ 2026-04-07 13:37 UTC
  To: Andrew Morton
  Cc: Johannes Weiner, David Hildenbrand, Michal Hocko, Qi Zheng,
	Shakeel Butt, Lorenzo Stoakes, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, kasong@tencent.com, baolin.wang@linux.alibaba.com,
	baohua@kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org

From ac731b061f152cba05b9aa351652a04f933986e0 Mon Sep 17 00:00:00 2001
From: w00021541 <wangzhen5@hihonor.com>
Date: Tue, 7 Apr 2026 16:17:53 +0800
Subject: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0  or 201

In some cases, when swappiness is set to 0 or 201, pages in the oldest generation are incorrectly moved into the newest generation.

Consider the following aging scenario:
MAX_NR_GENS=4, MIN_NR_GENS=2, swappiness=201, 3 anon gens, 4 file gens.
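1. When swappiness = 201, should_run_aging only checks the anon type, and in this scenario it returns true.
2. In inc_max_seq, each type (anon or file) that already has MAX_NR_GENS generations goes through inc_min_seq, which moves the oldest generation pages to the second oldest generation to make room for incrementing max_seq.
Here, only the file type has 4 gens, so it enters inc_min_seq.
3. In inc_min_seq, the first goto condition is true, so the page migration is skipped while min_seq is still incremented. Because generations are indexed by seq % MAX_NR_GENS, the newest generation created by the max_seq increment reuses the slot of the old oldest generation, so the skipped pages end up in the newest generation and cold/hot pages are inverted.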
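The inversion in step 3 comes from slot reuse. Below is a minimal sketch using a hypothetical toy model (toy_lruvec and the toy_* helpers are illustrative, not kernel code) that only tracks folio counts per generation slot. It assumes, as in the scenario, 3 anon gens and 4 file gens: with the file migration skipped, the file folios left in the oldest slot show up in the newest generation once max_seq is incremented.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NR_GENS 4
#define MIN_NR_GENS 2
#define TYPE_ANON 0
#define TYPE_FILE 1

/* Hypothetical toy lruvec: only folio counts per (gen slot, type). */
struct toy_lruvec {
	unsigned long max_seq;                  /* shared by anon and file */
	unsigned long min_seq[2];               /* per type */
	unsigned long nr_folios[MAX_NR_GENS][2];
};

/* Like lru_gen_from_seq(): a seq maps to a slot modulo MAX_NR_GENS. */
static int toy_gen_from_seq(unsigned long seq)
{
	return (int)(seq % MAX_NR_GENS);
}

static unsigned long toy_nr_gens(const struct toy_lruvec *v, int type)
{
	return v->max_seq - v->min_seq[type] + 1;
}

/* Retire the oldest gen of @type, optionally skipping the folio move. */
static void toy_inc_min_seq(struct toy_lruvec *v, int type, bool skip_move)
{
	int old_gen = toy_gen_from_seq(v->min_seq[type]);
	int new_gen = toy_gen_from_seq(v->min_seq[type] + 1);

	assert(toy_nr_gens(v, type) > MIN_NR_GENS);
	if (!skip_move) {
		v->nr_folios[new_gen][type] += v->nr_folios[old_gen][type];
		v->nr_folios[old_gen][type] = 0;
	}
	v->min_seq[type]++;
}

/* Open a new gen; a type already at MAX_NR_GENS must retire one first. */
static void toy_inc_max_seq(struct toy_lruvec *v, const bool skip_move[2])
{
	for (int type = TYPE_ANON; type <= TYPE_FILE; type++) {
		if (toy_nr_gens(v, type) != MAX_NR_GENS)
			continue;
		toy_inc_min_seq(v, type, skip_move[type]);
	}
	/* For a type that was full, the new max_seq maps to the slot its
	 * retired min_seq used. */
	v->max_seq++;
}

/* Folios of @type sitting in the newest gen's slot. */
static unsigned long toy_newest(const struct toy_lruvec *v, int type)
{
	return v->nr_folios[toy_gen_from_seq(v->max_seq)][type];
}
```

Starting from max_seq = 3, min_seq = { 1, 0 } and 100 file folios in slot 0, skipping the file move leaves all 100 in the newest generation, while performing the move puts them in slot 1 (the new oldest file gen).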

In fact, with MAX_NR_GENS=4 and MIN_NR_GENS=2, the for loop after the gotos is never reached.

Consider this check in inc_max_seq:
    if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
        continue;
This means that only a type with get_nr_gens == 4 can enter inc_min_seq.

Consider three swappiness cases:
1 <= swappiness <= 200:
If should_run_aging returns true, both the anon and file types must satisfy get_nr_gens <= 3, so neither type has MAX_NR_GENS generations.
Therefore, neither can enter inc_min_seq.

swappiness = 201:
If should_run_aging returns true, the anon type must satisfy get_nr_gens <= 3, so only the file type can satisfy get_nr_gens == MAX_NR_GENS.
After it enters inc_min_seq, type && (swappiness == SWAPPINESS_ANON_ONLY) is true, so the for loop is skipped.

swappiness = 0:
Symmetric to swappiness = 201: only the anon type can enter inc_min_seq, where !type && !swappiness is true, so the for loop is skipped.

So the two goto statements should be removed. This ensures that when swappiness is 0 or 201, the oldest generation pages are correctly moved to the second oldest generation.
(When 1 <= swappiness <= 200, aging only runs when both the anon and file types have get_nr_gens <= 3, which already prevents the hot/cold inversion.)
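The case analysis above can be encoded as a small truth table. This sketch restates the two removed early exits and the patch's assumption about should_run_aging (a type it checks has at most 3 gens when aging runs) as hypothetical predicates; it models the argument and is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SWAPPINESS 200
#define SWAPPINESS_ANON_ONLY (MAX_SWAPPINESS + 1)	/* 201 */
#define TYPE_ANON 0
#define TYPE_FILE 1

/* The two early exits that the patch removes from inc_min_seq(). */
static bool skips_migration(int type, int swappiness)
{
	if (type && swappiness == SWAPPINESS_ANON_ONLY)
		return true;	/* file type, anon-only reclaim */
	if (!type && !swappiness)
		return true;	/* anon type, file-only reclaim */
	return false;
}

/*
 * Assumption from the case analysis: when should_run_aging() returns true,
 * every type it checks has at most 3 gens, so only an unchecked type can
 * be at MAX_NR_GENS and reach inc_min_seq().
 */
static bool can_enter_inc_min_seq(int type, int swappiness)
{
	assert(swappiness >= 0 && swappiness <= SWAPPINESS_ANON_ONLY);
	if (swappiness == SWAPPINESS_ANON_ONLY)
		return type == TYPE_FILE;	/* aging only checked anon */
	if (swappiness == 0)
		return type == TYPE_ANON;	/* aging only checked file */
	return false;				/* aging checked both types */
}

/* Can any type enter inc_min_seq() and then run the migration loop? */
static bool loop_reachable(int swappiness)
{
	for (int type = TYPE_ANON; type <= TYPE_FILE; type++)
		if (can_enter_inc_min_seq(type, swappiness) &&
		    !skips_migration(type, swappiness))
			return true;
	return false;
}

/* Can any type enter inc_min_seq() and skip the migration (inversion)? */
static bool inversion_possible(int swappiness)
{
	for (int type = TYPE_ANON; type <= TYPE_FILE; type++)
		if (can_enter_inc_min_seq(type, swappiness) &&
		    skips_migration(type, swappiness))
			return true;
	return false;
}
```

Under these assumptions loop_reachable() is false for every swappiness value, and inversion_possible() is true only for 0 and 201. Removing the early exits, as the patch does, makes the migration loop run in exactly those two cases.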

Signed-off-by: w00021541 <wangzhen5@hihonor.com>
---
 mm/vmscan.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0fc9373e8251..54c835b07d3e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3843,7 +3843,7 @@ static void clear_mm_walk(void)
 		kfree(walk);
 }
 
-static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
+static bool inc_min_seq(struct lruvec *lruvec, int type)
 {
 	int zone;
 	int remaining = MAX_LRU_BATCH;
@@ -3851,14 +3851,6 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
 	int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
 
-	/* For file type, skip the check if swappiness is anon only */
-	if (type && (swappiness == SWAPPINESS_ANON_ONLY))
-		goto done;
-
-	/* For anon type, skip the check if swappiness is zero (file only) */
-	if (!type && !swappiness)
-		goto done;
-
 	/* prevent cold/hot inversion if the type is evictable */
 	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
 		struct list_head *head = &lrugen->folios[old_gen][type][zone];
@@ -3889,7 +3881,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
 				return false;
 		}
 	}
-done:
+
 	reset_ctrl_pos(lruvec, type, true);
 	WRITE_ONCE(lrugen->min_seq[type], lrugen->min_seq[type] + 1);
 
@@ -3975,7 +3967,7 @@ static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, int swappiness
 		if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
 			continue;
 
-		if (inc_min_seq(lruvec, type, swappiness))
+		if (inc_min_seq(lruvec, type))
 			continue;
 
 		spin_unlock_irq(&lruvec->lru_lock);
--
2.17.1




* Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
  2026-04-07 13:37 ` [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201 wangzhen
@ 2026-04-07 14:25   ` Kairui Song
  2026-04-07 23:00     ` Barry Song
  0 siblings, 1 reply; 7+ messages in thread
From: Kairui Song @ 2026-04-07 14:25 UTC
  To: wangzhen
  Cc: Andrew Morton, Johannes Weiner, David Hildenbrand, Michal Hocko,
	Qi Zheng, Shakeel Butt, Lorenzo Stoakes, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, kasong@tencent.com,
	baolin.wang@linux.alibaba.com, baohua@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Tue, Apr 07, 2026 at 01:37:08PM +0800, wangzhen wrote:
> [...]
> -	/* For file type, skip the check if swappiness is anon only */
> -	if (type && (swappiness == SWAPPINESS_ANON_ONLY))
> -		goto done;
> -
> -	/* For anon type, skip the check if swappiness is zero (file only) */
> -	if (!type && !swappiness)
> -		goto done;
> -

Hi, thanks for the patch.

We have a very similar patch internally, and the results were rather bad.

Currently MGLRU does not allow the gen distance between file and anon
to grow larger than 2, which means that with this patch, under heavy
pressure, you may have to keep rotating a long list of folios of one
type just to reclaim the other type.

For example, suppose swap is disabled, and there are only 2 gens of
file folios and 3 gens of anon folios. Anon folios are unevictable
because there is no swap, and file folios are also unevictable due to
the forced gen protection. If the anon folios are mostly cold (or at
least a portion of them are), the oldest anon gen can become very long
(e.g. 12G, or 3145728 folios of 4K each).

Now, to reclaim any file folios, you have to age first. Before this
patch that is usually fast. After it, aging has to rotate all 3145728
folios into the second oldest anon gen, which could take a very long
time.
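The numbers in this example can be checked with a little arithmetic. This is a minimal sketch that assumes 4 KiB folios and uses a hypothetical per-call budget of 4096 folios standing in for MAX_LRU_BATCH (the budget inc_min_seq starts with in the diff above; the real value comes from the kernel headers). When the budget runs out, inc_min_seq returns false and inc_max_seq drops lru_lock before retrying, so the batch count approximates how many lock-drop rounds one aging pass needs.

```c
#include <assert.h>

/* Assumptions for illustration: 4 KiB folios, and a hypothetical per-call
 * budget of 4096 folios standing in for MAX_LRU_BATCH. */
#define TOY_FOLIO_SIZE 4096ULL
#define TOY_LRU_BATCH  4096ULL

/* Number of 4 KiB folios in a generation of @bytes. */
static unsigned long long toy_folios_in_gen(unsigned long long bytes)
{
	return bytes / TOY_FOLIO_SIZE;
}

/*
 * inc_min_seq() gives up after each batch and inc_max_seq() drops
 * lru_lock before retrying, so rotating a gen takes this many rounds.
 */
static unsigned long long toy_rounds_to_rotate(unsigned long long nr_folios)
{
	return (nr_folios + TOY_LRU_BATCH - 1) / TOY_LRU_BATCH;
}
```

Under these assumptions the 12G gen holds 3145728 folios and needs 768 rounds, each moving 4096 folios under lru_lock. A different batch size changes the round count, but not the total number of folios that must be moved.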

During that period, any concurrent reclaimer will get rejected due to
the forced protection, resulting in very ugly long-tail latency or
unexpected OOMs.

So I agree this is a good idea in general and that we should do it,
but it is better to defer it until we patch MGLRU to remove the forced
protection first.

But I think it might be reasonable to remove the SWAPPINESS_ANON_ONLY
limit now: that case can only be triggered by proactive reclaim, which
tolerates long-tail latency and won't cause OOMs.



* Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
  2026-04-07 14:25   ` Kairui Song
@ 2026-04-07 23:00     ` Barry Song
  2026-04-08  2:35       ` Baolin Wang
  2026-04-08  3:15       ` Kairui Song
  0 siblings, 2 replies; 7+ messages in thread
From: Barry Song @ 2026-04-07 23:00 UTC
  To: Kairui Song
  Cc: wangzhen, Andrew Morton, Johannes Weiner, David Hildenbrand,
	Michal Hocko, Qi Zheng, Shakeel Butt, Lorenzo Stoakes,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, kasong@tencent.com,
	baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org

On Tue, Apr 7, 2026 at 10:26 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> [...]
>
> So I agree this is a good idea in general, I agree we should do
> this. But better defer this until we patch up MGLRU to remove
> the force protection first.

I suspect that once we can age file and anonymous pages
separately, this issue will resolve itself. David already has
some code for this [1].

I'm not sure when he will have time to push it upstream, but I
may carve out some time to take care of it this month.

[1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/

>
> But I think it might be reasonable to remove the SWAPPINESS_ANON_ONLY
> limit now, that can only be triggered by proactive reclaim
> which would tolerate long tailing and won't cause OOM.

It may be better to defer both cases until file and anonymous
pages can be aged separately.

Thanks
Barry



* Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
  2026-04-07 23:00     ` Barry Song
@ 2026-04-08  2:35       ` Baolin Wang
  2026-04-08  3:15       ` Kairui Song
  1 sibling, 0 replies; 7+ messages in thread
From: Baolin Wang @ 2026-04-08  2:35 UTC
  To: Barry Song, Kairui Song
  Cc: wangzhen, Andrew Morton, Johannes Weiner, David Hildenbrand,
	Michal Hocko, Qi Zheng, Shakeel Butt, Lorenzo Stoakes,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, kasong@tencent.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org



On 4/8/26 7:00 AM, Barry Song wrote:
> On Tue, Apr 7, 2026 at 10:26 PM Kairui Song <ryncsn@gmail.com> wrote:
>>
>> On Tue, Apr 07, 2026 at 01:37:08PM +0800, wangzhen wrote:
>>> [...]
>>> Signed-off-by: w00021541 <wangzhen5@hihonor.com>

Please use your real name to sign off.

>>> [...]
>>
>> [...]
>> Now, to reclaim any file folios, you have to age first. Before this
>> patch that is usually fast. But after this, it will have to rotate
>> all 3145728 folios to second oldest anon gen, will could take a
>> very long time.

I have the same concern. In many of our scenarios, swap is disabled 
(swappiness=0), and we only reclaim file folios. In such cases, the 
workloads really don’t care about the hot/cold status of anonymous folios.

>> [...]
> 
> I suspect that once we can age file and anonymous pages
> separately, this issue will resolve itself. David already has
> some code for this [1].
> 
> Not sure when he will have time to push it upstream, but I
> may carve out some time to take care of it this month.
> 
> [1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/

Great. Sounds reasonable to me.



* Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
  2026-04-07 23:00     ` Barry Song
  2026-04-08  2:35       ` Baolin Wang
@ 2026-04-08  3:15       ` Kairui Song
  2026-04-09  3:49         ` Barry Song
  1 sibling, 1 reply; 7+ messages in thread
From: Kairui Song @ 2026-04-08  3:15 UTC
  To: Barry Song
  Cc: wangzhen, Andrew Morton, Johannes Weiner, David Hildenbrand,
	Michal Hocko, Qi Zheng, Shakeel Butt, Lorenzo Stoakes,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, kasong@tencent.com,
	baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org

On Wed, Apr 08, 2026 at 07:00:17AM +0800, Barry Song wrote:
> On Tue, Apr 7, 2026 at 10:26 PM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > [...]
> 
> I suspect that once we can age file and anonymous pages
> separately, this issue will resolve itself. David already has
> some code for this [1].
> 
> Not sure when he will have time to push it upstream, but I
> may carve out some time to take care of it this month.
> 
> [1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/

Hi, thanks for sharing the idea.

Right, a few weeks ago I also heard from CachyOS that they are using
the following patch for MGLRU:

https://github.com/firelzrd/re-swappiness

The idea is likewise to split the seq number for anon / file so that
swappiness works again.

However, I'm really not sure this is the right approach. It changes
the MGLRU model, and things like TTL may no longer work as expected.
And TTL does solve real problems too (also from CachyOS):

https://github.com/firelzrd/le9uo

TTL replaced the le9 patch above in a cleaner way for thrashing
prevention.

Right now we do a page table walk (which covers both anon and file
folios) while generating one unified new gen, meaning the folios in
that gen have the same access time (or are at least all older than a
specific one), which is used as the metric for TTL.
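A minimal sketch of why a unified gen suits a TTL check, assuming a hypothetical toy_lrugen with a single min_seq and one birth time per gen slot. It is only loosely modeled on MGLRU's per-gen timestamps and its min_ttl_ms knob; the real code differs. Because every folio placed in a gen, anon or file, shares that gen's birth time, one comparison decides whether the oldest gen is old enough to evict.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NR_GENS 4

/* Hypothetical toy: unified gens, one birth time per gen slot. */
struct toy_lrugen {
	unsigned long max_seq;
	unsigned long min_seq;               /* shared by anon and file */
	unsigned long birth_ms[MAX_NR_GENS]; /* when each gen was created */
};

static int toy_gen_from_seq(unsigned long seq)
{
	return (int)(seq % MAX_NR_GENS);
}

/*
 * Aging opens a new gen stamped with the current time. Every folio the
 * page table walk moves into it, anon or file, inherits that timestamp.
 */
static void toy_age(struct toy_lrugen *g, unsigned long now_ms)
{
	if (g->max_seq - g->min_seq + 1 == MAX_NR_GENS)
		g->min_seq++;   /* free the oldest slot (folio moves omitted) */
	g->max_seq++;
	g->birth_ms[toy_gen_from_seq(g->max_seq)] = now_ms;
}

/* TTL-style check: evict from the oldest gen only once it is old enough. */
static bool toy_oldest_gen_expired(const struct toy_lrugen *g,
				   unsigned long now_ms,
				   unsigned long min_ttl_ms)
{
	unsigned long birth = g->birth_ms[toy_gen_from_seq(g->min_seq)];

	assert(now_ms >= birth);
	return now_ms - birth >= min_ttl_ms;
}
```

With separate anon and file sequence numbers, each type would carry its own oldest gen and birth time, so a single TTL comparison like this would no longer describe both types at once.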

Besides, having unified gens also helps implement things like
workingset reporting, where each gen acts as a histogram bin:

https://lwn.net/Articles/976985/
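To illustrate the histogram idea, here is a hypothetical sketch (toy_histogram and toy_bin are made-up names, and the workingset reporting series linked above may organize this differently). It turns a unified gen layout into one bin per gen, youngest first, each holding the gen's age and its combined anon and file folio count.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NR_GENS 4

/* One histogram bin per generation. */
struct toy_bin {
	unsigned long age_ms;     /* now minus the gen's birth time */
	unsigned long nr_folios;  /* anon + file folios in the gen */
};

/*
 * Hypothetical sketch: walk a unified gen layout from max_seq (youngest)
 * down to min_seq (oldest) and emit one bin per gen. Returns the bin count.
 */
static size_t toy_histogram(unsigned long min_seq, unsigned long max_seq,
			    const unsigned long birth_ms[MAX_NR_GENS],
			    unsigned long nr_folios[MAX_NR_GENS][2],
			    unsigned long now_ms,
			    struct toy_bin bins[MAX_NR_GENS])
{
	size_t n = 0;

	assert(max_seq >= min_seq && max_seq - min_seq < MAX_NR_GENS);
	for (unsigned long i = 0; i <= max_seq - min_seq; i++) {
		int gen = (int)((max_seq - i) % MAX_NR_GENS);

		bins[n].age_ms = now_ms - birth_ms[gen];
		bins[n].nr_folios = nr_folios[gen][0] + nr_folios[gen][1];
		n++;
	}
	return n;
}
```

Because anon and file folios share each gen's birth time, a single bin can report both types without having to reconcile two different age scales.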

Aging triggering could be a bit more problematic too.
I think the right way is to just do the aging asynchronously; Yu
even left a TODO comment in vmscan.c:

/*
 * For future optimizations:
 * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
 *    reclaim.
 */

Then we can start aging whenever there are fewer than 4 gens, and
allow reclaim to always go on even if only 2 gens are left.

Performance would be better since there is no more blocking
on aging, there is no change to the existing model, and the change
should be smaller and easier to review, IIUC.
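The proposed triggering policy can be summarized as three predicates. This is a hypothetical sketch, not an existing kernel interface. As we read the current code, reclaim has to age inline once only MIN_NR_GENS gens are left; the proposal instead queues aging in the background as soon as a type drops below MAX_NR_GENS, while letting reclaim continue down to MIN_NR_GENS.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NR_GENS 4
#define MIN_NR_GENS 2

/* Current behavior (as modeled here): with only MIN_NR_GENS gens left,
 * eviction is not possible and the reclaimer has to age inline first. */
static bool today_must_age_inline(unsigned long nr_gens)
{
	return nr_gens <= MIN_NR_GENS;
}

/* Proposal: queue background aging as soon as a gen slot is free. */
static bool proposal_queue_async_aging(unsigned long nr_gens)
{
	return nr_gens < MAX_NR_GENS;
}

/* Proposal: reclaim never blocks on aging while MIN_NR_GENS gens remain. */
static bool proposal_reclaim_can_proceed(unsigned long nr_gens)
{
	assert(nr_gens <= MAX_NR_GENS);
	return nr_gens >= MIN_NR_GENS;
}
```

With 3 gens, the proposal has already queued aging while reclaim keeps going, so the 2-gen case is hit only when the background aging falls behind.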

One concerning part is doing reclaim with only 2 gens left, but I
think it is OK. It should be rare, as 3 gens already act as a buffer.
Having only 2 gens left means the async aging can't catch up and the
system is under extreme pressure, so the folios are unlikely to be
accessed enough times to yield meaningful heat info, and refaults will
be more helpful for sorting out the workingset:

https://lwn.net/Articles/945266/

Cgroup reclaim can do some throttling there too, and kswapd can
still do aging synchronously.

Just some ideas; we may need to do some testing and benchmarking
to figure out which is the best solution. Discussion
is welcome! :D



* Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
  2026-04-08  3:15       ` Kairui Song
@ 2026-04-09  3:49         ` Barry Song
  2026-04-09  8:37           ` wangzicheng
  0 siblings, 1 reply; 7+ messages in thread
From: Barry Song @ 2026-04-09  3:49 UTC
  To: Kairui Song
  Cc: wangzhen, Andrew Morton, Johannes Weiner, David Hildenbrand,
	Michal Hocko, Qi Zheng, Shakeel Butt, Lorenzo Stoakes,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, kasong@tencent.com,
	baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org

[...]
> > > Hi, thanks for the patch.
> > >
> > > We have a very similar patch internally, and the result is kind of bad.
> > >
> > > Currently MGLRU forbid the gen distance between file and anon go larger
> > > than 2, which mean with this patch, when under great pressure, you may
> > > have to keep rotating a long list of the opposite type of folios to
> > > reclaim another type.
> > >
> > > For example, when you have only 2 gens of file folios, swap disabled,
> > > and there are 3 gens of anon folios. Anon folios are unevictable because
> > > there is no SWAP. And file is also unevcitable due to force protection
> > > of gen. Consider anon folios are mostly cold (at least a portion of them
> > > are), now the oldest gen of anon folios will be very long (e.g. 12G,
> > > 3145728 folios).
> > >
> > > Now, to reclaim any file folios, you have to age first. Before this
> > > patch that is usually fast. But after this, it will have to rotate
> > > all 3145728 folios to second oldest anon gen, will could take a
> > > very long time.
> > >
> > > During that period any concurrent reclaimer will get rejected
> > > due to force protection, result in very ugly long tailing or
> > > unexpected OOM.
> > >
> > > So I agree this is a good idea in general and that we should do
> > > this, but it is better deferred until we patch up MGLRU to remove
> > > the forced protection first.
> >
> > I suspect that once we can age file and anonymous pages
> > separately, this issue will resolve itself. David already has
> > some code for this [1].
> >
> > Not sure when he will have time to push it upstream, but I
> > may carve out some time to take care of it this month.
> >
> > [1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/
>
> Hi, thanks for sharing the idea.
>
> Right, a few weeks ago I also got info from CachyOS that they are using
> the following patch for MGLRU:
>
> https://github.com/firelzrd/re-swappiness
>
> The idea is also to split the seq number for anon / file so that
> swappiness works again.
>
> However, I'm really not sure this is the right approach. It changes
> the model of MGLRU, and things like TTL may no longer work as expected.
> And TTL does solve real problems too (also from CachyOS):
>
> https://github.com/firelzrd/le9uo
>
> TTL replaced the le9 patch above in a cleaner way for thrashing
> prevention.
>
> Right now we do a page table walk (which walks both anon and file
> folios) while generating one unified new gen, meaning the folios in
> that gen have the same (or at least all older than a specific) access
> time, which is used as the metric for TTL.
>
> Besides, having unified gens also helps implement things like
> workingset reporting, where each gen is like a bin in a histogram:
>
> https://lwn.net/Articles/976985/
>
> Aging triggering could be a bit more problematic too.
> I think the right way is to just do the aging asynchronously; Yu
> even left a TODO comment in vmscan.c:
>
> /*
>  * For future optimizations:
>  * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
>  *    reclaim.
>  */

Aging asynchronously could be a separate topic, as we can
do many things in an async manner—similar to proposals for
asynchronous compression. These async approaches may improve
performance, but they also add complexity—for example, managing
CPU utilization of reclamation threads to prevent devices from
overheating.

>
> Then, we start the aging whenever there are fewer than 4 gens, and
> allow reclaim to always go on even if there are only 2 gens left.

I don’t think allowing reclamation with two generations left
will resolve the problem. The fundamental issue with sharing the
same generations for file and anon is that one type must catch
up with the other—either through reclamation or via what this
patch is (admittedly) doing as a workaround. If we have to go
through reclamation, that effectively makes swappiness invalid
again.

Allowing reclamation with two generations may let one type move
ahead briefly, but over a smoothed time window there is no real
difference, as the other type still has to catch up with the one
that has fewer generations left.

>
> The performance would be better since there is no more blocking
> on aging, no change to the existing model, and the change should
> be smaller and easier to review IIUC.
>
> One concerning part is doing reclaim while only having 2 gens left.
> I think it seems OK. It should be rare, as 3 gens already act as a
> buffer; having only 2 gens left means the async aging can't catch up
> and the system is under extreme pressure, so the folios are unlikely
> to be accessed enough times to yield meaningful heat info, and
> refaults will be more helpful for sorting out the workingset:
>
> https://lwn.net/Articles/945266/
>
> Cgroup reclaim can do some throttling on that too, and kswapd can
> still do aging synchronously.
>
> Just some ideas; we may need to do some tests and benchmarks
> to figure out which is the best solution. Discussion
> is welcome! :D

Maybe we can still find a way to address the concerns you raised
above, as well as TTL—for example, by using separate timestamps
for anon and file pages.

Thanks
Barry


* RE: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
  2026-04-09  3:49         ` Barry Song
@ 2026-04-09  8:37           ` wangzicheng
  0 siblings, 0 replies; 7+ messages in thread
From: wangzicheng @ 2026-04-09  8:37 UTC (permalink / raw)
  To: Barry Song, Kairui Song
  Cc: wangzhen, Andrew Morton, Johannes Weiner, David Hildenbrand,
	Michal Hocko, Qi Zheng, Shakeel Butt, Lorenzo Stoakes,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, kasong@tencent.com,
	baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org

> > > I suspect that once we can age file and anonymous pages
> > > separately, this issue will resolve itself. David already has
> > > some code for this [1].
> > >
> > > Not sure when he will have time to push it upstream, but I
> > > may carve out some time to take care of it this month.
> > >
> > > [1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/
> >
> > Hi, thanks for sharing the idea.
> >
> > Right, a few weeks ago I also got info from CachyOS that they are using
> > the following patch for MGLRU:
> >
> > https://github.com/firelzrd/re-swappiness
> >
> > The idea is also to split the seq number for anon / file so that
> > swappiness works again.
> >
> > However, I'm really not sure this is the right approach. It changes
> > the model of MGLRU, and things like TTL may no longer work as expected.
> > And TTL does solve real problems too (also from CachyOS):
> >
> > https://github.com/firelzrd/le9uo
> >
> > TTL replaced the le9 patch above in a cleaner way for thrashing
> > prevention.
> >
> > Right now we do a page table walk (which walks both anon and file
> > folios) while generating one unified new gen, meaning the folios in
> > that gen have the same (or at least all older than a specific) access
> > time, which is used as the metric for TTL.
> >
> > Besides, having unified gens also helps implement things like
> > workingset reporting, where each gen is like a bin in a histogram:
> >
> > https://lwn.net/Articles/976985/
> >
> > Aging triggering could be a bit more problematic too.
> > I think the right way is to just do the aging asynchronously; Yu
> > even left a TODO comment in vmscan.c:
> >
> > /*
> >  * For future optimizations:
> >  * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
> >  *    reclaim.
> >  */
> 
> Aging asynchronously could be a separate topic, as we can
> do many things in an async manner—similar to proposals for
> asynchronous compression. These async approaches may improve
> performance, but they also add complexity—for example, managing
> CPU utilization of reclamation threads to prevent devices from
> overheating.
> 

Asynchronous reclamation could indeed help (when swap is enabled).
We have also seen some improvements with a similar approach in
Android workloads. Async aging makes swappiness more effective,
so that more anonymous pages eventually become reclaimable.

Similar to async aging, giving aging more opportunities may also help.
For example, should_run_aging() could return true when the number of
evictable pages is below MIN_LRU_BATCH.
I haven't tested this yet but plan to try it.

> >
> > Then, we start the aging whenever there are fewer than 4 gens, and
> > allow reclaim to always go on even if there are only 2 gens left.
> 
> I don’t think allowing reclamation with two generations left
> will resolve the problem. The fundamental issue with sharing the
> same generations for file and anon is that one type must catch
> up with the other—either through reclamation or via what this
> patch is (admittedly) doing as a workaround. If we have to go
> through reclamation, that effectively makes swappiness invalid
> again.
> 
> Allowing reclamation with two generations may let one type move
> ahead briefly, but over a smoothed time window there is no real
> difference, as the other type still has to catch up with the one
> that has fewer generations left.
> 

That is true.

In some previous experiments on Android we observed that when tasks
are *frozen* and aging is triggered via the debugfs interface, pages may
gradually accumulate into a single generation. In that state the MGLRU
reclaim pattern controlled by swappiness becomes very similar to
classic LRU reclaim.

> >
> > The performance would be better since there is no more blocking
> > on aging, no change to the existing model, and the change should
> > be smaller and easier to review IIUC.
> >
> > One concerning part is doing reclaim while only having 2 gens left.
> > I think it seems OK. It should be rare, as 3 gens already act as a
> > buffer; having only 2 gens left means the async aging can't catch up
> > and the system is under extreme pressure, so the folios are unlikely
> > to be accessed enough times to yield meaningful heat info, and
> > refaults will be more helpful for sorting out the workingset:
> >
> > https://lwn.net/Articles/945266/
> >
> > Cgroup reclaim can do some throttling on that too, and kswapd can
> > still do aging synchronously.
> >
> > Just some ideas; we may need to do some tests and benchmarks
> > to figure out which is the best solution. Discussion
> > is welcome! :D
> 
> Maybe we can still find a way to address the concerns you raised
> above, as well as TTL—for example, by using separate timestamps
> for anon and file pages.
> 
> Thanks
> Barry

Best,
Zicheng

end of thread, other threads:[~2026-04-09  8:37 UTC | newest]

Thread overview: 7+ messages
     [not found] <7829b070df1b405dbc97dd6a028d8c8a@honor.com>
2026-04-07 13:37 ` [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201 wangzhen
2026-04-07 14:25   ` Kairui Song
2026-04-07 23:00     ` Barry Song
2026-04-08  2:35       ` Baolin Wang
2026-04-08  3:15       ` Kairui Song
2026-04-09  3:49         ` Barry Song
2026-04-09  8:37           ` wangzicheng
