public inbox for linux-kernel@vger.kernel.org
* [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
From: wangzhen @ 2026-04-07 13:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, David Hildenbrand, Michal Hocko, Qi Zheng,
	Shakeel Butt, Lorenzo Stoakes, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, kasong@tencent.com, baolin.wang@linux.alibaba.com,
	baohua@kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org

From ac731b061f152cba05b9aa351652a04f933986e0 Mon Sep 17 00:00:00 2001
From: w00021541 <wangzhen5@hihonor.com>
Date: Tue, 7 Apr 2026 16:17:53 +0800
Subject: [PATCH RFC] mm/vmscan: Fix the hot/cold inversion when swappiness = 0 or 201

In some cases, when swappiness is set to 0 or 201, folios in the oldest generation are incorrectly moved into the newest generation.

Consider the following aging scenario:
MAX_NR_GENS=4, MIN_NR_GENS=2, swappiness=201, 3 anon gens, 4 file gens.
1. With swappiness=201, should_run_aging() only checks the anon type, and it returns true.
2. In inc_max_seq(), each type that already has MAX_NR_GENS generations goes through inc_min_seq(), which moves the folios in the oldest generation into the second oldest one to make room for incrementing max_seq. Here, the file type enters inc_min_seq().
3. In inc_min_seq(), the first goto is taken, so the folio migration is skipped, resulting in a hot/cold inversion.
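
A toy user-space model of step 3 (a sketch, not kernel code; struct toy_lru and the toy_* helpers are hypothetical). Generations live in a ring indexed by seq % MAX_NR_GENS, so if inc_min_seq() bumps min_seq without moving the oldest folios, the slot they occupy is reused by the new max_seq and the coldest folios end up in the youngest generation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of one LRU type (hypothetical, not kernel code): generations
 * live in a ring of NR_GENS slots indexed by seq % NR_GENS.
 */
#define NR_GENS 4	/* stands in for MAX_NR_GENS */

struct toy_lru {
	unsigned long min_seq, max_seq;
	int nr[NR_GENS];	/* folios per ring slot */
};

static int gen_from_seq(unsigned long seq)
{
	return seq % NR_GENS;
}

/* Retire the oldest gen, optionally moving its folios to the next-oldest gen. */
static void toy_inc_min_seq(struct toy_lru *l, bool migrate)
{
	int old_gen = gen_from_seq(l->min_seq);
	int new_gen = gen_from_seq(l->min_seq + 1);

	if (migrate) {
		l->nr[new_gen] += l->nr[old_gen];
		l->nr[old_gen] = 0;
	}
	l->min_seq++;
}

/* Open a new youngest gen; a full ring must retire its oldest gen first. */
static void toy_inc_max_seq(struct toy_lru *l, bool migrate)
{
	assert(l->max_seq - l->min_seq + 1 <= NR_GENS);
	if (l->max_seq - l->min_seq + 1 == NR_GENS)
		toy_inc_min_seq(l, migrate);
	/* When the ring was full, the new youngest gen reuses the retired slot. */
	l->max_seq++;
}

/* Number of folios sitting in the youngest generation. */
static int toy_youngest(const struct toy_lru *l)
{
	return l->nr[gen_from_seq(l->max_seq)];
}
```

Starting from min_seq=0, max_seq=3 with 100 folios in the oldest gen, toy_inc_max_seq(&l, false) leaves all 100 folios in the youngest gen, while toy_inc_max_seq(&l, true) leaves the youngest gen empty and the 100 folios in the new oldest gen.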

In fact, with MAX_NR_GENS=4 and MIN_NR_GENS=2, the for loop after the two gotos is effectively unreachable.

Consider the code in inc_max_seq():
if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
    continue;
This means that only a type with get_nr_gens()==4 can enter inc_min_seq().

Consider the three swappiness ranges:
1<=swappiness<=200:
If should_run_aging() returns true, both the anon and file types must have get_nr_gens()<=3, so neither type has get_nr_gens()==MAX_NR_GENS.
Therefore, neither type can enter inc_min_seq().

swappiness=201:
If should_run_aging() returns true, the anon type must have get_nr_gens()<=3, so only the file type can have get_nr_gens()==MAX_NR_GENS.
Once the file type enters inc_min_seq(), type && (swappiness == SWAPPINESS_ANON_ONLY) is true and the for loop is skipped.

swappiness=0:
Symmetric to swappiness=201: only the anon type can enter inc_min_seq(), where !type && !swappiness is true and the for loop is skipped.

Therefore, the two goto statements should be removed. This ensures that when swappiness is 0 or 201, the folios in the oldest generation are correctly moved into the second oldest generation.
(When 1<=swappiness<=200, aging only happens when both the anon and file types have get_nr_gens()<=3, which already prevents hot/cold inversion.)
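
The case analysis above can be encoded as a small truth table. This is a sketch that restates the claims of this commit message rather than modelling should_run_aging() itself; may_enter_inc_min_seq() and old_loop_reachable() are hypothetical helpers. It checks that, under the old code, no swappiness value in 0..201 lets a type both reach inc_min_seq() and get past the two gotos.

```c
#include <assert.h>
#include <stdbool.h>

#define LRU_GEN_ANON		0
#define LRU_GEN_FILE		1
#define SWAPPINESS_ANON_ONLY	201	/* MAX_SWAPPINESS + 1 */

/* The two early exits that this patch removes from inc_min_seq(). */
static bool old_skips_migration(int type, int swappiness)
{
	if (type && swappiness == SWAPPINESS_ANON_ONLY)
		return true;	/* file type, anon-only reclaim */
	if (!type && !swappiness)
		return true;	/* anon type, file-only reclaim */
	return false;
}

/*
 * Hypothetical restatement of the argument above: once should_run_aging()
 * returns true, only the type it did not check can still have MAX_NR_GENS
 * gens and therefore enter inc_min_seq().
 */
static bool may_enter_inc_min_seq(int type, int swappiness)
{
	assert(swappiness >= 0 && swappiness <= SWAPPINESS_ANON_ONLY);
	if (swappiness == SWAPPINESS_ANON_ONLY)
		return type == LRU_GEN_FILE;
	if (swappiness == 0)
		return type == LRU_GEN_ANON;
	return false;	/* 1..200: both types have <= 3 gens */
}

/* True if, under the old code, the folio-moving loop can ever run. */
static bool old_loop_reachable(int swappiness)
{
	for (int type = LRU_GEN_ANON; type <= LRU_GEN_FILE; type++)
		if (may_enter_inc_min_seq(type, swappiness) &&
		    !old_skips_migration(type, swappiness))
			return true;
	return false;
}
```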

Signed-off-by: w00021541 <wangzhen5@hihonor.com>
---
 mm/vmscan.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0fc9373e8251..54c835b07d3e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3843,7 +3843,7 @@ static void clear_mm_walk(void)
 		kfree(walk);
 }
 
-static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
+static bool inc_min_seq(struct lruvec *lruvec, int type)
 {
 	int zone;
 	int remaining = MAX_LRU_BATCH;
@@ -3851,14 +3851,6 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
 	int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
 
-	/* For file type, skip the check if swappiness is anon only */
-	if (type && (swappiness == SWAPPINESS_ANON_ONLY))
-		goto done;
-
-	/* For anon type, skip the check if swappiness is zero (file only) */
-	if (!type && !swappiness)
-		goto done;
-
 	/* prevent cold/hot inversion if the type is evictable */
 	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
 		struct list_head *head = &lrugen->folios[old_gen][type][zone];
@@ -3889,7 +3881,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
 				return false;
 		}
 	}
-done:
+
 	reset_ctrl_pos(lruvec, type, true);
 	WRITE_ONCE(lrugen->min_seq[type], lrugen->min_seq[type] + 1);
 
@@ -3975,7 +3967,7 @@ static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, int swappiness
 		if (get_nr_gens(lruvec, type) != MAX_NR_GENS)
 			continue;
 
-		if (inc_min_seq(lruvec, type, swappiness))
+		if (inc_min_seq(lruvec, type))
 			continue;
 
 		spin_unlock_irq(&lruvec->lru_lock);
--
2.17.1



* Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
From: Kairui Song @ 2026-04-07 14:25 UTC (permalink / raw)
  To: wangzhen
  Cc: Andrew Morton, Johannes Weiner, David Hildenbrand, Michal Hocko,
	Qi Zheng, Shakeel Butt, Lorenzo Stoakes, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, kasong@tencent.com,
	baolin.wang@linux.alibaba.com, baohua@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Tue, Apr 07, 2026 at 01:37:08PM +0800, wangzhen wrote:
> @@ -3851,14 +3851,6 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
>  	int hist = lru_hist_from_seq(lrugen->min_seq[type]);
>  	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
>  
> -	/* For file type, skip the check if swappiness is anon only */
> -	if (type && (swappiness == SWAPPINESS_ANON_ONLY))
> -		goto done;
> -
> -	/* For anon type, skip the check if swappiness is zero (file only) */
> -	if (!type && !swappiness)
> -		goto done;
> -

Hi, thanks for the patch.

We have a very similar patch internally, and the results were rather bad.

Currently MGLRU does not allow the gen distance between file and anon to
exceed 2, which means that with this patch, under heavy pressure, you
may have to keep rotating a long list of folios of one type in order to
reclaim the other type.

For example, suppose there are only 2 gens of file folios, swap is
disabled, and there are 3 gens of anon folios. The anon folios are
unevictable because there is no swap, and the file folios are also
unevictable due to the forced protection of gens. If the anon folios are
mostly cold (or at least a portion of them are), the oldest anon gen
can be very long (e.g. 12G, 3145728 folios).

Now, to reclaim any file folios, you have to age first. Before this
patch that is usually fast. After it, aging has to rotate all 3145728
folios into the second oldest anon gen, which could take a very long
time.
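
The numbers in the example can be checked with a little arithmetic. This sketch assumes 4 KiB order-0 folios and, as an assumption to verify against your tree, MAX_LRU_BATCH = 4096 (64 * BITS_PER_LONG on a 64-bit build). Since inc_min_seq() returns false after moving MAX_LRU_BATCH folios (see the remaining counter in inc_min_seq() in the patch), and inc_max_seq() then appears to drop lru_lock and retry, rotating the oldest anon gen takes on the order of hundreds of lock rounds.

```c
#include <assert.h>

/* Assumption: order-0 folios of 4 KiB, i.e. one folio per base page. */
#define FOLIO_SIZE		4096ULL
/* Assumption: MAX_LRU_BATCH = 64 * BITS_PER_LONG = 4096 on 64-bit. */
#define ASSUMED_LRU_BATCH	4096ULL

/* Folios needed to back `bytes` of memory. */
static unsigned long long nr_folios(unsigned long long bytes)
{
	return bytes / FOLIO_SIZE;
}

/* Lock hold/drop rounds needed to rotate nr folios, `batch` at a time. */
static unsigned long long nr_rounds(unsigned long long nr,
				    unsigned long long batch)
{
	assert(batch > 0);
	return (nr + batch - 1) / batch;	/* round up */
}
```

With these assumptions, nr_folios(12ULL << 30) gives the 3145728 folios quoted above, and nr_rounds(3145728, ASSUMED_LRU_BATCH) gives 768 rounds.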

During that period any concurrent reclaimer will be rejected due to
the forced protection, resulting in very ugly long tail latency or
unexpected OOM.

So I agree this is a good idea in general and that we should do it,
but it is better to defer it until we patch up MGLRU to remove the
forced protection first.

But I think it might be reasonable to remove the SWAPPINESS_ANON_ONLY
check now, since it can only be triggered by proactive reclaim, which
tolerates long tail latency and won't cause OOM.
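
One way to read this suggestion, as a hypothetical model of when inc_min_seq() skips moving the oldest gen (the policy names are made up for illustration): the old code skips for both one-sided swappiness values, the RFC patch never skips, and dropping only the SWAPPINESS_ANON_ONLY check keeps the swappiness == 0 shortcut, which covers the swap-disabled case in the example above.

```c
#include <assert.h>
#include <stdbool.h>

#define LRU_GEN_ANON		0
#define LRU_GEN_FILE		1
#define SWAPPINESS_ANON_ONLY	201

/* Hypothetical labels for the three variants discussed in this thread. */
enum policy { OLD_CODE, THIS_PATCH, ANON_ONLY_CHECK_DROPPED };

/* Whether inc_min_seq() would skip moving the oldest gen's folios. */
static bool skips_migration(enum policy p, int type, int swappiness)
{
	assert(type == LRU_GEN_ANON || type == LRU_GEN_FILE);
	switch (p) {
	case OLD_CODE:
		return (type && swappiness == SWAPPINESS_ANON_ONLY) ||
		       (!type && !swappiness);
	case THIS_PATCH:
		return false;
	case ANON_ONLY_CHECK_DROPPED:
		/* keep only the file-only (swappiness == 0) shortcut */
		return !type && !swappiness;
	}
	return false;
}
```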


* Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
From: Barry Song @ 2026-04-07 23:00 UTC (permalink / raw)
  To: Kairui Song
  Cc: wangzhen, Andrew Morton, Johannes Weiner, David Hildenbrand,
	Michal Hocko, Qi Zheng, Shakeel Butt, Lorenzo Stoakes,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, kasong@tencent.com,
	baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org

On Tue, Apr 7, 2026 at 10:26 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> So I agree this is a good idea in general, I agree we should do
> this. But better defer this until we patch up MGLRU to remove
> the force protection first.

I suspect that once we can age file and anonymous pages
separately, this issue will resolve itself. David already has
some code for this [1].

Not sure when he will have time to push it upstream, but I
may carve out some time to take care of it this month.

[1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/

>
> But I think it might be reasonable to remove the SWAPPINESS_ANON_ONLY
> limit now, that can only be triggered by proactive reclaim
> which would tolerate long tailing and won't cause OOM.

It may be better to defer both cases until file and anonymous
pages can be aged separately.

Thanks
Barry


* Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
From: Baolin Wang @ 2026-04-08  2:35 UTC (permalink / raw)
  To: Barry Song, Kairui Song
  Cc: wangzhen, Andrew Morton, Johannes Weiner, David Hildenbrand,
	Michal Hocko, Qi Zheng, Shakeel Butt, Lorenzo Stoakes,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, kasong@tencent.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org



On 4/8/26 7:00 AM, Barry Song wrote:
> On Tue, Apr 7, 2026 at 10:26 PM Kairui Song <ryncsn@gmail.com> wrote:
>>
>> On Tue, Apr 07, 2026 at 01:37:08PM +0800, wangzhen wrote:
>>> Signed-off-by: w00021541 <wangzhen5@hihonor.com>

Please use your real name to sign off.

>> Now, to reclaim any file folios, you have to age first. Before this
>> patch that is usually fast. But after this, it will have to rotate
>> all 3145728 folios to second oldest anon gen, will could take a
>> very long time.

I have the same concern. In many of our scenarios, swap is disabled 
(swappiness=0), and we only reclaim file folios. In such cases, the 
workloads really don’t care about the hot/cold status of anonymous folios.

> 
> I suspect that once we can age file and anonymous pages
> separately, this issue will resolve itself. David already has
> some code for this [1].
> 
> Not sure when he will have time to push it upstream, but I
> may carve out some time to take care of it this month.
> 
> [1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/

Great. Sounds reasonable to me.


* Re: [PATCH RFC] mm/vmscan:Fix the hot/cold inversion when swappiness = 0 or 201
From: Kairui Song @ 2026-04-08  3:15 UTC (permalink / raw)
  To: Barry Song
  Cc: wangzhen, Andrew Morton, Johannes Weiner, David Hildenbrand,
	Michal Hocko, Qi Zheng, Shakeel Butt, Lorenzo Stoakes,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, kasong@tencent.com,
	baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org

On Wed, Apr 08, 2026 at 07:00:17AM +0800, Barry Song wrote:
> I suspect that once we can age file and anonymous pages
> separately, this issue will resolve itself. David already has
> some code for this [1].
> 
> Not sure when he will have time to push it upstream, but I
> may carve out some time to take care of it this month.
> 
> [1] https://lore.kernel.org/linux-mm/aam5nOyXs1sNdjTe@google.com/

Hi, thanks for sharing the idea.

Right, a few weeks ago I also heard from CachyOS that they are using
the following patch for MGLRU:

https://github.com/firelzrd/re-swappiness

The idea is also to split the seq numbers for anon / file so that
swappiness works again.

However, I'm really not sure this is the right approach. It changes
the model of MGLRU, and things like TTL may no longer work as expected.
And TTL does solve real problems too (also from CachyOS):

https://github.com/firelzrd/le9uo

TTL replaced the le9 patch above as a cleaner way to prevent
thrashing.

Right now we do a page table walk (which walks both anon and file
folios) while generating one unified new gen, meaning the folios in
that gen have the same access time (or are at least all older than a
specific point in time), which is used as the metric for TTL.
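
A hypothetical sketch of why a unified gen matters for TTL (struct toy_lrugen and oldest_gen_too_young() are illustrative names, not kernel API). If every gen has a single birth time, the age of the oldest gen bounds how recently any evictable folio was accessed, and reclaiming from a gen younger than the TTL means evicting recently used folios. In the kernel this is driven by the min_ttl_ms setting of the multi-gen LRU; the real code differs in detail.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_GENS 4

/* Hypothetical per-lruvec state: one birth time per unified generation. */
struct toy_lrugen {
	unsigned long min_seq;
	unsigned long birth[NR_GENS];	/* creation time of each gen, in ms */
};

/*
 * If the oldest gen is younger than min_ttl, reclaiming it would evict
 * folios accessed less than min_ttl ago, i.e. the system is thrashing.
 */
static bool oldest_gen_too_young(const struct toy_lrugen *g,
				 unsigned long now, unsigned long min_ttl)
{
	unsigned long birth = g->birth[g->min_seq % NR_GENS];

	assert(now >= birth);
	return now - birth < min_ttl;
}
```

With split anon/file seqs there would be two "oldest gens" with different birth times, so this single comparison no longer has an obvious meaning.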

Besides, having unified gens also helps implement things like
workingset reporting, where each gen acts as a bin of a histogram:

https://lwn.net/Articles/976985/
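
A minimal sketch of the histogram idea, assuming a hypothetical snapshot in which each unified gen records its birth time and its anon and file folio counts (struct toy_gen and toy_histogram() are illustrative, not the interface from the linked series). Each gen becomes one bin whose age is now - birth, youngest first.

```c
#include <assert.h>

#define NR_GENS 4

/* Hypothetical snapshot of one unified gen: anon and file share a birth time. */
struct toy_gen {
	unsigned long birth_ms, nr_anon, nr_file;
};

/* Fill the age and size of each bin, youngest first; return total folios. */
static unsigned long toy_histogram(const struct toy_gen gens[NR_GENS],
				   unsigned long max_seq, unsigned long now_ms,
				   unsigned long age_ms[NR_GENS],
				   unsigned long nr[NR_GENS])
{
	unsigned long total = 0;

	for (int i = 0; i < NR_GENS; i++) {
		/* walk the ring backwards from the youngest gen */
		const struct toy_gen *g = &gens[(max_seq - i) % NR_GENS];

		assert(now_ms >= g->birth_ms);
		age_ms[i] = now_ms - g->birth_ms;
		nr[i] = g->nr_anon + g->nr_file;
		total += nr[i];
	}
	return total;
}
```

With split anon/file seqs, the two types would presumably no longer share bin boundaries, and each bin would need separate anon and file ages.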

Triggering aging could be a bit more problematic too.
I think the right way is to just do the aging asynchronously; Yu
even left a TODO comment in vmscan.c:

/*
 * For future optimizations:
 * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
 *    reclaim.
 */

Then we start aging whenever there are fewer than 4 gens, and allow
reclaim to always proceed even if only 2 gens are left.
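
A simplified model contrasting the two policies. The enum and both functions are hypothetical, and the current behavior is reduced to "age synchronously when only MIN_NR_GENS gens remain", which glosses over the details of should_run_aging().

```c
#include <assert.h>

#define MIN_NR_GENS 2
#define MAX_NR_GENS 4

/* Hypothetical actions a reclaimer could take for one lruvec. */
enum toy_action { RECLAIM, AGE_THEN_RECLAIM, RECLAIM_AND_QUEUE_AGING };

/* Roughly today: with only MIN_NR_GENS gens left, block on aging first. */
static enum toy_action sync_policy(int nr_gens)
{
	assert(nr_gens >= MIN_NR_GENS && nr_gens <= MAX_NR_GENS);
	return nr_gens <= MIN_NR_GENS ? AGE_THEN_RECLAIM : RECLAIM;
}

/* The proposal: never block; queue async aging whenever a gen slot is free. */
static enum toy_action async_policy(int nr_gens)
{
	assert(nr_gens >= MIN_NR_GENS && nr_gens <= MAX_NR_GENS);
	return nr_gens < MAX_NR_GENS ? RECLAIM_AND_QUEUE_AGING : RECLAIM;
}
```

Per the paragraphs below, kswapd could keep aging synchronously, so a fuller model would pick the policy per reclaimer.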

The performance would be better since there is no more blocking on
aging, there is no change to the existing model, and the change should
be smaller and easier to review IIUC.

One concerning part is doing reclaim with only 2 gens left. I think
it is OK. It should be rare, as 3 gens already act as a buffer; having
only 2 gens left means the async aging can't catch up and the system is
under extreme pressure, so the folios are unlikely to be accessed
enough times to yield meaningful heat info, and refaults will be more
helpful for sorting out the workingset:

https://lwn.net/Articles/945266/

Cgroup reclaim can do some throttling on that too, and kswapd can
still do aging synchronously.

Just some ideas; we may need some tests and benchmarks to figure out
which solution is best. Discussion is welcome! :D
