* Re: [PATCH] mm/mglru: Use folio_mark_accessed to replace folio_set_active in PF
2026-04-18 12:02 [PATCH] mm/mglru: Use folio_mark_accessed to replace folio_set_active in PF Barry Song (Xiaomi)
@ 2026-04-24 11:53 ` Andrew Morton
2026-04-24 14:10 ` Andrew Morton
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2026-04-24 11:53 UTC (permalink / raw)
To: Barry Song (Xiaomi)
Cc: linux-mm, linux-kernel, Lance Yang, Xueyuan Chen, Kairui Song,
Qi Zheng, Shakeel Butt, wangzicheng, Suren Baghdasaryan, Lei Liu,
Matthew Wilcox, Axel Rasmussen, Yuanchu Xie, Wei Xu, Will Deacon
On Sat, 18 Apr 2026 20:02:33 +0800 "Barry Song (Xiaomi)" <baohua@kernel.org> wrote:
> MGLRU gives high priority to folios mapped in page tables.
> As a result, folio_set_active() is invoked for all folios
> read during page faults. In practice, however, readahead
> can bring in many folios that are never accessed via page
> tables.
>
> A previous attempt by Lei Liu proposed introducing a separate
> LRU for readahead[1] to make readahead pages easier to reclaim,
> but that approach is likely over-engineered.
>
> Before commit 4d5d14a01e2c ("mm/mglru: rework workingset
> protection"), folios with PG_active were always placed in
> the youngest generation, leading to over-protection and
> increased refaults. After that commit, PG_active folios
> are placed in the second youngest generation, which is
> still too optimistic given the presence of readahead. In
> contrast, the classic active/inactive scheme is more
> conservative.
>
> This patch switches to folio_mark_accessed(). If
> folio_check_references() later detects referenced PTEs,
> the folio will be promoted based on the reference flag
> set by folio_mark_accessed().
>
> The following uses a simple model to demonstrate why the current
> code is not ideal. It runs fio-3.42 in a memcg, reading a file in a
> strided pattern—4KB every 64KB—to simulate prefaulted pages that may
> not be accessed.
Are you able to suggest any workloads which might regress? And test
for those?
> Without the patch, we observed 12883855 file refaults and a very low
> bandwidth of 58.5 MiB/s, because prefaulted but unused pages occupy
> hot positions, continuously pushing out the real working set and
> causing incorrect reclaim. With the patch, we observed 0 refaults
> and bandwidth increased to 5078 MiB/s.
Wow. And that isn't a crazy workload.
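For readers who want to picture the access pattern in the quoted test, here is a minimal userspace sketch in C of a strided read (4KB touched every 64KB) over an mmap()ed file. It is not the fio job that produced the numbers in this thread. The names strided_chunks(), strided_touch(), CHUNK, STRIDE and the STRIDE_DEMO_MAIN guard are hypothetical and exist only for illustration.

```c
/* Illustrative sketch only: touch 4KB every 64KB of a mapped file. */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK  (4UL * 1024)   /* bytes read at each step (assumed 4KB) */
#define STRIDE (64UL * 1024)  /* distance between steps (assumed 64KB) */

/* Number of chunks a strided pass visits in a buffer of len bytes. */
static size_t strided_chunks(size_t len)
{
	return (len + STRIDE - 1) / STRIDE;
}

/*
 * Read up to CHUNK bytes at every STRIDE offset and accumulate them into
 * *sum, so the compiler cannot drop the loads. Returns the number of
 * chunks visited.
 */
static size_t strided_touch(const unsigned char *base, size_t len,
			    unsigned long *sum)
{
	size_t touched = 0;

	assert(sum != NULL);
	for (size_t off = 0; off < len; off += STRIDE) {
		size_t n = len - off < CHUNK ? len - off : CHUNK;

		for (size_t i = 0; i < n; i++)
			*sum += base[off + i];
		touched++;
	}
	return touched;
}

#ifdef STRIDE_DEMO_MAIN
int main(int argc, char **argv)
{
	struct stat st;
	unsigned long sum = 0;
	void *map;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(argv[1]);
		return 1;
	}
	/* File-backed read-only mapping: accesses go through page faults. */
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	printf("touched %zu chunks (checksum %lu)\n",
	       strided_touch(map, (size_t)st.st_size, &sum), sum);
	munmap(map, (size_t)st.st_size);
	close(fd);
	return 0;
}
#endif
```

Building with -DSTRIDE_DEMO_MAIN gives a small program that takes a file path. Reproducing the reported refault behaviour would still need the memcg limit and the MGLRU setup described in the patch, which this sketch does not configure.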
> For those who want to try the model on x86, you will need the
> following in arch/x86/include/asm/pgtable.h.
>
> #define arch_wants_old_prefaulted_pte arch_wants_old_prefaulted_pte
> static inline bool arch_wants_old_prefaulted_pte(void)
> {
> 	return true;
> }
Can you propose a patch? We can at least toss it in there for testing
while we think about it.
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -512,7 +512,7 @@ void folio_add_lru(struct folio *folio)
> /* see the comment in lru_gen_folio_seq() */
> if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
> lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
> - folio_set_active(folio);
> + folio_mark_accessed(folio);
>
> folio_batch_add_and_move(folio, lru_add);
> }
lol, I was expecting something larger ;)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm/mglru: Use folio_mark_accessed to replace folio_set_active in PF
2026-04-18 12:02 [PATCH] mm/mglru: Use folio_mark_accessed to replace folio_set_active in PF Barry Song (Xiaomi)
2026-04-24 11:53 ` Andrew Morton
@ 2026-04-24 14:10 ` Andrew Morton
2026-04-24 15:19 ` Pedro Falcato
2026-04-24 17:03 ` Shakeel Butt
3 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2026-04-24 14:10 UTC (permalink / raw)
To: Barry Song (Xiaomi)
Cc: linux-mm, linux-kernel, Lance Yang, Xueyuan Chen, Kairui Song,
Qi Zheng, Shakeel Butt, wangzicheng, Suren Baghdasaryan, Lei Liu,
Matthew Wilcox, Axel Rasmussen, Yuanchu Xie, Wei Xu, Will Deacon
On Sat, 18 Apr 2026 20:02:33 +0800 "Barry Song (Xiaomi)" <baohua@kernel.org> wrote:
> MGLRU gives high priority to folios mapped in page tables.
> As a result, folio_set_active() is invoked for all folios
> read during page faults. In practice, however, readahead
> can bring in many folios that are never accessed via page
> tables.
>
> A previous attempt by Lei Liu proposed introducing a separate
> LRU for readahead[1] to make readahead pages easier to reclaim,
> but that approach is likely over-engineered.
>
> Before commit 4d5d14a01e2c ("mm/mglru: rework workingset
> protection"), folios with PG_active were always placed in
> the youngest generation, leading to over-protection and
> increased refaults. After that commit, PG_active folios
> are placed in the second youngest generation, which is
> still too optimistic given the presence of readahead. In
> contrast, the classic active/inactive scheme is more
> conservative.
>
> This patch switches to folio_mark_accessed(). If
> folio_check_references() later detects referenced PTEs,
> the folio will be promoted based on the reference flag
> set by folio_mark_accessed().
Sashiko: https://sashiko.dev/#/patchset/20260418120233.7162-1-baohua@kernel.org
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm/mglru: Use folio_mark_accessed to replace folio_set_active in PF
2026-04-18 12:02 [PATCH] mm/mglru: Use folio_mark_accessed to replace folio_set_active in PF Barry Song (Xiaomi)
2026-04-24 11:53 ` Andrew Morton
2026-04-24 14:10 ` Andrew Morton
@ 2026-04-24 15:19 ` Pedro Falcato
2026-04-26 4:35 ` Barry Song
2026-04-24 17:03 ` Shakeel Butt
3 siblings, 1 reply; 6+ messages in thread
From: Pedro Falcato @ 2026-04-24 15:19 UTC (permalink / raw)
To: Barry Song (Xiaomi)
Cc: akpm, linux-mm, linux-kernel, Lance Yang, Xueyuan Chen,
Kairui Song, Qi Zheng, Shakeel Butt, wangzicheng,
Suren Baghdasaryan, Lei Liu, Matthew Wilcox, Axel Rasmussen,
Yuanchu Xie, Wei Xu, Will Deacon
On Sat, Apr 18, 2026 at 08:02:33PM +0800, Barry Song (Xiaomi) wrote:
> MGLRU gives high priority to folios mapped in page tables.
> As a result, folio_set_active() is invoked for all folios
> read during page faults. In practice, however, readahead
> can bring in many folios that are never accessed via page
> tables.
>
> A previous attempt by Lei Liu proposed introducing a separate
> LRU for readahead[1] to make readahead pages easier to reclaim,
> but that approach is likely over-engineered.
Why does this even need to be kept? I'm not sure it makes sense
to even mark readahead folios as referenced.
I'd suggest folios should only be marked referenced (or even active, whatever)
when they're mapped. Anything else is a bit random and is hoping you are
eventually going to map them in the future (which is not true for, for example,
anything in an ELF file that may be readahead but not mapped, like debug info,
symbol tables, section headers, relocation tables, etc etc)
--
Pedro
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm/mglru: Use folio_mark_accessed to replace folio_set_active in PF
2026-04-24 15:19 ` Pedro Falcato
@ 2026-04-26 4:35 ` Barry Song
0 siblings, 0 replies; 6+ messages in thread
From: Barry Song @ 2026-04-26 4:35 UTC (permalink / raw)
To: Pedro Falcato
Cc: akpm, linux-mm, linux-kernel, Lance Yang, Xueyuan Chen,
Kairui Song, Qi Zheng, Shakeel Butt, wangzicheng,
Suren Baghdasaryan, Lei Liu, Matthew Wilcox, Axel Rasmussen,
Yuanchu Xie, Wei Xu, Will Deacon
On Fri, Apr 24, 2026 at 11:19 PM Pedro Falcato <pfalcato@suse.de> wrote:
>
> On Sat, Apr 18, 2026 at 08:02:33PM +0800, Barry Song (Xiaomi) wrote:
> > MGLRU gives high priority to folios mapped in page tables.
> > As a result, folio_set_active() is invoked for all folios
> > read during page faults. In practice, however, readahead
> > can bring in many folios that are never accessed via page
> > tables.
> >
> > A previous attempt by Lei Liu proposed introducing a separate
> > LRU for readahead[1] to make readahead pages easier to reclaim,
> > but that approach is likely over-engineered.
>
> Why does this even need to be kept? I'm not sure it makes sense
> to even mark readahead folios as referenced.
>
> I'd suggest folios should only be marked referenced (or even active, whatever)
> when they're mapped. Anything else is a bit random and is hoping you are
> eventually going to map them in the future (which is not true for, for example,
> anything in an ELF file that may be readahead but not mapped, like debug info,
> symbol tables, section headers, relocation tables, etc etc)
The patch targets the mmap readahead path rather than the syscall
readahead path.
With lru_gen_in_fault() in place, it’s roughly equivalent to
the mapped case, since readahead is typically 128 KB while
fault_around is 64 KB in PF.
Thanks
Barry
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] mm/mglru: Use folio_mark_accessed to replace folio_set_active in PF
2026-04-18 12:02 [PATCH] mm/mglru: Use folio_mark_accessed to replace folio_set_active in PF Barry Song (Xiaomi)
` (2 preceding siblings ...)
2026-04-24 15:19 ` Pedro Falcato
@ 2026-04-24 17:03 ` Shakeel Butt
3 siblings, 0 replies; 6+ messages in thread
From: Shakeel Butt @ 2026-04-24 17:03 UTC (permalink / raw)
To: Barry Song (Xiaomi)
Cc: akpm, linux-mm, linux-kernel, Lance Yang, Xueyuan Chen,
Kairui Song, Qi Zheng, wangzicheng, Suren Baghdasaryan, Lei Liu,
Matthew Wilcox, Axel Rasmussen, Yuanchu Xie, Wei Xu, Will Deacon
On Sat, Apr 18, 2026 at 08:02:33PM +0800, Barry Song (Xiaomi) wrote:
> MGLRU gives high priority to folios mapped in page tables.
> As a result, folio_set_active() is invoked for all folios
> read during page faults. In practice, however, readahead
> can bring in many folios that are never accessed via page
> tables.
>
> A previous attempt by Lei Liu proposed introducing a separate
> LRU for readahead[1] to make readahead pages easier to reclaim,
> but that approach is likely over-engineered.
>
> Before commit 4d5d14a01e2c ("mm/mglru: rework workingset
> protection"), folios with PG_active were always placed in
> the youngest generation, leading to over-protection and
> increased refaults. After that commit, PG_active folios
> are placed in the second youngest generation, which is
> still too optimistic given the presence of readahead. In
> contrast, the classic active/inactive scheme is more
> conservative.
>
> This patch switches to folio_mark_accessed(). If
> folio_check_references() later detects referenced PTEs,
> the folio will be promoted based on the reference flag
> set by folio_mark_accessed().
>
lru_gen_refault() has the following comment and stat update, which refer to
setting the active bit that this patch removes:
	/* see folio_add_lru() where folio_set_active() will be called */
	if (lru_gen_in_fault())
		mod_lruvec_state(lruvec, WORKINGSET_ACTIVATE_BASE + type, delta);
Is this still relevant, or does it need changes?
I have not yet dug deeper into the patch and the heuristic. Will do later.
^ permalink raw reply [flat|nested] 6+ messages in thread