* [PATCH] mm: skip dirty file folios during isolation of legacy LRU
@ 2026-03-20 8:33 zhaoyang.huang
2026-03-20 9:19 ` Kairui Song
0 siblings, 1 reply; 5+ messages in thread
From: zhaoyang.huang @ 2026-03-20 8:33 UTC (permalink / raw)
To: Andrew Morton, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Johannes Weiner, David Hildenbrand, Michal Hocko, Qi Zheng,
Matthew Wilcox, Shakeel Butt, Lorenzo Stoakes, linux-mm,
linux-kernel, Zhaoyang Huang, steve.kang
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Since dirty file folios are no longer written out during reclaim after
commit 84798514db50 ("mm: Remove swap_writepage() and
shmem_writepage()"), there is no need to isolate them; skipping them
improves scan efficiency and avoids unnecessary TLB flushes.
This commit moves the dirty file folio detection forward to the
isolation phase, together with the statistics that affect waking up
the flusher thread under the legacy LRU. Under MGLRU, dirty file
folios are already moved to a younger generation in sort_folios().
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
mm/vmscan.c | 103 ++++++++++++++++++++++++++++------------------------
1 file changed, 55 insertions(+), 48 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 10f1e7d716ca..79e5910ac62e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1103,7 +1103,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
struct address_space *mapping;
struct folio *folio;
enum folio_references references = FOLIOREF_RECLAIM;
- bool dirty, writeback;
unsigned int nr_pages;
cond_resched();
@@ -1142,26 +1141,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
if (!sc->may_unmap && folio_mapped(folio))
goto keep_locked;
- /*
- * The number of dirty pages determines if a node is marked
- * reclaim_congested. kswapd will stall and start writing
- * folios if the tail of the LRU is all dirty unqueued folios.
- */
- folio_check_dirty_writeback(folio, &dirty, &writeback);
- if (dirty || writeback)
- stat->nr_dirty += nr_pages;
- if (dirty && !writeback)
- stat->nr_unqueued_dirty += nr_pages;
-
- /*
- * Treat this folio as congested if folios are cycling
- * through the LRU so quickly that the folios marked
- * for immediate reclaim are making it to the end of
- * the LRU a second time.
- */
- if (writeback && folio_test_reclaim(folio))
- stat->nr_congested += nr_pages;
/*
* If a folio at the tail of the LRU is under writeback, there
@@ -1717,12 +1697,14 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
unsigned long nr_zone_taken[MAX_NR_ZONES] = { 0 };
unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
unsigned long skipped = 0, total_scan = 0, scan = 0;
+ unsigned long nr_dirty = 0, nr_unqueued_dirty = 0, nr_congested = 0;
unsigned long nr_pages;
unsigned long max_nr_skipped = 0;
LIST_HEAD(folios_skipped);
while (scan < nr_to_scan && !list_empty(src)) {
struct list_head *move_to = src;
+ bool dirty, writeback;
struct folio *folio;
folio = lru_to_folio(src);
@@ -1749,6 +1731,30 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
*/
scan += nr_pages;
+ if (!folio_trylock(folio))
+ goto move;
+ /*
+ * The number of dirty pages determines if a node is marked
+ * reclaim_congested. kswapd will stall and start writing
+ * folios if the tail of the LRU is all dirty unqueued folios.
+ */
+ folio_check_dirty_writeback(folio, &dirty, &writeback);
+ folio_unlock(folio);
+
+ if (dirty || writeback)
+ nr_dirty += nr_pages;
+
+ if (dirty && !writeback)
+ nr_unqueued_dirty += nr_pages;
+ /*
+ * Treat this folio as congested if folios are cycling
+ * through the LRU so quickly that the folios marked
+ * for immediate reclaim are making it to the end of
+ * the LRU a second time.
+ */
+ if (writeback && folio_test_reclaim(folio))
+ nr_congested += nr_pages;
+
if (!folio_test_lru(folio))
goto move;
if (!sc->may_unmap && folio_mapped(folio))
@@ -1798,6 +1804,35 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
total_scan, skipped, nr_taken, lru);
update_lru_sizes(lruvec, lru, nr_zone_taken);
+ /*
+ * If dirty folios are scanned that are not queued for IO, it
+ * implies that flushers are not doing their job. This can
+ * happen when memory pressure pushes dirty folios to the end of
+ * the LRU before the dirty limits are breached and the dirty
+ * data has expired. It can also happen when the proportion of
+ * dirty folios grows not through writes but through memory
+ * pressure reclaiming all the clean cache. And in some cases,
+ * the flushers simply cannot keep up with the allocation
+ * rate. Nudge the flusher threads in case they are asleep.
+ */
+ if (nr_unqueued_dirty == scan) {
+ wakeup_flusher_threads(WB_REASON_VMSCAN);
+ /*
+ * For cgroupv1 dirty throttling is achieved by waking up
+ * the kernel flusher here and later waiting on folios
+ * which are in writeback to finish (see shrink_folio_list()).
+ *
+ * Flusher may not be able to issue writeback quickly
+ * enough for cgroupv1 writeback throttling to work
+ * on a large system.
+ */
+ if (!writeback_throttling_sane(sc))
+ reclaim_throttle(lruvec_pgdat(lruvec), VMSCAN_THROTTLE_WRITEBACK);
+ }
+ sc->nr.dirty += nr_dirty;
+ sc->nr.congested += nr_congested;
+ sc->nr.unqueued_dirty += nr_unqueued_dirty;
+
return nr_taken;
}
@@ -2038,35 +2073,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
lru_note_cost_unlock_irq(lruvec, file, stat.nr_pageout,
nr_scanned - nr_reclaimed);
- /*
- * If dirty folios are scanned that are not queued for IO, it
- * implies that flushers are not doing their job. This can
- * happen when memory pressure pushes dirty folios to the end of
- * the LRU before the dirty limits are breached and the dirty
- * data has expired. It can also happen when the proportion of
- * dirty folios grows not through writes but through memory
- * pressure reclaiming all the clean cache. And in some cases,
- * the flushers simply cannot keep up with the allocation
- * rate. Nudge the flusher threads in case they are asleep.
- */
- if (stat.nr_unqueued_dirty == nr_taken) {
- wakeup_flusher_threads(WB_REASON_VMSCAN);
- /*
- * For cgroupv1 dirty throttling is achieved by waking up
- * the kernel flusher here and later waiting on folios
- * which are in writeback to finish (see shrink_folio_list()).
- *
- * Flusher may not be able to issue writeback quickly
- * enough for cgroupv1 writeback throttling to work
- * on a large system.
- */
- if (!writeback_throttling_sane(sc))
- reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
- }
- sc->nr.dirty += stat.nr_dirty;
- sc->nr.congested += stat.nr_congested;
- sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
sc->nr.writeback += stat.nr_writeback;
sc->nr.immediate += stat.nr_immediate;
sc->nr.taken += nr_taken;
--
2.25.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] mm: skip dirty file folios during isolation of legacy LRU
2026-03-20 8:33 [PATCH] mm: skip dirty file folios during isolation of legacy LRU zhaoyang.huang
@ 2026-03-20 9:19 ` Kairui Song
2026-03-20 9:30 ` Zhaoyang Huang
2026-03-23 9:17 ` Zhaoyang Huang
0 siblings, 2 replies; 5+ messages in thread
From: Kairui Song @ 2026-03-20 9:19 UTC (permalink / raw)
To: zhaoyang.huang
Cc: Andrew Morton, Axel Rasmussen, Yuanchu Xie, Wei Xu,
Johannes Weiner, David Hildenbrand, Michal Hocko, Qi Zheng,
Matthew Wilcox, Shakeel Butt, Lorenzo Stoakes, linux-mm,
linux-kernel, Zhaoyang Huang, steve.kang
On Fri, Mar 20, 2026 at 4:34 PM zhaoyang.huang
<zhaoyang.huang@unisoc.com> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Hi Zhaoyang,
> Since dirty file folios are no longer writeout in reclaiming after
> 'commit 84798514db50 ("mm: Remove swap_writepage() and
> shmem_writepage()")', there is no need to isolate them which could help
> to improve the scan efficiency and decrease the unnecessary TLB flush.
But you are still isolating them with this patch; you have only moved
where the statistics are updated.
And this is roughly the opposite of what I'm trying to do here:
https://lore.kernel.org/linux-mm/20260318-mglru-reclaim-v1-0-2c46f9eb0508@tencent.com/
> This commit would like to bring the dirty file folios detection forward
> to isolation phase as well as the statistics which could affect wakeup
> the flusher thread under legacy LRU. In terms of MGLRU, the dirty file
> folios have been brought to younger gen when sort_folios.
If you really just skip isolating them, it could cause a regression:
skipping the isolation and putting them back will cause a ping-pong
effect for writeback / dirty folios, as they will be stuck on the
inactive list. That would decrease scan efficiency instead.
Currently shrink_folio_list() reactivates them and sets the PG_reclaim
flag; they are later deactivated by the writeback completion callback.
Simply changing that and the flusher wakeup logic could be a bad idea.
You can check the link above for the benchmark results.
And for folios under writeback, there is no IPI flush or unmap, since
they are returned early. Dirty file folios are indeed unmapped, but
the following flush should reclaim them anyway.
It might be a good idea to skip the unmap part for dirty file folios?
Maybe; some benchmarking is needed.
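To make the suggestion concrete, here is a minimal standalone C model
of the decision being discussed, not kernel code: the enum, the
function name, and the three-way classification are hypothetical, and
the real change would live inside shrink_folio_list().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative model only: skip the unmap step (and the TLB flush it
 * implies) for dirty file folios, since reclaim no longer writes them
 * out and the flusher will clean them anyway. Hypothetical names.
 */
enum reclaim_action {
	RECLAIM_ROTATE,		/* under writeback: returned early, no unmap */
	RECLAIM_SKIP_UNMAP,	/* dirty file folio: leave it for the flusher */
	RECLAIM_UNMAP		/* clean file or anon folio: try_to_unmap() */
};

static enum reclaim_action classify_folio(bool file, bool dirty, bool writeback)
{
	if (writeback)
		return RECLAIM_ROTATE;
	if (file && dirty)
		return RECLAIM_SKIP_UNMAP;
	return RECLAIM_UNMAP;
}
```

Whether the saved TLB flushes outweigh the extra rotations is exactly
what the benchmark would need to show.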
> while (scan < nr_to_scan && !list_empty(src)) {
> struct list_head *move_to = src;
> + bool dirty, writeback;
> struct folio *folio;
>
> folio = lru_to_folio(src);
> @@ -1749,6 +1731,30 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
> */
> scan += nr_pages;
>
> + if (!folio_trylock(folio))
> + goto move;
> + /*
> + * The number of dirty pages determines if a node is marked
> + * reclaim_congested. kswapd will stall and start writing
> + * folios if the tail of the LRU is all dirty unqueued folios.
> + */
> + folio_check_dirty_writeback(folio, &dirty, &writeback);
> + folio_unlock(folio);
Regarding LRU contention: to force-activate a folio you always have to
take it off the LRU first, and folio_activate() takes it off and
touches the LRU lock anyway. But here there is now more work under the
lruvec lock, and the code also tries to lock the folio while holding
the lruvec lock. LRU contention might get worse.
And the wakeup below seems very wrong: you simply can't throttle,
wait, or sleep while holding the LRU lock.
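The usual pattern for this constraint, sketched below as a toy
standalone C model (the flag-based "lock" and all names are
hypothetical, not kernel code): do only bookkeeping under the
spinlock, and defer anything that may sleep, such as the flusher
wakeup or throttling, until after the unlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of "never sleep under the lruvec spinlock". */
static bool lru_lock_held;
static bool flusher_woken;

static void lru_lock(void)   { lru_lock_held = true; }
static void lru_unlock(void) { lru_lock_held = false; }

static void wake_flusher(void)
{
	/* Models a sleeping function: illegal with the lock held. */
	assert(!lru_lock_held);
	flusher_woken = true;
}

static void isolate_pass(unsigned long scanned, unsigned long unqueued_dirty)
{
	bool need_wakeup;

	lru_lock();
	/* Only counter updates and decisions under the lock... */
	need_wakeup = scanned && unqueued_dirty == scanned;
	lru_unlock();

	/* ...anything that may block runs after the unlock. */
	if (need_wakeup)
		wake_flusher();
}
```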
* Re: [PATCH] mm: skip dirty file folios during isolation of legacy LRU
2026-03-20 9:19 ` Kairui Song
@ 2026-03-20 9:30 ` Zhaoyang Huang
2026-03-23 9:17 ` Zhaoyang Huang
1 sibling, 0 replies; 5+ messages in thread
From: Zhaoyang Huang @ 2026-03-20 9:30 UTC (permalink / raw)
To: Kairui Song
Cc: zhaoyang.huang, Andrew Morton, Axel Rasmussen, Yuanchu Xie,
Wei Xu, Johannes Weiner, David Hildenbrand, Michal Hocko,
Qi Zheng, Matthew Wilcox, Shakeel Butt, Lorenzo Stoakes, linux-mm,
linux-kernel, steve.kang
On Fri, Mar 20, 2026 at 5:20 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Fri, Mar 20, 2026 at 4:34 PM zhaoyang.huang
> <zhaoyang.huang@unisoc.com> wrote:
> >
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> Hi Zhaoyang,
>
> > Since dirty file folios are no longer writeout in reclaiming after
> > 'commit 84798514db50 ("mm: Remove swap_writepage() and
> > shmem_writepage()")', there is no need to isolate them which could help
> > to improve the scan efficiency and decrease the unnecessary TLB flush.
>
> But you are still isolating them with this patch, you just adjusted
> where the statistical update happens.
>
> And this is kind of opposite thing to what I'm trying to do here:
> https://lore.kernel.org/linux-mm/20260318-mglru-reclaim-v1-0-2c46f9eb0508@tencent.com/
>
> > This commit would like to bring the dirty file folios detection forward
> > to isolation phase as well as the statistics which could affect wakeup
> > the flusher thread under legacy LRU. In terms of MGLRU, the dirty file
> > folios have been brought to younger gen when sort_folios.
>
> If you really just skip isolating them, it could cause a regression:
> skipping the isolate and put it back will cause some ping pong effect
> on writeback / dirty folios as they will be stuck at inactive list. It
> will instead decrease scan efficiency.
>
> Currently shrink_folio_list will reactivate them and set the
> PG_reclaim flag. They will be deactivated by writeback callback.
> Simply changing that and the flusher wakeup logic could be a bad idea.
> You can check the link above and see the benchmark result.
>
> And for under writeback folios, there is no IPI flush or unmap as it
> was returned early. For dirty file folios they are unmapped indeed,
> but following flush should reclaim them anyway.
>
> It might be a good idea to skip the unmmap part for dirty file folio?
> Maybe, some benchmark is needed.
>
> > while (scan < nr_to_scan && !list_empty(src)) {
> > struct list_head *move_to = src;
> > + bool dirty, writeback;
> > struct folio *folio;
> >
> > folio = lru_to_folio(src);
> > @@ -1749,6 +1731,30 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
> > */
> > scan += nr_pages;
> >
> > + if (!folio_trylock(folio))
> > + goto move;
> > + /*
> > + * The number of dirty pages determines if a node is marked
> > + * reclaim_congested. kswapd will stall and start writing
> > + * folios if the tail of the LRU is all dirty unqueued folios.
> > + */
> > + folio_check_dirty_writeback(folio, &dirty, &writeback);
> > + folio_unlock(folio);
>
> For LRU contention, to force active you always have to take it off the
> LRU first, folio_activate will take them off and touch LRU lock
> anyway. And now here, there is more work under lruvec lock and it is
> also trying to lock the folio under the lruvec lock. The LRU
> contention might get worse.
Thanks for the information; I agree with you. It seems the simple and
correct thing is to have dirty folios skip try_to_unmap() to save the
TLB flush.
>
> And the wakeup below seems very wrong, you just can't throttle or wait
> or sleep under LRU lock.
Oh, sorry for the mistaken change; I didn't check the locking context there.
* Re: [PATCH] mm: skip dirty file folios during isolation of legacy LRU
2026-03-20 9:19 ` Kairui Song
2026-03-20 9:30 ` Zhaoyang Huang
@ 2026-03-23 9:17 ` Zhaoyang Huang
2026-03-23 10:04 ` Kairui Song
1 sibling, 1 reply; 5+ messages in thread
From: Zhaoyang Huang @ 2026-03-23 9:17 UTC (permalink / raw)
To: Kairui Song
Cc: zhaoyang.huang, Andrew Morton, Axel Rasmussen, Yuanchu Xie,
Wei Xu, Johannes Weiner, David Hildenbrand, Michal Hocko,
Qi Zheng, Matthew Wilcox, Shakeel Butt, Lorenzo Stoakes, linux-mm,
linux-kernel, steve.kang
On Fri, Mar 20, 2026 at 5:20 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Fri, Mar 20, 2026 at 4:34 PM zhaoyang.huang
> <zhaoyang.huang@unisoc.com> wrote:
> >
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> Hi Zhaoyang,
>
> > Since dirty file folios are no longer writeout in reclaiming after
> > 'commit 84798514db50 ("mm: Remove swap_writepage() and
> > shmem_writepage()")', there is no need to isolate them which could help
> > to improve the scan efficiency and decrease the unnecessary TLB flush.
>
> But you are still isolating them with this patch, you just adjusted
> where the statistical update happens.
Sorry, I missed the above point in my previous reply. No: under this
patch the dirty file folios are moved back to the lruvec instead of
being isolated. How about applying this only when isolate_lru_folios()
is called from shrink_active_list(), which has no risk of clogging the
inactive list?
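A minimal standalone C sketch of that proposal, with hypothetical
names and not the actual kernel code: skip dirty or writeback file
folios only when isolating from the active list, where moving them
back cannot clog the inactive list.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative predicate only; real code would branch on the lru enum. */
static bool skip_isolation(bool scanning_active_list, bool file,
			   bool dirty, bool writeback)
{
	if (!scanning_active_list)
		return false;	/* inactive-list scans behave as before */
	return file && (dirty || writeback);
}
```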
>
> And this is kind of opposite thing to what I'm trying to do here:
> https://lore.kernel.org/linux-mm/20260318-mglru-reclaim-v1-0-2c46f9eb0508@tencent.com/
>
> > This commit would like to bring the dirty file folios detection forward
> > to isolation phase as well as the statistics which could affect wakeup
> > the flusher thread under legacy LRU. In terms of MGLRU, the dirty file
> > folios have been brought to younger gen when sort_folios.
>
> If you really just skip isolating them, it could cause a regression:
> skipping the isolate and put it back will cause some ping pong effect
> on writeback / dirty folios as they will be stuck at inactive list. It
> will instead decrease scan efficiency.
>
> Currently shrink_folio_list will reactivate them and set the
> PG_reclaim flag. They will be deactivated by writeback callback.
> Simply changing that and the flusher wakeup logic could be a bad idea.
> You can check the link above and see the benchmark result.
>
> And for under writeback folios, there is no IPI flush or unmap as it
> was returned early. For dirty file folios they are unmapped indeed,
> but following flush should reclaim them anyway.
>
> It might be a good idea to skip the unmmap part for dirty file folio?
> Maybe, some benchmark is needed.
>
> > while (scan < nr_to_scan && !list_empty(src)) {
> > struct list_head *move_to = src;
> > + bool dirty, writeback;
> > struct folio *folio;
> >
> > folio = lru_to_folio(src);
> > @@ -1749,6 +1731,30 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
> > */
> > scan += nr_pages;
> >
> > + if (!folio_trylock(folio))
> > + goto move;
> > + /*
> > + * The number of dirty pages determines if a node is marked
> > + * reclaim_congested. kswapd will stall and start writing
> > + * folios if the tail of the LRU is all dirty unqueued folios.
> > + */
> > + folio_check_dirty_writeback(folio, &dirty, &writeback);
> > + folio_unlock(folio);
>
> For LRU contention, to force active you always have to take it off the
> LRU first, folio_activate will take them off and touch LRU lock
> anyway. And now here, there is more work under lruvec lock and it is
> also trying to lock the folio under the lruvec lock. The LRU
> contention might get worse.
>
> And the wakeup below seems very wrong, you just can't throttle or wait
> or sleep under LRU lock.
* Re: [PATCH] mm: skip dirty file folios during isolation of legacy LRU
2026-03-23 9:17 ` Zhaoyang Huang
@ 2026-03-23 10:04 ` Kairui Song
0 siblings, 0 replies; 5+ messages in thread
From: Kairui Song @ 2026-03-23 10:04 UTC (permalink / raw)
To: Zhaoyang Huang
Cc: zhaoyang.huang, Andrew Morton, Axel Rasmussen, Yuanchu Xie,
Wei Xu, Johannes Weiner, David Hildenbrand, Michal Hocko,
Qi Zheng, Matthew Wilcox, Shakeel Butt, Lorenzo Stoakes, linux-mm,
linux-kernel, steve.kang
On Mon, Mar 23, 2026 at 5:17 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
>
> On Fri, Mar 20, 2026 at 5:20 PM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > On Fri, Mar 20, 2026 at 4:34 PM zhaoyang.huang
> > <zhaoyang.huang@unisoc.com> wrote:
> > >
> > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > Hi Zhaoyang,
> >
> > > Since dirty file folios are no longer writeout in reclaiming after
> > > 'commit 84798514db50 ("mm: Remove swap_writepage() and
> > > shmem_writepage()")', there is no need to isolate them which could help
> > > to improve the scan efficiency and decrease the unnecessary TLB flush.
> >
> > But you are still isolating them with this patch, you just adjusted
> > where the statistical update happens.
> sorry, I missed the above information in previous feedback. No. The
Hi Zhaoyang
No worries, feel free to discuss anytime.
> dirty file folios are moved back to lruvec instead of being isolated
> under this patch. How about apply this only when isolate_lru_folios is
> called from shrink_active_list which has no worries about stuck the
> inactive list.
Hmm, no? Reading your code in isolate_lru_folios(), you do "goto move"
when folio_trylock() fails, but for "if (dirty || writeback)" folios
you only do "nr_dirty += nr_pages;". Am I missing anything?
And as for the different behavior for active / inactive isolation, I
don't know if that's a valid optimization worth the complexity, and
the worse part is that it may break the dirty flush wakeup logic,
since the shrinker may never see any dirty or writeback folios.