[PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan

* [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan
@ 2026-05-25 14:57 Zhang Peng
  2026-05-25 14:57 ` [PATCH v4 1/5] mm/vmscan: introduce folio_activate_locked() helper Zhang Peng
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Zhang Peng @ 2026-05-25 14:57 UTC
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Shakeel Butt, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Michal Hocko, Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

This series batches TLB flushes for dirty folios during memory reclaim,
aiming to reduce IPI overhead on multi-core systems.

Background
----------
Currently, when memory reclaim pages out dirty folios,
try_to_unmap_flush_dirty() is called once for each dirty folio. On multi-core
systems this generates frequent IPIs, which can significantly hurt performance.

Approach
--------
This patch series accumulates dirty folios into batches and performs a single
TLB flush for the entire batch, rather than flushing for each individual folio.
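
As a rough illustration of the approach (a userspace sketch, not kernel
code), the C model below counts how many flushes each policy would issue.
fake_tlb_flush(), flush_per_item() and flush_batched() are hypothetical names
invented for this example; the batch size of 31 is an assumption taken from
the subject of patch 5.

```c
#include <assert.h>

/* Hypothetical stand-in for a TLB flush; it only counts invocations. */
static unsigned long nr_flushes;

static void fake_tlb_flush(void)
{
	nr_flushes++;
}

/* Per-item policy: one flush before each dirty item is written out. */
static unsigned long flush_per_item(unsigned long nr_dirty)
{
	unsigned long i;

	nr_flushes = 0;
	for (i = 0; i < nr_dirty; i++)
		fake_tlb_flush();
	return nr_flushes;
}

/*
 * Batched policy: accumulate items and flush once per full batch,
 * plus once more for a partially filled batch left at the end.
 */
static unsigned long flush_batched(unsigned long nr_dirty,
				   unsigned long batch_size)
{
	unsigned long i, pending = 0;

	assert(batch_size > 0);
	nr_flushes = 0;
	for (i = 0; i < nr_dirty; i++) {
		if (++pending == batch_size) {
			fake_tlb_flush();
			pending = 0;
		}
	}
	if (pending)
		fake_tlb_flush();
	return nr_flushes;
}
```

With 1000 dirty items and a batch size of 31, flush_per_item() issues 1000
flushes while flush_batched() issues 33 (32 full batches plus one partial
batch). The real saving depends on how many folios survive the recheck.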

Changes
-------
Patch 1: Extract the folio activation block at activate_locked into
         folio_activate_locked().
Patch 2: Extract the folio-freeing path (buffer release, lazyfree,
         __remove_mapping, folio_batch drain) into folio_free().
Patch 3: Extract the pageout() dispatch state machine into pageout_one().
Patch 4: Extract the TTU setup and try_to_unmap() block into folio_try_unmap().
Patch 5: Implement batch TLB flushing logic. Dirty folios are accumulated in
         batches and a single TLB flush is performed for each batch before
         calling pageout.

Testing
-------
The benchmark script uses stress-ng to compare TLB shootdown behavior before and
after this series. It constrains a stress-ng workload via memcg to force reclaim
through shrink_folio_list(), and reports TLB shootdowns and IPIs.

Core benchmark command: stress-ng --vm 16 --vm-bytes 2G --vm-keep --timeout 60

==========================================================================
                 batch_dirty_tlb_flush Benchmark Results
==========================================================================
  Kernel: 7.0.0-rc1+   CPUs: 16
  MemTotal: 31834M   SwapTotal: 8191M
  memcg limit: 512M   alloc: 2G   workers: 16   duration: 60s
--------------------------------------------------------------------------
Metric                 Before        After             Delta (abs / %)
--------------------------------------------------------------------------
bogo ops/s             28238.63      35833.97          +7595.34 (+26.9%)
TLB shootdowns         55428953      17621697          -37807256 (-68.2%)
Function call IPIs     34073695      14498768          -19574927 (-57.4%)
pgscan_anon (pages)    52856224      60252894          +7396670 (+14.0%)
pgsteal_anon (pages)   29004962      34054753          +5049791 (+17.4%)
--------------------------------------------------------------------------
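
The percentage column can be reproduced from the Before and After values. A
minimal sketch, where pct_change() and print_delta() are hypothetical helpers
written for this note and are not part of the benchmark script:

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of `after` with respect to `before`, in percent. */
static double pct_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}

/* Print one row in the "Delta (abs / %)" style used by the table. */
static void print_delta(const char *name, double before, double after)
{
	printf("%-22s %+.2f (%+.1f%%)\n", name, after - before,
	       pct_change(before, after));
}
```

For example, pct_change(55428953, 17621697) evaluates to roughly -68.2,
matching the TLB shootdowns row.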

Suggested-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
---
Changes in v4 (addressing Barry Song's review on v3):
- Drop the "track reclaimed pages in reclaim_stat" patch; keep
  shrink_folio_list() returning nr_reclaimed directly. Avoids touching
  the function signature and its MGLRU evict_folios() and
  reclaim_clean_pages_from_list() callers in this series.
- Rename folio_active_bounce() to folio_activate_locked(). The new name
  reflects the precondition (the folio is locked) that callers care about.
- Split the folio_free()/pageout_one() extraction into two patches;
  make pageout_one() return bool so shrink_folio_list() can see whether
  the folio was reclaimed or kept.
- Move the !folio_mapped() check out of folio_try_unmap() into the
  caller, so folio_try_unmap() is only invoked for mapped folios.
- Link to v3: https://lore.kernel.org/r/20260410-batch-tlb-flush-v3-0-ff0b9d3a351a@icloud.com

Changes in v3:
- Patch 5: Replace folio_test_lru() condition check with
  VM_WARN_ON_FOLIO assertion, as PG_lru should never be set for
  isolated folios
- Patch 5: Add comment explaining folio_batch reuse-in-place
  technique in pageout_batch()
- Patch 5: Rewrite comment above folio_unlock() to explain why the
  folio is unlocked while batching
- Link to v2: https://lore.kernel.org/r/20260326-batch-tlb-flush-v2-0-403e523325c4@icloud.com

Changes in v2:
- Fix incorrect comment about page_ref_freeze
- Add folio_maybe_dma_pinned() check in pageout_batch()
- Link to v1: https://lore.kernel.org/r/20260309-batch-tlb-flush-v1-0-eb8fed7d1a9e@icloud.com

---
Zhang Peng (5):
      mm/vmscan: introduce folio_activate_locked() helper
      mm/vmscan: extract folio_free() from shrink_folio_list()
      mm/vmscan: extract pageout_one() from shrink_folio_list()
      mm/vmscan: extract folio unmap logic into folio_try_unmap()
      mm/vmscan: flush TLB for every 31 folios evictions

 mm/vmscan.c | 448 ++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 285 insertions(+), 163 deletions(-)
---
base-commit: d0b709f436b2788a10407624688ab8327c5ce18d
change-id: 20260309-batch-tlb-flush-893f0e56b496

Best regards,
-- 
Zhang Peng <zippermonkey@icloud.com>




* [PATCH v4 1/5] mm/vmscan: introduce folio_activate_locked() helper
  2026-05-25 14:57 [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Zhang Peng
@ 2026-05-25 14:57 ` Zhang Peng
  2026-06-17 11:59   ` David Hildenbrand (Arm)
  2026-05-25 14:57 ` [PATCH v4 2/5] mm/vmscan: extract folio_free() from shrink_folio_list() Zhang Peng
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Zhang Peng @ 2026-05-25 14:57 UTC
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Shakeel Butt, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Michal Hocko, Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

The activate_locked label in shrink_folio_list() reclaims swap cache
when needed, marks the folio active, and updates activation statistics.
Extract this block into folio_activate_locked() so it can be reused.

No functional change.

Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
---
 mm/vmscan.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ca4533eba701..886d8b4843aa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1050,6 +1050,28 @@ static bool may_enter_fs(struct folio *folio, gfp_t gfp_mask)
 	return !data_race(folio_swap_flags(folio) & SWP_FS_OPS);
 }
 
+/*
+ * Prepare a locked folio to be kept active rather than reclaimed.
+ * Reclaims its swap slot if it will not be swapped, then marks it
+ * active and updates activation statistics.
+ */
+static void folio_activate_locked(struct folio *folio,
+		struct reclaim_stat *stat, unsigned int nr_pages)
+{
+	/* Not a candidate for swapping, so reclaim swap space. */
+	if (folio_test_swapcache(folio) &&
+	    (mem_cgroup_swap_full(folio) || folio_test_mlocked(folio)))
+		folio_free_swap(folio);
+	VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
+	if (!folio_test_mlocked(folio)) {
+		int type = folio_is_file_lru(folio);
+
+		folio_set_active(folio);
+		stat->nr_activate[type] += nr_pages;
+		count_memcg_folio_events(folio, PGACTIVATE, nr_pages);
+	}
+}
+
 /*
  * shrink_folio_list() returns the number of reclaimed pages
  */
@@ -1525,17 +1547,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			nr_pages = 1;
 		}
 activate_locked:
-		/* Not a candidate for swapping, so reclaim swap space. */
-		if (folio_test_swapcache(folio) &&
-		    (mem_cgroup_swap_full(folio) || folio_test_mlocked(folio)))
-			folio_free_swap(folio);
-		VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
-		if (!folio_test_mlocked(folio)) {
-			int type = folio_is_file_lru(folio);
-			folio_set_active(folio);
-			stat->nr_activate[type] += nr_pages;
-			count_memcg_folio_events(folio, PGACTIVATE, nr_pages);
-		}
+		folio_activate_locked(folio, stat, nr_pages);
 keep_locked:
 		folio_unlock(folio);
 keep:

-- 
2.43.7




* [PATCH v4 2/5] mm/vmscan: extract folio_free() from shrink_folio_list()
  2026-05-25 14:57 [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Zhang Peng
  2026-05-25 14:57 ` [PATCH v4 1/5] mm/vmscan: introduce folio_activate_locked() helper Zhang Peng
@ 2026-05-25 14:57 ` Zhang Peng
  2026-06-17 12:17   ` David Hildenbrand (Arm)
  2026-05-25 14:57 ` [PATCH v4 3/5] mm/vmscan: extract pageout_one() " Zhang Peng
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Zhang Peng @ 2026-05-25 14:57 UTC
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Shakeel Butt, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Michal Hocko, Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

shrink_folio_list() contains a self-contained folio-freeing section:
buffer release, lazyfree, __remove_mapping, and folio_batch drain.
Extract it into folio_free() to reduce the size of shrink_folio_list()
and make the freeing step independently readable.

No functional change.

Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
---
 mm/vmscan.c | 168 +++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 92 insertions(+), 76 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 886d8b4843aa..b31f67801836 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1072,6 +1072,95 @@ static void folio_activate_locked(struct folio *folio,
 	}
 }
 
+static bool folio_free(struct folio *folio, struct folio_batch *free_folios,
+		struct scan_control *sc, struct reclaim_stat *stat,
+		unsigned int *nr_reclaimed)
+{
+	unsigned int nr_pages = folio_nr_pages(folio);
+	struct address_space *mapping = folio_mapping(folio);
+
+	/*
+	 * If the folio has buffers, try to free the buffer
+	 * mappings associated with this folio. If we succeed
+	 * we try to free the folio as well.
+	 *
+	 * We do this even if the folio is dirty.
+	 * filemap_release_folio() does not perform I/O, but it
+	 * is possible for a folio to have the dirty flag set,
+	 * but it is actually clean (all its buffers are clean).
+	 * This happens if the buffers were written out directly,
+	 * with submit_bh(). ext3 will do this, as well as
+	 * the blockdev mapping.  filemap_release_folio() will
+	 * discover that cleanness and will drop the buffers
+	 * and mark the folio clean - it can be freed.
+	 *
+	 * Rarely, folios can have buffers and no ->mapping.
+	 * These are the folios which were not successfully
+	 * invalidated in truncate_cleanup_folio().  We try to
+	 * drop those buffers here and if that worked, and the
+	 * folio is no longer mapped into process address space
+	 * (refcount == 1) it can be freed.  Otherwise, leave
+	 * the folio on the LRU so it is swappable.
+	 */
+	if (folio_needs_release(folio)) {
+		if (!filemap_release_folio(folio, sc->gfp_mask)) {
+			folio_activate_locked(folio, stat, nr_pages);
+			return false;
+		}
+
+		if (!mapping && folio_ref_count(folio) == 1) {
+			folio_unlock(folio);
+			if (folio_put_testzero(folio))
+				goto free_it;
+			else {
+				/*
+				 * rare race with speculative reference.
+				 * the speculative reference will free
+				 * this folio shortly, so we may
+				 * increment nr_reclaimed here (and
+				 * leave it off the LRU).
+				 */
+				*nr_reclaimed += nr_pages;
+				return true;
+			}
+		}
+	}
+
+	if (folio_test_lazyfree(folio)) {
+		/* follow __remove_mapping for reference */
+		if (!folio_ref_freeze(folio, 1))
+			return false;
+		/*
+		 * The folio has only one reference left, which is
+		 * from the isolation. After the caller puts the
+		 * folio back on the lru and drops the reference, the
+		 * folio will be freed anyway. It doesn't matter
+		 * which lru it goes on. So we don't bother checking
+		 * the dirty flag here.
+		 */
+		count_vm_events(PGLAZYFREED, nr_pages);
+		count_memcg_folio_events(folio, PGLAZYFREED, nr_pages);
+	} else if (!mapping || !__remove_mapping(mapping, folio, true,
+							sc->target_mem_cgroup))
+		return false;
+
+	folio_unlock(folio);
+free_it:
+	/*
+	 * Folio may get swapped out as a whole, need to account
+	 * all pages in it.
+	 */
+	*nr_reclaimed += nr_pages;
+
+	folio_unqueue_deferred_split(folio);
+	if (folio_batch_add(free_folios, folio) == 0) {
+		mem_cgroup_uncharge_folios(free_folios);
+		try_to_unmap_flush();
+		free_unref_folios(free_folios);
+	}
+	return true;
+}
+
 /*
  * shrink_folio_list() returns the number of reclaimed pages
  */
@@ -1459,83 +1548,10 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			}
 		}
 
-		/*
-		 * If the folio has buffers, try to free the buffer
-		 * mappings associated with this folio. If we succeed
-		 * we try to free the folio as well.
-		 *
-		 * We do this even if the folio is dirty.
-		 * filemap_release_folio() does not perform I/O, but it
-		 * is possible for a folio to have the dirty flag set,
-		 * but it is actually clean (all its buffers are clean).
-		 * This happens if the buffers were written out directly,
-		 * with submit_bh(). ext3 will do this, as well as
-		 * the blockdev mapping.  filemap_release_folio() will
-		 * discover that cleanness and will drop the buffers
-		 * and mark the folio clean - it can be freed.
-		 *
-		 * Rarely, folios can have buffers and no ->mapping.
-		 * These are the folios which were not successfully
-		 * invalidated in truncate_cleanup_folio().  We try to
-		 * drop those buffers here and if that worked, and the
-		 * folio is no longer mapped into process address space
-		 * (refcount == 1) it can be freed.  Otherwise, leave
-		 * the folio on the LRU so it is swappable.
-		 */
-		if (folio_needs_release(folio)) {
-			if (!filemap_release_folio(folio, sc->gfp_mask))
-				goto activate_locked;
-			if (!mapping && folio_ref_count(folio) == 1) {
-				folio_unlock(folio);
-				if (folio_put_testzero(folio))
-					goto free_it;
-				else {
-					/*
-					 * rare race with speculative reference.
-					 * the speculative reference will free
-					 * this folio shortly, so we may
-					 * increment nr_reclaimed here (and
-					 * leave it off the LRU).
-					 */
-					nr_reclaimed += nr_pages;
-					continue;
-				}
-			}
-		}
-
-		if (folio_test_lazyfree(folio)) {
-			/* follow __remove_mapping for reference */
-			if (!folio_ref_freeze(folio, 1))
-				goto keep_locked;
-			/*
-			 * The folio has only one reference left, which is
-			 * from the isolation. After the caller puts the
-			 * folio back on the lru and drops the reference, the
-			 * folio will be freed anyway. It doesn't matter
-			 * which lru it goes on. So we don't bother checking
-			 * the dirty flag here.
-			 */
-			count_vm_events(PGLAZYFREED, nr_pages);
-			count_memcg_folio_events(folio, PGLAZYFREED, nr_pages);
-		} else if (!mapping || !__remove_mapping(mapping, folio, true,
-							 sc->target_mem_cgroup))
+		if (!folio_free(folio, &free_folios, sc, stat, &nr_reclaimed))
 			goto keep_locked;
-
-		folio_unlock(folio);
-free_it:
-		/*
-		 * Folio may get swapped out as a whole, need to account
-		 * all pages in it.
-		 */
-		nr_reclaimed += nr_pages;
-
-		folio_unqueue_deferred_split(folio);
-		if (folio_batch_add(&free_folios, folio) == 0) {
-			mem_cgroup_uncharge_folios(&free_folios);
-			try_to_unmap_flush();
-			free_unref_folios(&free_folios);
-		}
-		continue;
+		else
+			continue;
 
 activate_locked_split:
 		/*

-- 
2.43.7




* [PATCH v4 3/5] mm/vmscan: extract pageout_one() from shrink_folio_list()
  2026-05-25 14:57 [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Zhang Peng
  2026-05-25 14:57 ` [PATCH v4 1/5] mm/vmscan: introduce folio_activate_locked() helper Zhang Peng
  2026-05-25 14:57 ` [PATCH v4 2/5] mm/vmscan: extract folio_free() from shrink_folio_list() Zhang Peng
@ 2026-05-25 14:57 ` Zhang Peng
  2026-06-17 12:19   ` David Hildenbrand (Arm)
  2026-05-25 14:57 ` [PATCH v4 4/5] mm/vmscan: extract folio unmap logic into folio_try_unmap() Zhang Peng
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Zhang Peng @ 2026-05-25 14:57 UTC
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Shakeel Butt, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Michal Hocko, Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

shrink_folio_list() contains a self-contained pageout() dispatch state
machine. Extract it into pageout_one() to reduce the size of
shrink_folio_list() and make the pageout step independently readable.

No functional change.

Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
---
 mm/vmscan.c | 107 ++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 65 insertions(+), 42 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b31f67801836..456d38eb172c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1161,8 +1161,68 @@ static bool folio_free(struct folio *folio, struct folio_batch *free_folios,
 	return true;
 }
 
+static bool pageout_one(struct folio *folio,
+			struct folio_batch *free_folios,
+			struct scan_control *sc, struct reclaim_stat *stat,
+			struct swap_iocb **plug, struct list_head *folio_list,
+			unsigned int *nr_reclaimed)
+{
+	struct address_space *mapping = folio_mapping(folio);
+	unsigned int nr_pages = folio_nr_pages(folio);
+
+	switch (pageout(folio, mapping, plug, folio_list)) {
+	case PAGE_ACTIVATE:
+		/*
+		 * If shmem folio is split when writeback to swap,
+		 * the tail pages will make their own pass through
+		 * this function and be accounted then.
+		 */
+		if (nr_pages > 1 && !folio_test_large(folio)) {
+			sc->nr_scanned -= (nr_pages - 1);
+			nr_pages = 1;
+		}
+		folio_activate_locked(folio, stat, nr_pages);
+		folio_unlock(folio);
+		return false;
+	case PAGE_KEEP:
+		folio_unlock(folio);
+		return false;
+	case PAGE_SUCCESS:
+		if (nr_pages > 1 && !folio_test_large(folio)) {
+			sc->nr_scanned -= (nr_pages - 1);
+			nr_pages = 1;
+		}
+		stat->nr_pageout += nr_pages;
+
+		if (folio_test_writeback(folio))
+			return false;
+		if (folio_test_dirty(folio))
+			return false;
+
+		/*
+		 * A synchronous write - probably a ramdisk.  Go
+		 * ahead and try to reclaim the folio.
+		 */
+		if (!folio_trylock(folio))
+			return false;
+		if (folio_test_dirty(folio) ||
+		    folio_test_writeback(folio)) {
+			folio_unlock(folio);
+			return false;
+		}
+		mapping = folio_mapping(folio);
+		fallthrough;
+	case PAGE_CLEAN:
+		; /* try to free the folio below */
+	}
+	if (folio_free(folio, free_folios, sc, stat, nr_reclaimed))
+		return true;
+	folio_unlock(folio);
+	return false;
+}
+
 /*
- * shrink_folio_list() returns the number of reclaimed pages
+ * Reclaimed folios are counted in the return value.
  */
 static unsigned int shrink_folio_list(struct list_head *folio_list,
 		struct pglist_data *pgdat, struct scan_control *sc,
@@ -1499,53 +1559,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 				goto keep_locked;
 			if (!sc->may_writepage)
 				goto keep_locked;
-
 			/*
 			 * Folio is dirty. Flush the TLB if a writable entry
 			 * potentially exists to avoid CPU writes after I/O
 			 * starts and then write it out here.
 			 */
 			try_to_unmap_flush_dirty();
-			switch (pageout(folio, mapping, &plug, folio_list)) {
-			case PAGE_KEEP:
-				goto keep_locked;
-			case PAGE_ACTIVATE:
-				/*
-				 * If shmem folio is split when writeback to swap,
-				 * the tail pages will make their own pass through
-				 * this function and be accounted then.
-				 */
-				if (nr_pages > 1 && !folio_test_large(folio)) {
-					sc->nr_scanned -= (nr_pages - 1);
-					nr_pages = 1;
-				}
-				goto activate_locked;
-			case PAGE_SUCCESS:
-				if (nr_pages > 1 && !folio_test_large(folio)) {
-					sc->nr_scanned -= (nr_pages - 1);
-					nr_pages = 1;
-				}
-				stat->nr_pageout += nr_pages;
-
-				if (folio_test_writeback(folio))
-					goto keep;
-				if (folio_test_dirty(folio))
-					goto keep;
-
-				/*
-				 * A synchronous write - probably a ramdisk.  Go
-				 * ahead and try to reclaim the folio.
-				 */
-				if (!folio_trylock(folio))
-					goto keep;
-				if (folio_test_dirty(folio) ||
-				    folio_test_writeback(folio))
-					goto keep_locked;
-				mapping = folio_mapping(folio);
-				fallthrough;
-			case PAGE_CLEAN:
-				; /* try to free the folio below */
-			}
+			if (!pageout_one(folio, &free_folios, sc, stat, &plug,
+					 folio_list, &nr_reclaimed))
+				goto keep;
+			continue;
 		}
 
 		if (!folio_free(folio, &free_folios, sc, stat, &nr_reclaimed))

-- 
2.43.7




* [PATCH v4 4/5] mm/vmscan: extract folio unmap logic into folio_try_unmap()
  2026-05-25 14:57 [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Zhang Peng
                   ` (2 preceding siblings ...)
  2026-05-25 14:57 ` [PATCH v4 3/5] mm/vmscan: extract pageout_one() " Zhang Peng
@ 2026-05-25 14:57 ` Zhang Peng
  2026-06-17 12:28   ` David Hildenbrand (Arm)
  2026-05-25 14:57 ` [PATCH v4 5/5] mm/vmscan: flush TLB for every 31 folios evictions Zhang Peng
  2026-05-25 18:58 ` [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Andrew Morton
  5 siblings, 1 reply; 14+ messages in thread
From: Zhang Peng @ 2026-05-25 14:57 UTC
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Shakeel Butt, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Michal Hocko, Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

shrink_folio_list() contains a self-contained block that sets up
TTU flags and calls try_to_unmap(), accounting for failures via
reclaim_stat. Extract it into folio_try_unmap() to reduce the size
of shrink_folio_list() and make the unmap step independently readable.

folio_try_unmap() is only called when the folio is actually mapped;
the !folio_mapped() check stays in the caller, keeping the function's
semantics clear: it tries to unmap a mapped folio and returns whether
the unmap succeeded.

No functional change.

Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
---
 mm/vmscan.c | 68 ++++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 38 insertions(+), 30 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 456d38eb172c..abf3a2878456 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1221,6 +1221,41 @@ static bool pageout_one(struct folio *folio,
 	return false;
 }
 
+static bool folio_try_unmap(struct folio *folio, struct reclaim_stat *stat,
+			    unsigned int nr_pages)
+{
+	enum ttu_flags flags = TTU_BATCH_FLUSH;
+	bool was_swapbacked;
+
+	was_swapbacked = folio_test_swapbacked(folio);
+	if (folio_test_pmd_mappable(folio))
+		flags |= TTU_SPLIT_HUGE_PMD;
+	/*
+	 * Without TTU_SYNC, try_to_unmap will only begin to
+	 * hold PTL from the first present PTE within a large
+	 * folio. Some initial PTEs might be skipped due to
+	 * races with parallel PTE writes in which PTEs can be
+	 * cleared temporarily before being written new present
+	 * values. This will lead to a large folio is still
+	 * mapped while some subpages have been partially
+	 * unmapped after try_to_unmap; TTU_SYNC helps
+	 * try_to_unmap acquire PTL from the first PTE,
+	 * eliminating the influence of temporary PTE values.
+	 */
+	if (folio_test_large(folio))
+		flags |= TTU_SYNC;
+
+	try_to_unmap(folio, flags);
+	if (folio_mapped(folio)) {
+		stat->nr_unmap_fail += nr_pages;
+		if (!was_swapbacked &&
+		    folio_test_swapbacked(folio))
+			stat->nr_lazyfree_fail += nr_pages;
+		return false;
+	}
+	return true;
+}
+
 /*
  * Reclaimed folios are counted in the return value.
  */
@@ -1495,36 +1530,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 * The folio is mapped into the page tables of one or more
 		 * processes. Try to unmap it here.
 		 */
-		if (folio_mapped(folio)) {
-			enum ttu_flags flags = TTU_BATCH_FLUSH;
-			bool was_swapbacked = folio_test_swapbacked(folio);
-
-			if (folio_test_pmd_mappable(folio))
-				flags |= TTU_SPLIT_HUGE_PMD;
-			/*
-			 * Without TTU_SYNC, try_to_unmap will only begin to
-			 * hold PTL from the first present PTE within a large
-			 * folio. Some initial PTEs might be skipped due to
-			 * races with parallel PTE writes in which PTEs can be
-			 * cleared temporarily before being written new present
-			 * values. This will lead to a large folio is still
-			 * mapped while some subpages have been partially
-			 * unmapped after try_to_unmap; TTU_SYNC helps
-			 * try_to_unmap acquire PTL from the first PTE,
-			 * eliminating the influence of temporary PTE values.
-			 */
-			if (folio_test_large(folio))
-				flags |= TTU_SYNC;
-
-			try_to_unmap(folio, flags);
-			if (folio_mapped(folio)) {
-				stat->nr_unmap_fail += nr_pages;
-				if (!was_swapbacked &&
-				    folio_test_swapbacked(folio))
-					stat->nr_lazyfree_fail += nr_pages;
-				goto activate_locked;
-			}
-		}
+		if (folio_mapped(folio) &&
+		    !folio_try_unmap(folio, stat, nr_pages))
+			goto activate_locked;
 
 		/*
 		 * Folio is unmapped now so it cannot be newly pinned anymore.

-- 
2.43.7




* [PATCH v4 5/5] mm/vmscan: flush TLB for every 31 folios evictions
  2026-05-25 14:57 [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Zhang Peng
                   ` (3 preceding siblings ...)
  2026-05-25 14:57 ` [PATCH v4 4/5] mm/vmscan: extract folio unmap logic into folio_try_unmap() Zhang Peng
@ 2026-05-25 14:57 ` Zhang Peng
  2026-06-17 12:47   ` David Hildenbrand (Arm)
  2026-05-25 18:58 ` [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Andrew Morton
  5 siblings, 1 reply; 14+ messages in thread
From: Zhang Peng @ 2026-05-25 14:57 UTC
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Shakeel Butt, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Michal Hocko, Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

Currently the TLB is flushed for every dirty folio, which is a bottleneck
on systems with many cores because it causes heavy IPI traffic.

Instead, batch the folios and flush once per folio_batch (up to 31
folios). Each dirty folio is unlocked and held in a folio_batch; when
the batch is full, or when the scan of the folio list ends, do the
following:

- For each folio: trylock it and recheck that it is still evictable
  (not under writeback, not mapped, not maybe-DMA-pinned)
  - If the lock cannot be taken or the folio is no longer evictable,
    return it to the LRU
- Flush the TLB once for the batch
- Page out the remaining folios

Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
---
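The recheck step in the commit message filters the batch in place. Below is
a simplified userspace sketch of that idea, not the kernel code: struct item,
struct batch, batch_reinit(), batch_add() and batch_filter() are hypothetical
names, and a single boolean stands in for the lock and evictability checks.
batch_add() is assumed to return the number of free slots left, so a zero
return means the batch is full.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_CAP 31	/* assumed capacity, matching "31 folios" */

/* Hypothetical stand-in for a folio; only models the recheck result. */
struct item {
	bool still_evictable;
};

struct batch {
	unsigned int nr;
	struct item *items[BATCH_CAP];
};

/* Reset the count only; the items[] contents are left in place. */
static void batch_reinit(struct batch *b)
{
	b->nr = 0;
}

/* Append an item and return the number of free slots left (0 == full). */
static unsigned int batch_add(struct batch *b, struct item *it)
{
	assert(b->nr < BATCH_CAP);
	b->items[b->nr++] = it;
	return BATCH_CAP - b->nr;
}

/*
 * Filter the batch in place: save the old count, reset the batch, then
 * re-add only the items that pass the recheck. This is safe because the
 * write index never overtakes the read index. Rejected items are handed
 * back through rejected[], which must have room for the whole batch.
 * Returns the number of surviving items.
 */
static unsigned int batch_filter(struct batch *b, struct item **rejected,
				 unsigned int *nr_rejected)
{
	unsigned int i, count = b->nr;

	batch_reinit(b);
	for (i = 0; i < count; i++) {
		struct item *it = b->items[i];

		if (!it->still_evictable) {
			rejected[(*nr_rejected)++] = it;
			continue;
		}
		batch_add(b, it);
	}
	return b->nr;
}
```

In this model the caller would issue one flush only if batch_filter()
returns a non-zero count, then process the survivors and reset the batch.
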
 mm/vmscan.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 71 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index abf3a2878456..c0d22afe67a5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1221,6 +1221,57 @@ static bool pageout_one(struct folio *folio,
 	return false;
 }
 
+static void pageout_batch(struct folio_batch *fbatch,
+			  struct list_head *ret_folios,
+			  struct folio_batch *free_folios,
+			  struct scan_control *sc, struct reclaim_stat *stat,
+			  struct swap_iocb **plug, struct list_head *folio_list,
+			  unsigned int *nr_reclaimed)
+{
+	int i, count = folio_batch_count(fbatch);
+	struct folio *folio;
+
+	/*
+	 * Reuse fbatch in-place: reinit only clears the count, the
+	 * underlying folios array is still accessible via saved count.
+	 * Filter and re-add valid folios back into the same batch.
+	 */
+	folio_batch_reinit(fbatch);
+	for (i = 0; i < count; ++i) {
+		folio = fbatch->folios[i];
+		if (!folio_trylock(folio)) {
+			list_add(&folio->lru, ret_folios);
+			continue;
+		}
+
+		VM_WARN_ON_FOLIO(folio_test_lru(folio), folio);
+
+		if (folio_test_writeback(folio) || folio_mapped(folio) ||
+		    folio_maybe_dma_pinned(folio)) {
+			folio_unlock(folio);
+			list_add(&folio->lru, ret_folios);
+			continue;
+		}
+
+		folio_batch_add(fbatch, folio);
+	}
+
+	i = 0;
+	count = folio_batch_count(fbatch);
+	if (!count)
+		return;
+	/* One TLB flush for the batch */
+	try_to_unmap_flush_dirty();
+	for (i = 0; i < count; ++i) {
+		folio = fbatch->folios[i];
+		if (!pageout_one(folio, free_folios, sc, stat, plug,
+				 folio_list, nr_reclaimed))
+			list_add(&folio->lru, ret_folios);
+	}
+	/* Clear the batch for the caller's next use */
+	folio_batch_reinit(fbatch);
+}
+
 static bool folio_try_unmap(struct folio *folio, struct reclaim_stat *stat,
 			    unsigned int nr_pages)
 {
@@ -1265,6 +1316,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		struct mem_cgroup *memcg)
 {
 	struct folio_batch free_folios;
+	struct folio_batch flush_folios;
 	LIST_HEAD(ret_folios);
 	LIST_HEAD(demote_folios);
 	unsigned int nr_reclaimed = 0, nr_demoted = 0;
@@ -1273,6 +1325,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	struct swap_iocb *plug = NULL;
 
 	folio_batch_init(&free_folios);
+	folio_batch_init(&flush_folios);
 	memset(stat, 0, sizeof(*stat));
 	cond_resched();
 	do_demote_pass = can_demote(pgdat->node_id, sc, memcg);
@@ -1568,15 +1621,19 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			if (!sc->may_writepage)
 				goto keep_locked;
 			/*
-			 * Folio is dirty. Flush the TLB if a writable entry
-			 * potentially exists to avoid CPU writes after I/O
-			 * starts and then write it out here.
+			 * Unlock while batching: holding the lock until the
+			 * batch fills would stall swap faults that find this
+			 * folio via swap cache lookup. pageout_batch() will
+			 * relock each folio and recheck its state before
+			 * writing it out.
 			 */
-			try_to_unmap_flush_dirty();
-			if (!pageout_one(folio, &free_folios, sc, stat, &plug,
-					 folio_list, &nr_reclaimed))
-				goto keep;
-			continue;
+			folio_unlock(folio);
+			if (!folio_batch_add(&flush_folios, folio))
+				pageout_batch(&flush_folios,
+					      &ret_folios, &free_folios,
+					      sc, stat, &plug,
+					      folio_list, &nr_reclaimed);
+			goto next;
 		}
 
 		if (!folio_free(folio, &free_folios, sc, stat, &nr_reclaimed))
@@ -1601,6 +1658,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		list_add(&folio->lru, &ret_folios);
 		VM_BUG_ON_FOLIO(folio_test_lru(folio) ||
 				folio_test_unevictable(folio), folio);
+next:
+		continue;
+	}
+	if (folio_batch_count(&flush_folios)) {
+		pageout_batch(&flush_folios, &ret_folios, &free_folios, sc,
+			      stat, &plug, folio_list, &nr_reclaimed);
 	}
 	/* 'folio_list' is always empty here */
 

-- 
2.43.7




* Re: [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan
  2026-05-25 14:57 [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Zhang Peng
                   ` (4 preceding siblings ...)
  2026-05-25 14:57 ` [PATCH v4 5/5] mm/vmscan: flush TLB for every 31 folios evictions Zhang Peng
@ 2026-05-25 18:58 ` Andrew Morton
  5 siblings, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2026-05-25 18:58 UTC
  To: Zhang Peng
  Cc: David Hildenbrand, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Johannes Weiner,
	Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu, Michal Hocko,
	Liam R. Howlett, Qi Zheng, linux-mm, linux-kernel, Barry Song,
	Kairui Song, Zhang Peng

On Mon, 25 May 2026 22:57:16 +0800 Zhang Peng <zippermonkey@icloud.com> wrote:

> This series introduces batch TLB flushing optimization for dirty folios
> during memory reclaim, aiming to reduce IPI overhead on multi-core systems.

So often we see such good results from batching things.

> ...
> 
> Testing
> -------
> The benchmark script uses stress-ng to compare TLB shootdown behavior before and
> after this patch. It constrains a stress-ng workload via memcg to force reclaim
> through shrink_folio_list(), reporting TLB shootdowns and IPIs.
> 
> Core benchmark command: stress-ng --vm 16 --vm-bytes 2G --vm-keep --timeout 60
> 
> ==========================================================================
>                  batch_dirty_tlb_flush Benchmark Results
> ==========================================================================
>   Kernel: 7.0.0-rc1+   CPUs: 16
>   MemTotal: 31834M   SwapTotal: 8191M
>   memcg limit: 512M   alloc: 2G   workers: 16   duration: 60s
> --------------------------------------------------------------------------
> Metric                 Before        After             Delta (abs / %)
> --------------------------------------------------------------------------
> bogo ops/s             28238.63      35833.97          +7595.34 (+26.9%)
> TLB shootdowns         55428953      17621697          -37807256 (-68.2%)
> Function call IPIs     34073695      14498768          -19574927 (-57.4%)
> pgscan_anon (pages)    52856224      60252894          +7396670 (+14.0%)
> pgsteal_anon (pages)   29004962      34054753          +5049791 (+17.4%)
> --------------------------------------------------------------------------

Nice.

AI review asked a few things:
	https://sashiko.dev/#/patchset/20260525-batch-tlb-flush-v4-0-83789d6abc00@icloud.com

Thanks.  I'll take no action at this time - there's no review yet and we're at
-rc5.



* Re: [PATCH v4 1/5] mm/vmscan: introduce folio_activate_locked() helper
  2026-05-25 14:57 ` [PATCH v4 1/5] mm/vmscan: introduce folio_activate_locked() helper Zhang Peng
@ 2026-06-17 11:59   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-17 11:59 UTC (permalink / raw)
  To: Zhang Peng, Andrew Morton, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Johannes Weiner,
	Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu, Michal Hocko,
	Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

On 5/25/26 16:57, Zhang Peng wrote:
> The activate_locked label in shrink_folio_list() reclaims swap cache
> when needed, marks the folio active, and updates activation statistics.
> Extract this block into folio_activate_locked() so it can be reused.
> 
> No functional change.
> 
> Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
> ---
>  mm/vmscan.c | 34 +++++++++++++++++++++++-----------
>  1 file changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ca4533eba701..886d8b4843aa 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1050,6 +1050,28 @@ static bool may_enter_fs(struct folio *folio, gfp_t gfp_mask)
>  	return !data_race(folio_swap_flags(folio) & SWP_FS_OPS);
>  }
>  
> +/*
> + * Prepare a locked folio to be kept active rather than reclaimed.
> + * Reclaims its swap slot if it will not be swapped, then marks it
> + * active and updates activation statistics.
> + */
> +static void folio_activate_locked(struct folio *folio,
> +		struct reclaim_stat *stat, unsigned int nr_pages)

Passing nr_pages to this helper is rather questionable. Just use
folio_nr_pages(folio) here and make the function less weird.
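
To illustrate the shape I mean, here is a minimal userspace sketch. The
struct definitions and the single nr_activate counter are hypothetical
stand-ins (the real reclaim_stat is richer), and the swap-slot handling is
left out entirely. Only the signature change is the point:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct folio {
	unsigned int order;	/* folio spans 1 << order pages */
	bool active;
};

struct reclaim_stat {
	unsigned int nr_activate;	/* simplified to one counter */
};

/* Mock of folio_nr_pages(): number of pages covered by the folio. */
unsigned int folio_nr_pages(const struct folio *folio)
{
	return 1U << folio->order;
}

/* The helper derives the page count itself instead of taking a parameter. */
void folio_activate_locked(struct folio *folio, struct reclaim_stat *stat)
{
	const unsigned int nr_pages = folio_nr_pages(folio);

	folio->active = true;
	stat->nr_activate += nr_pages;
}
```

A caller then only passes the folio and the stat structure, so the count
cannot get out of sync with the folio it describes.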

> +{

Do we want to add a

	VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);

> +	/* Not a candidate for swapping, so reclaim swap space. */
> +	if (folio_test_swapcache(folio) &&
> +	    (mem_cgroup_swap_full(folio) || folio_test_mlocked(folio)))
> +		folio_free_swap(folio);
> +	VM_BUG_ON_FOLIO(folio_test_active(folio), folio);

While at it, do we want to turn this into a VM_WARN_ON_ONCE_FOLIO?

(and can we move that to the beginning of the function?)

In general, LGTM

-- 
Cheers,

David


* Re: [PATCH v4 2/5] mm/vmscan: extract folio_free() from shrink_folio_list()
  2026-05-25 14:57 ` [PATCH v4 2/5] mm/vmscan: extract folio_free() from shrink_folio_list() Zhang Peng
@ 2026-06-17 12:17   ` David Hildenbrand (Arm)
  2026-06-17 12:24     ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-17 12:17 UTC (permalink / raw)
  To: Zhang Peng, Andrew Morton, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Johannes Weiner,
	Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu, Michal Hocko,
	Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

On 5/25/26 16:57, Zhang Peng wrote:
> shrink_folio_list() contains a self-contained folio-freeing section:
> buffer release, lazyfree, __remove_mapping, and folio_batch drain.
> Extract it into folio_free() to reduce the size of shrink_folio_list()
> and make the freeing step independently readable.
> 
> No functional change.

As you are touching the code, I'll suggest some simple improvements as part of
the move.

> 
> Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
> ---
>  mm/vmscan.c | 168 +++++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 92 insertions(+), 76 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 886d8b4843aa..b31f67801836 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1072,6 +1072,95 @@ static void folio_activate_locked(struct folio *folio,
>  	}
>  }
>  
> +static bool folio_free(struct folio *folio, struct folio_batch *free_folios,

I don't quite like the function name, as it sounds like something very generic,
when it's really specific to memory reclaim code and does things like removing
the folio from the pagecache.

Any way we can easily make that clearer? I think it should contain something
about release and reclaim.

folio_release_for_reclaim() or just "folio_reclaim()" ? Not sure.



> +		struct scan_control *sc, struct reclaim_stat *stat,
> +		unsigned int *nr_reclaimed)

Instead of returning a bool, can we just return 0 / -EBUSY? IMHO, a bool mostly
only makes sense if the function name implies some sort of yes/no test. (there
are exceptions).
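
A small userspace sketch of the two conventions, in case it helps. The
struct and both function names are made up for illustration and do not
exist in the kernel:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical object with a busy flag, standing in for a folio. */
struct item {
	bool busy;
	bool released;
};

/* Action-style name: 0 on success, negative errno on failure. */
int item_release(struct item *it)
{
	if (it->busy)
		return -EBUSY;
	it->released = true;
	return 0;
}

/* "try"-style name: the name itself reads as a yes/no question. */
bool item_try_release(struct item *it)
{
	return item_release(it) == 0;
}
```

With the first form, "if (item_release(it))" means failure. With the second,
"if (!item_try_release(it))" means failure, which is easier to read at the
call site when the name says "try".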

> +{
> +	unsigned int nr_pages = folio_nr_pages(folio);

Can be const.

> +	struct address_space *mapping = folio_mapping(folio);
> +
> +	/*
> +	 * If the folio has buffers, try to free the buffer
> +	 * mappings associated with this folio. If we succeed
> +	 * we try to free the folio as well.
> +	 *
> +	 * We do this even if the folio is dirty.
> +	 * filemap_release_folio() does not perform I/O, but it
> +	 * is possible for a folio to have the dirty flag set,
> +	 * but it is actually clean (all its buffers are clean).
> +	 * This happens if the buffers were written out directly,
> +	 * with submit_bh(). ext3 will do this, as well as
> +	 * the blockdev mapping.  filemap_release_folio() will
> +	 * discover that cleanness and will drop the buffers
> +	 * and mark the folio clean - it can be freed.
> +	 *
> +	 * Rarely, folios can have buffers and no ->mapping.
> +	 * These are the folios which were not successfully
> +	 * invalidated in truncate_cleanup_folio().  We try to
> +	 * drop those buffers here and if that worked, and the
> +	 * folio is no longer mapped into process address space
> +	 * (refcount == 1) it can be freed.  Otherwise, leave
> +	 * the folio on the LRU so it is swappable.
> +	 */

I assume you can now adjust the comments to use more space and less LOC.

> +	if (folio_needs_release(folio)) {
> +		if (!filemap_release_folio(folio, sc->gfp_mask)) {
> +			folio_activate_locked(folio, stat, nr_pages);
> +			return false;
> +		}
> +
> +		if (!mapping && folio_ref_count(folio) == 1) {
> +			folio_unlock(folio);
> +			if (folio_put_testzero(folio))
> +				goto free_it;
> +			else {
> +				/*
> +				 * rare race with speculative reference.
> +				 * the speculative reference will free
> +				 * this folio shortly, so we may
> +				 * increment nr_reclaimed here (and
> +				 * leave it off the LRU).
> +				 */
> +				*nr_reclaimed += nr_pages;
> +				return true;
> +			}
> +		}
> +	}
> +
> +	if (folio_test_lazyfree(folio)) {
> +		/* follow __remove_mapping for reference */
> +		if (!folio_ref_freeze(folio, 1))
> +			return false;
> +		/*
> +		 * The folio has only one reference left, which is
> +		 * from the isolation. After the caller puts the
> +		 * folio back on the lru and drops the reference, the
> +		 * folio will be freed anyway. It doesn't matter
> +		 * which lru it goes on. So we don't bother checking
> +		 * the dirty flag here.
> +		 */
> +		count_vm_events(PGLAZYFREED, nr_pages);
> +		count_memcg_folio_events(folio, PGLAZYFREED, nr_pages);
> +	} else if (!mapping || !__remove_mapping(mapping, folio, true,
> +							sc->target_mem_cgroup))
> +		return false;
> +
> +	folio_unlock(folio);
> +free_it:

Okay, at this point we should have a refcount of 0. We could make that
more obvious through

VM_WARN_ON_ONCE_FOLIO(folio_ref_count(folio), folio);
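
For reference, my understanding of why the count is 0 here: free_it is
reached either after folio_put_testzero() returned true, or after the
refcount was frozen (directly for lazyfree, or inside __remove_mapping()).
A userspace model of those two primitives with C11 atomics, where struct obj
and the obj_*() names are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio's reference count. */
struct obj {
	atomic_int refcount;
};

/*
 * Models folio_ref_freeze(): succeed only if the count equals 'expected',
 * and leave it at 0 in that case. On failure the count is untouched.
 */
bool obj_ref_freeze(struct obj *o, int expected)
{
	return atomic_compare_exchange_strong(&o->refcount, &expected, 0);
}

/* Models folio_put_testzero(): drop one reference, true if it was the last. */
bool obj_put_testzero(struct obj *o)
{
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}
```

Both success paths leave the counter at exactly 0, which is what the
assertion would document.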

> +	/*
> +	 * Folio may get swapped out as a whole, need to account
> +	 * all pages in it.
> +	 */

Likely can shorten to

/* Only full folios get reclaimed. */

Or just drop the comment completely, I don't quite see the point.

> +	*nr_reclaimed += nr_pages;
> +
> +	folio_unqueue_deferred_split(folio);
> +	if (folio_batch_add(free_folios, folio) == 0) {
> +		mem_cgroup_uncharge_folios(free_folios);
> +		try_to_unmap_flush();
> +		free_unref_folios(free_folios);
> +	}
> +	return true;
> +}
> +
>  /*
>   * shrink_folio_list() returns the number of reclaimed pages
>   */
> @@ -1459,83 +1548,10 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  			}
>  		}
>  
> -		/*
> -		 * If the folio has buffers, try to free the buffer
> -		 * mappings associated with this folio. If we succeed
> -		 * we try to free the folio as well.
> -		 *
> -		 * We do this even if the folio is dirty.
> -		 * filemap_release_folio() does not perform I/O, but it
> -		 * is possible for a folio to have the dirty flag set,
> -		 * but it is actually clean (all its buffers are clean).
> -		 * This happens if the buffers were written out directly,
> -		 * with submit_bh(). ext3 will do this, as well as
> -		 * the blockdev mapping.  filemap_release_folio() will
> -		 * discover that cleanness and will drop the buffers
> -		 * and mark the folio clean - it can be freed.
> -		 *
> -		 * Rarely, folios can have buffers and no ->mapping.
> -		 * These are the folios which were not successfully
> -		 * invalidated in truncate_cleanup_folio().  We try to
> -		 * drop those buffers here and if that worked, and the
> -		 * folio is no longer mapped into process address space
> -		 * (refcount == 1) it can be freed.  Otherwise, leave
> -		 * the folio on the LRU so it is swappable.
> -		 */
> -		if (folio_needs_release(folio)) {
> -			if (!filemap_release_folio(folio, sc->gfp_mask))
> -				goto activate_locked;
> -			if (!mapping && folio_ref_count(folio) == 1) {
> -				folio_unlock(folio);
> -				if (folio_put_testzero(folio))
> -					goto free_it;
> -				else {
> -					/*
> -					 * rare race with speculative reference.
> -					 * the speculative reference will free
> -					 * this folio shortly, so we may
> -					 * increment nr_reclaimed here (and
> -					 * leave it off the LRU).
> -					 */
> -					nr_reclaimed += nr_pages;
> -					continue;
> -				}
> -			}
> -		}
> -
> -		if (folio_test_lazyfree(folio)) {
> -			/* follow __remove_mapping for reference */
> -			if (!folio_ref_freeze(folio, 1))
> -				goto keep_locked;
> -			/*
> -			 * The folio has only one reference left, which is
> -			 * from the isolation. After the caller puts the
> -			 * folio back on the lru and drops the reference, the
> -			 * folio will be freed anyway. It doesn't matter
> -			 * which lru it goes on. So we don't bother checking
> -			 * the dirty flag here.
> -			 */
> -			count_vm_events(PGLAZYFREED, nr_pages);
> -			count_memcg_folio_events(folio, PGLAZYFREED, nr_pages);
> -		} else if (!mapping || !__remove_mapping(mapping, folio, true,
> -							 sc->target_mem_cgroup))
> +		if (!folio_free(folio, &free_folios, sc, stat, &nr_reclaimed))
>  			goto keep_locked;
> -
> -		folio_unlock(folio);
> -free_it:
> -		/*
> -		 * Folio may get swapped out as a whole, need to account
> -		 * all pages in it.
> -		 */
> -		nr_reclaimed += nr_pages;
> -
> -		folio_unqueue_deferred_split(folio);
> -		if (folio_batch_add(&free_folios, folio) == 0) {
> -			mem_cgroup_uncharge_folios(&free_folios);
> -			try_to_unmap_flush();
> -			free_unref_folios(&free_folios);
> -		}
> -		continue;
> +		else
> +			continue;

You can just drop the "else", no?

if (...)
	goto keep_locked;
continue;

>  
>  activate_locked_split:
>  		/*
> 


-- 
Cheers,

David



* Re: [PATCH v4 3/5] mm/vmscan: extract pageout_one() from shrink_folio_list()
  2026-05-25 14:57 ` [PATCH v4 3/5] mm/vmscan: extract pageout_one() " Zhang Peng
@ 2026-06-17 12:19   ` David Hildenbrand (Arm)
  2026-06-17 12:25     ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-17 12:19 UTC (permalink / raw)
  To: Zhang Peng, Andrew Morton, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Johannes Weiner,
	Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu, Michal Hocko,
	Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

On 5/25/26 16:57, Zhang Peng wrote:
> shrink_folio_list() contains a self-contained pageout() dispatch state
> machine. Extract it into pageout_one() to reduce the size of
> shrink_folio_list() and make the pageout step independently readable.
> 
> No functional change.
> 
> Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
> ---
>  mm/vmscan.c | 107 ++++++++++++++++++++++++++++++++++++------------------------
>  1 file changed, 65 insertions(+), 42 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b31f67801836..456d38eb172c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1161,8 +1161,68 @@ static bool folio_free(struct folio *folio, struct folio_batch *free_folios,
>  	return true;
>  }
>  
> +static bool pageout_one(struct folio *folio,
> +			struct folio_batch *free_folios,
> +			struct scan_control *sc, struct reclaim_stat *stat,
> +			struct swap_iocb **plug, struct list_head *folio_list,
> +			unsigned int *nr_reclaimed)

Two tabs please.

Same comment regarding returning a bool.

> +{
> +	struct address_space *mapping = folio_mapping(folio);
> +	unsigned int nr_pages = folio_nr_pages(folio);

Can both be const I assume.

> +
> +	switch (pageout(folio, mapping, plug, folio_list)) {
> +	case PAGE_ACTIVATE:
> +		/*
> +		 * If shmem folio is split when writeback to swap,
> +		 * the tail pages will make their own pass through
> +		 * this function and be accounted then.
> +		 */

Same comment regarding making these comments use less LOC.


In general, LGTM.


-- 
Cheers,

David



* Re: [PATCH v4 2/5] mm/vmscan: extract folio_free() from shrink_folio_list()
  2026-06-17 12:17   ` David Hildenbrand (Arm)
@ 2026-06-17 12:24     ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-17 12:24 UTC (permalink / raw)
  To: Zhang Peng, Andrew Morton, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Johannes Weiner,
	Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu, Michal Hocko,
	Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

On 6/17/26 14:17, David Hildenbrand (Arm) wrote:
> On 5/25/26 16:57, Zhang Peng wrote:
>> shrink_folio_list() contains a self-contained folio-freeing section:
>> buffer release, lazyfree, __remove_mapping, and folio_batch drain.
>> Extract it into folio_free() to reduce the size of shrink_folio_list()
>> and make the freeing step independently readable.
>>
>> No functional change.
> 
> As you are touching the code, I'll suggest some simple improvements as part of
> the move.
> 
>>
>> Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
>> ---
>>  mm/vmscan.c | 168 +++++++++++++++++++++++++++++++++---------------------------
>>  1 file changed, 92 insertions(+), 76 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 886d8b4843aa..b31f67801836 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1072,6 +1072,95 @@ static void folio_activate_locked(struct folio *folio,
>>  	}
>>  }
>>  
>> +static bool folio_free(struct folio *folio, struct folio_batch *free_folios,
> 
> I don't quite like the function name, as it sounds like something very generic,
> when it's really specific to memory reclaim code and does things like removing
> the folio from the pagecache.
> 
> Any way we can easily make that clearer? I think it should contain something
> about release and reclaim.
> 
> folio_release_for_reclaim() or just "folio_reclaim()" ? Not sure.
> 
> 
> 
>> +		struct scan_control *sc, struct reclaim_stat *stat,
>> +		unsigned int *nr_reclaimed)
> 
> Instead of returning a bool, can we just return 0 / -EBUSY? IMHO, a bool mostly
> only makes sense if the function name implies some sort of yes/no test. (there
> are exceptions).

Stumbling over patch #4, I think if we call this something with a "try", a bool
would be fine.

-- 
Cheers,

David



* Re: [PATCH v4 3/5] mm/vmscan: extract pageout_one() from shrink_folio_list()
  2026-06-17 12:19   ` David Hildenbrand (Arm)
@ 2026-06-17 12:25     ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-17 12:25 UTC (permalink / raw)
  To: Zhang Peng, Andrew Morton, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Johannes Weiner,
	Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu, Michal Hocko,
	Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

On 6/17/26 14:19, David Hildenbrand (Arm) wrote:
> On 5/25/26 16:57, Zhang Peng wrote:
>> shrink_folio_list() contains a self-contained pageout() dispatch state
>> machine. Extract it into pageout_one() to reduce the size of
>> shrink_folio_list() and make the pageout step independently readable.
>>
>> No functional change.
>>
>> Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
>> ---
>>  mm/vmscan.c | 107 ++++++++++++++++++++++++++++++++++++------------------------
>>  1 file changed, 65 insertions(+), 42 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index b31f67801836..456d38eb172c 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1161,8 +1161,68 @@ static bool folio_free(struct folio *folio, struct folio_batch *free_folios,
>>  	return true;
>>  }
>>  
>> +static bool pageout_one(struct folio *folio,
>> +			struct folio_batch *free_folios,
>> +			struct scan_control *sc, struct reclaim_stat *stat,
>> +			struct swap_iocb **plug, struct list_head *folio_list,
>> +			unsigned int *nr_reclaimed)
> 
> Two tabs please.
> 
> Same comment regarding returning a bool.

Thinking again, should we call this "folio_try_pageout" or something like that
(and keep the bool)?

-- 
Cheers,

David


* Re: [PATCH v4 4/5] mm/vmscan: extract folio unmap logic into folio_try_unmap()
  2026-05-25 14:57 ` [PATCH v4 4/5] mm/vmscan: extract folio unmap logic into folio_try_unmap() Zhang Peng
@ 2026-06-17 12:28   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-17 12:28 UTC (permalink / raw)
  To: Zhang Peng, Andrew Morton, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Johannes Weiner,
	Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu, Michal Hocko,
	Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

On 5/25/26 16:57, Zhang Peng wrote:
> shrink_folio_list() contains a self-contained block that sets up
> TTU flags and calls try_to_unmap(), accounting for failures via
> reclaim_stat. Extract it into folio_try_unmap() to reduce the size
> of shrink_folio_list() and make the unmap step independently readable.
> 
> folio_try_unmap() is only called when the folio is actually mapped;
> the !folio_mapped() check stays in the caller, keeping the function's
> semantics clear: it tries to unmap a mapped folio and returns whether
> the unmap succeeded.
> 
> No functional change.
> 
> Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
> ---
>  mm/vmscan.c | 68 ++++++++++++++++++++++++++++++++++---------------------------
>  1 file changed, 38 insertions(+), 30 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 456d38eb172c..abf3a2878456 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1221,6 +1221,41 @@ static bool pageout_one(struct folio *folio,
>  	return false;
>  }
>  
> +static bool folio_try_unmap(struct folio *folio, struct reclaim_stat *stat,
> +			    unsigned int nr_pages)

Two tabs.

folio_try_unmap() vs. try_to_unmap() Hm.

Again, maybe we should throw in a "for_reclaim" ?

folio_try_unmap_for_reclaim() ?

Not sure.

> +{
> +	enum ttu_flags flags = TTU_BATCH_FLUSH;
> +	bool was_swapbacked;
> +
> +	was_swapbacked = folio_test_swapbacked(folio);


const bool was_swapbacked = folio_test_swapbacked(folio);

> +	if (folio_test_pmd_mappable(folio))
> +		flags |= TTU_SPLIT_HUGE_PMD;
> +	/*
> +	 * Without TTU_SYNC, try_to_unmap will only begin to
> +	 * hold PTL from the first present PTE within a large
> +	 * folio. Some initial PTEs might be skipped due to
> +	 * races with parallel PTE writes in which PTEs can be
> +	 * cleared temporarily before being written new present
> +	 * values. This will lead to a large folio is still
> +	 * mapped while some subpages have been partially
> +	 * unmapped after try_to_unmap; TTU_SYNC helps
> +	 * try_to_unmap acquire PTL from the first PTE,
> +	 * eliminating the influence of temporary PTE values.
> +	 */

Comment can now use less LOC.

> +	if (folio_test_large(folio))
> +		flags |= TTU_SYNC;
> +
> +	try_to_unmap(folio, flags);
> +	if (folio_mapped(folio)) {
> +		stat->nr_unmap_fail += nr_pages;
> +		if (!was_swapbacked &&
> +		    folio_test_swapbacked(folio))

Probably best in a single line.

> +			stat->nr_lazyfree_fail += nr_pages;
> +		return false;
> +	}
> +	return true;
> +}
> +
>  /*
>   * Reclaimed folios are counted in the return value.
>   */
> @@ -1495,36 +1530,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  		 * The folio is mapped into the page tables of one or more
>  		 * processes. Try to unmap it here.
>  		 */
> -		if (folio_mapped(folio)) {
> -			enum ttu_flags flags = TTU_BATCH_FLUSH;
> -			bool was_swapbacked = folio_test_swapbacked(folio);
> -
> -			if (folio_test_pmd_mappable(folio))
> -				flags |= TTU_SPLIT_HUGE_PMD;
> -			/*
> -			 * Without TTU_SYNC, try_to_unmap will only begin to
> -			 * hold PTL from the first present PTE within a large
> -			 * folio. Some initial PTEs might be skipped due to
> -			 * races with parallel PTE writes in which PTEs can be
> -			 * cleared temporarily before being written new present
> -			 * values. This will lead to a large folio is still
> -			 * mapped while some subpages have been partially
> -			 * unmapped after try_to_unmap; TTU_SYNC helps
> -			 * try_to_unmap acquire PTL from the first PTE,
> -			 * eliminating the influence of temporary PTE values.
> -			 */
> -			if (folio_test_large(folio))
> -				flags |= TTU_SYNC;
> -
> -			try_to_unmap(folio, flags);
> -			if (folio_mapped(folio)) {
> -				stat->nr_unmap_fail += nr_pages;
> -				if (!was_swapbacked &&
> -				    folio_test_swapbacked(folio))
> -					stat->nr_lazyfree_fail += nr_pages;
> -				goto activate_locked;
> -			}
> -		}
> +		if (folio_mapped(folio) &&
> +		    !folio_try_unmap(folio, stat, nr_pages))

Probably best in a single line.

> +			goto activate_locked;
>  
>  		/*
>  		 * Folio is unmapped now so it cannot be newly pinned anymore.
> 


-- 
Cheers,

David



* Re: [PATCH v4 5/5] mm/vmscan: flush TLB for every 31 folios evictions
  2026-05-25 14:57 ` [PATCH v4 5/5] mm/vmscan: flush TLB for every 31 folios evictions Zhang Peng
@ 2026-06-17 12:47   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-17 12:47 UTC (permalink / raw)
  To: Zhang Peng, Andrew Morton, Lorenzo Stoakes, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Johannes Weiner,
	Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu, Michal Hocko,
	Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

On 5/25/26 16:57, Zhang Peng wrote:
> Currently we flush TLB for every dirty folio, which is a bottleneck for
> systems with many cores as this causes heavy IPI usage.
> 
> So instead, batch the folios, and flush once for every 31 folios (one
> folio_batch). These folios will be held in a folio_batch releasing their
> lock, then when folio_batch is full, do following steps:
> 
> - For each folio: lock - check still evictable (writeback, mapped,
>   dma_pinned)
>   - If no longer evictable, put back to LRU
> - Flush TLB once for the batch
> - Pageout the folios
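
As a rough, idealized count of what the steps above save (a sketch only: it
assumes every dirty folio would otherwise trigger a flush, which is an upper
bound, and it takes the batch capacity of 31 from the commit message; the
function names are made up):

```c
#include <stddef.h>

/* Assumed batch capacity; the commit message says one folio_batch is 31. */
#define BATCH_SIZE 31

/* Per-folio scheme: at most one TLB flush for every dirty folio. */
size_t flushes_per_folio(size_t nr_dirty)
{
	return nr_dirty;
}

/* Batched scheme: one flush per full batch, plus one for a partial tail. */
size_t flushes_batched(size_t nr_dirty)
{
	return (nr_dirty + BATCH_SIZE - 1) / BATCH_SIZE;
}
```

For 100 dirty folios this gives up to 100 flushes in the per-folio scheme
and 4 in the batched one (three full batches plus the final drain).
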
> 
> Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
> ---
>  mm/vmscan.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 71 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index abf3a2878456..c0d22afe67a5 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1221,6 +1221,57 @@ static bool pageout_one(struct folio *folio,
>  	return false;
>  }
>  
> +static void pageout_batch(struct folio_batch *fbatch,
> +			  struct list_head *ret_folios,
> +			  struct folio_batch *free_folios,
> +			  struct scan_control *sc, struct reclaim_stat *stat,
> +			  struct swap_iocb **plug, struct list_head *folio_list,
> +			  unsigned int *nr_reclaimed)

Two tabs. But this is starting to look a bit messy. Could a helper struct be
used to reduce the parameter count to something readable?
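
Something along these lines, as a sketch only. The name reclaim_ctx, its
layout, and reclaim_ctx_account() are hypothetical, and the kernel types are
just forward-declared so the fragment compiles on its own:

```c
/* Opaque stand-ins for the kernel types; only pointers to them are used. */
struct list_head;
struct folio_batch;
struct scan_control;
struct reclaim_stat;
struct swap_iocb;

/* Hypothetical bundle of the state threaded through the reclaim helpers. */
struct reclaim_ctx {
	struct list_head *folio_list;
	struct list_head *ret_folios;
	struct folio_batch *free_folios;
	struct scan_control *sc;
	struct reclaim_stat *stat;
	struct swap_iocb **plug;
	unsigned int nr_reclaimed;
};

/* With the bundle, a helper needs only the batch and the context. */
void pageout_batch(struct folio_batch *fbatch, struct reclaim_ctx *ctx);

/* Accounting goes through the context instead of a separate out-pointer. */
void reclaim_ctx_account(struct reclaim_ctx *ctx, unsigned int nr_pages)
{
	ctx->nr_reclaimed += nr_pages;
}
```

The same context could then be passed to the other extracted helpers, so
their signatures stay short as well.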

> +{
> +	int i, count = folio_batch_count(fbatch);
> +	struct folio *folio;
> +
> +	/*
> +	 * Reuse fbatch in-place: reinit only clears the count, the
> +	 * underlying folios array is still accessible via saved count.
> +	 * Filter and re-add valid folios back into the same batch.
> +	 */
> +	folio_batch_reinit(fbatch);

That looks rather hacky. You reinit the batch to then traverse the batch? There
must be a cleaner way. Walking and modifying the same folio batch in one go is
rather crazy.
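
One possible shape, sketched in userspace with ints instead of folios: walk
the source batch read-only and collect the survivors in a second batch, at
the cost of one more batch on the stack. struct batch, keep_even() and
batch_filter() are made-up names, and the capacity of 31 is assumed from the
patch subject:

```c
#include <stdbool.h>

#define BATCH_SIZE 31	/* assumed folio_batch capacity */

/* Hypothetical stand-in for struct folio_batch, holding ints. */
struct batch {
	unsigned int nr;
	int items[BATCH_SIZE];
};

/* Example predicate: keep even values. Stands in for "still evictable". */
bool keep_even(int v)
{
	return v % 2 == 0;
}

/*
 * Copy the entries of 'src' that pass 'keep' into 'dst', preserving order.
 * 'src' is never modified, so the walk and the result cannot interfere.
 */
unsigned int batch_filter(const struct batch *src, struct batch *dst,
		bool (*keep)(int))
{
	unsigned int i;

	dst->nr = 0;
	for (i = 0; i < src->nr; i++) {
		if (keep(src->items[i]))
			dst->items[dst->nr++] = src->items[i];
	}
	return dst->nr;
}
```

Since both batches have the same capacity, the destination cannot overflow.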

> +	for (i = 0; i < count; ++i) {
> +		folio = fbatch->folios[i];
> +		if (!folio_trylock(folio)) {
> +			list_add(&folio->lru, ret_folios);
> +			continue;
> +		}
> +
> +		VM_WARN_ON_FOLIO(folio_test_lru(folio), folio);
> +
> +		if (folio_test_writeback(folio) || folio_mapped(folio) ||
> +		    folio_maybe_dma_pinned(folio)) {

Took me a second to understand why exactly you test for these properties. Can
you add a comment explaining how these checks mimic what we checked earlier,
before dropping the lock?

> +			folio_unlock(folio);
> +			list_add(&folio->lru, ret_folios);
> +			continue;
> +		}

So, IIUC, after we dropped the folio lock, someone could have remapped the
folio into user space?

Either from the pagecache or from the swap PTE -> swapcache.

From there, it could have gotten pinned through the page tables.

Is that correct?

What would happen if page migration finds the folio, locks it and wants to
migrate it?

-- 
Cheers,

David



Thread overview: 14+ messages
2026-05-25 14:57 [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Zhang Peng
2026-05-25 14:57 ` [PATCH v4 1/5] mm/vmscan: introduce folio_activate_locked() helper Zhang Peng
2026-06-17 11:59   ` David Hildenbrand (Arm)
2026-05-25 14:57 ` [PATCH v4 2/5] mm/vmscan: extract folio_free() from shrink_folio_list() Zhang Peng
2026-06-17 12:17   ` David Hildenbrand (Arm)
2026-06-17 12:24     ` David Hildenbrand (Arm)
2026-05-25 14:57 ` [PATCH v4 3/5] mm/vmscan: extract pageout_one() " Zhang Peng
2026-06-17 12:19   ` David Hildenbrand (Arm)
2026-06-17 12:25     ` David Hildenbrand (Arm)
2026-05-25 14:57 ` [PATCH v4 4/5] mm/vmscan: extract folio unmap logic into folio_try_unmap() Zhang Peng
2026-06-17 12:28   ` David Hildenbrand (Arm)
2026-05-25 14:57 ` [PATCH v4 5/5] mm/vmscan: flush TLB for every 31 folios evictions Zhang Peng
2026-06-17 12:47   ` David Hildenbrand (Arm)
2026-05-25 18:58 ` [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Andrew Morton