* [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan
@ 2026-05-25 14:57 Zhang Peng
From: Zhang Peng @ 2026-05-25 14:57 UTC
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Johannes Weiner, Shakeel Butt, Axel Rasmussen, Yuanchu Xie,
	Wei Xu, Michal Hocko, Liam R. Howlett, Qi Zheng
  Cc: linux-mm, linux-kernel, Barry Song, Kairui Song, Zhang Peng

This series introduces a batched TLB flushing optimization for dirty folios
during memory reclaim, to reduce IPI overhead on multi-core systems.

Background
----------
Currently, when performing pageout in memory reclaim, try_to_unmap_flush_dirty()
is called for each dirty folio individually. On multi-core systems, each flush
can send IPIs to other CPUs, and the resulting IPI traffic can significantly
hurt performance.

Approach
--------
This patch series accumulates dirty folios into batches and performs a single
TLB flush for the entire batch, rather than flushing for each individual folio.
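
The effect of batching on the number of flushes can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the code
in this series: struct mock_folio, struct flush_stats, reclaim_unbatched(),
reclaim_batched() and drain_batch() are hypothetical names, and a "flush" here
is just a counter increment. The batch capacity of 31 is taken from the subject
of patch 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch capacity; 31 mirrors the figure in the patch 5 subject. */
#define BATCH_SIZE 31

/* Hypothetical stand-in for a folio; only the dirty state is modelled. */
struct mock_folio {
	bool dirty;
};

struct flush_stats {
	size_t flushes;		/* simulated TLB flushes */
	size_t paged_out;	/* folios handed to the simulated pageout step */
};

/* Baseline model: one flush for every dirty folio before it is paged out. */
static struct flush_stats reclaim_unbatched(const struct mock_folio *folios,
					    size_t n)
{
	struct flush_stats st = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (!folios[i].dirty)
			continue;
		st.flushes++;
		st.paged_out++;
	}
	return st;
}

/* One flush covers every pending folio, then each of them is paged out. */
static void drain_batch(size_t *pending, struct flush_stats *st)
{
	assert(*pending <= BATCH_SIZE);
	if (*pending == 0)
		return;
	st->flushes++;
	st->paged_out += *pending;
	*pending = 0;
}

/* Batched model: accumulate dirty folios and flush once per full batch. */
static struct flush_stats reclaim_batched(const struct mock_folio *folios,
					  size_t n)
{
	struct flush_stats st = { 0, 0 };
	size_t pending = 0;

	for (size_t i = 0; i < n; i++) {
		if (!folios[i].dirty)
			continue;
		if (++pending == BATCH_SIZE)
			drain_batch(&pending, &st);
	}
	drain_batch(&pending, &st);	/* flush the final partial batch */
	return st;
}
```

In this model, 100 dirty folios cost 100 flushes in the unbatched case and 4 in
the batched case (three full batches of 31 plus one partial batch of 7), while
the number of folios paged out is the same.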

Changes
-------
Patch 1: Extract the folio activation block at activate_locked into
         folio_activate_locked().
Patch 2: Extract the folio-freeing path (buffer release, lazyfree,
         __remove_mapping, folio_batch drain) into folio_free().
Patch 3: Extract the pageout() dispatch state machine into pageout_one().
Patch 4: Extract the TTU setup and try_to_unmap() block into folio_try_unmap().
Patch 5: Implement batch TLB flushing logic. Dirty folios are accumulated in
         batches and a single TLB flush is performed for each batch before
         calling pageout.

Testing
-------
The benchmark script uses stress-ng to compare TLB shootdown behavior before and
after this series. It constrains a stress-ng workload via memcg to force reclaim
through shrink_folio_list(), and reports TLB shootdowns and function call IPIs.

Core benchmark command: stress-ng --vm 16 --vm-bytes 2G --vm-keep --timeout 60
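
The benchmark script itself is not included here, so how it collects the
counters is an assumption. One plausible source on x86 is /proc/interrupts,
where the "TLB" row counts TLB shootdowns and the "CAL" row counts function
call interrupts, with one column per CPU. The sketch below sums the per-CPU
columns of one such line; sum_irq_line() is a hypothetical helper, not part of
the script or of the kernel.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts-style line, e.g.
 *   " TLB:        120         80   TLB shootdowns"
 * Returns -1 if the line does not start with "<label>:".
 */
static long long sum_irq_line(const char *line, const char *label)
{
	size_t len = strlen(label);
	long long total = 0;

	assert(len > 0);

	/* Skip the leading indentation in front of the row label. */
	while (isspace((unsigned char)*line))
		line++;
	if (strncmp(line, label, len) != 0 || line[len] != ':')
		return -1;
	line += len + 1;

	/* Consume numeric columns until the textual description begins. */
	for (;;) {
		char *end;

		while (*line == ' ' || *line == '\t')
			line++;
		if (!isdigit((unsigned char)*line))
			break;
		total += strtoll(line, &end, 10);
		line = end;
	}
	return total;
}
```

Sampling such a sum before and after the stress-ng run and subtracting the two
values would give a per-run count comparable to the figures reported below.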

==========================================================================
                 batch_dirty_tlb_flush Benchmark Results
==========================================================================
  Kernel: 7.0.0-rc1+   CPUs: 16
  MemTotal: 31834M   SwapTotal: 8191M
  memcg limit: 512M   alloc: 2G   workers: 16   duration: 60s
--------------------------------------------------------------------------
Metric                 Before        After             Delta (abs / %)
--------------------------------------------------------------------------
bogo ops/s             28238.63      35833.97          +7595.34 (+26.9%)
TLB shootdowns         55428953      17621697          -37807256 (-68.2%)
Function call IPIs     34073695      14498768          -19574927 (-57.4%)
pgscan_anon (pages)    52856224      60252894          +7396670 (+14.0%)
pgsteal_anon (pages)   29004962      34054753          +5049791 (+17.4%)
--------------------------------------------------------------------------
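
For reference, the Delta column can be reproduced from the Before and After
columns as (after - before) and (after - before) / before * 100. A minimal
sketch, where struct delta and metric_delta() are hypothetical names introduced
only for this illustration:

```c
#include <assert.h>

struct delta {
	double abs;	/* after - before */
	double pct;	/* change relative to before, in percent */
};

/* Compute the absolute and relative change of one benchmark metric. */
static struct delta metric_delta(double before, double after)
{
	struct delta d;

	assert(before != 0.0);	/* a relative change needs a non-zero baseline */
	d.abs = after - before;
	d.pct = d.abs / before * 100.0;
	return d;
}
```

For example, metric_delta(55428953, 17621697) yields an absolute change of
-37807256 and a relative change of about -68.2%, matching the TLB shootdowns
row.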

Suggested-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Zhang Peng <bruzzhang@tencent.com>
---
Changes in v4 (addressing Barry Song's review on v3):
- Drop the "track reclaimed pages in reclaim_stat" patch; keep
  shrink_folio_list() returning nr_reclaimed directly. Avoids touching
  the function signature and its MGLRU evict_folios() and
  reclaim_clean_pages_from_list() callers in this series.
- Rename folio_active_bounce() to folio_activate_locked(). The new name
  reflects the precondition (the folio is locked) that callers care about.
- Split the folio_free()/pageout_one() extraction into two patches;
  make pageout_one() return bool so shrink_folio_list() can see whether
  the folio was reclaimed or kept.
- Move the !folio_mapped() check out of folio_try_unmap() into the
  caller, so folio_try_unmap() is only invoked for mapped folios.
- Link to v3: https://lore.kernel.org/r/20260410-batch-tlb-flush-v3-0-ff0b9d3a351a@icloud.com

Changes in v3:
- Patch 5: Replace folio_test_lru() condition check with
  VM_WARN_ON_FOLIO assertion, as PG_lru should never be set for
  isolated folios
- Patch 5: Add comment explaining folio_batch reuse-in-place
  technique in pageout_batch()
- Patch 5: Rewrite comment above folio_unlock() to explain why the
  folio is unlocked while batching
- Link to v2: https://lore.kernel.org/r/20260326-batch-tlb-flush-v2-0-403e523325c4@icloud.com

Changes in v2:
- Fix incorrect comment about page_ref_freeze
- Add folio_maybe_dma_pinned() check in pageout_batch()
- Link to v1: https://lore.kernel.org/r/20260309-batch-tlb-flush-v1-0-eb8fed7d1a9e@icloud.com

---
Zhang Peng (5):
      mm/vmscan: introduce folio_activate_locked() helper
      mm/vmscan: extract folio_free() from shrink_folio_list()
      mm/vmscan: extract pageout_one() from shrink_folio_list()
      mm/vmscan: extract folio unmap logic into folio_try_unmap()
      mm/vmscan: flush TLB for every 31 folios evictions

 mm/vmscan.c | 448 ++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 285 insertions(+), 163 deletions(-)
---
base-commit: d0b709f436b2788a10407624688ab8327c5ce18d
change-id: 20260309-batch-tlb-flush-893f0e56b496

Best regards,
-- 
Zhang Peng <zippermonkey@icloud.com>



Thread overview: 14+ messages
2026-05-25 14:57 [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Zhang Peng
2026-05-25 14:57 ` [PATCH v4 1/5] mm/vmscan: introduce folio_activate_locked() helper Zhang Peng
2026-06-17 11:59   ` David Hildenbrand (Arm)
2026-05-25 14:57 ` [PATCH v4 2/5] mm/vmscan: extract folio_free() from shrink_folio_list() Zhang Peng
2026-06-17 12:17   ` David Hildenbrand (Arm)
2026-06-17 12:24     ` David Hildenbrand (Arm)
2026-05-25 14:57 ` [PATCH v4 3/5] mm/vmscan: extract pageout_one() " Zhang Peng
2026-06-17 12:19   ` David Hildenbrand (Arm)
2026-06-17 12:25     ` David Hildenbrand (Arm)
2026-05-25 14:57 ` [PATCH v4 4/5] mm/vmscan: extract folio unmap logic into folio_try_unmap() Zhang Peng
2026-06-17 12:28   ` David Hildenbrand (Arm)
2026-05-25 14:57 ` [PATCH v4 5/5] mm/vmscan: flush TLB for every 31 folios evictions Zhang Peng
2026-06-17 12:47   ` David Hildenbrand (Arm)
2026-05-25 18:58 ` [PATCH v4 0/5] mm: batch TLB flushing for dirty folios in vmscan Andrew Morton
