* [RFC PATCH V2 00/13] mm: slowtier page promotion based on PTE A bit
@ 2025-06-24  5:56 Raghavendra K T
From: Raghavendra K T @ 2025-06-24  5:56 UTC
  To: raghavendra.kt
  Cc: AneeshKumar.KizhakeVeetil, Hasan.Maruf, Michael.Day, akpm,
	bharata, dave.hansen, david, dongjoo.linux.dev, feng.tang, gourry,
	hannes, honggyu.kim, hughd, jhubbard, jon.grimm, k.shutemov,
	kbusch, kmanaouil.dev, leesuyeon0506, leillc, liam.howlett,
	linux-kernel, linux-mm, mgorman, mingo, nadav.amit, nphamcs,
	peterz, riel, rientjes, rppt, santosh.shukla, shivankg, shy828301,
	sj, vbabka, weixugc, willy, ying.huang, ziy, Jonathan.Cameron,
	dave, yuanchu, kinseyho, hdanton

The current series is: 
 a) RFC V1 [2] with review comments incorporated.
 b) Design change to stabilize the solution.
 c) Rebased to v6.15.

Introduction:
=============
In the current hot page promotion, all activities, including process
address space scanning, NUMA hint fault handling and page migration, are
performed in process context, i.e., the scanning overhead is borne by
applications.

This RFC V2 patch series does slow-tier page promotion by using PTE Accessed
bit scanning. Scanning is done by a global kernel thread which routinely
scans all the processes' address spaces and checks for accesses by reading
the PTE A bit.
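
To make the scan step concrete, here is a minimal userspace sketch, not the
kscand code. It assumes the scanner clears the A bit after reading it so that
the next pass sees only new accesses (the cover letter does not spell this
out). The names sim_scan_range(), sim_test_and_clear_young() and
SIM_PTE_ACCESSED are hypothetical; bit 5 merely mimics the x86 Accessed bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PTE: bit 5 mimics the x86 Accessed bit. */
#define SIM_PTE_ACCESSED (UINT64_C(1) << 5)

/* Test and clear the simulated A bit; return whether it was set. */
static bool sim_test_and_clear_young(uint64_t *pte)
{
	bool young = (*pte & SIM_PTE_ACCESSED) != 0;

	*pte &= ~SIM_PTE_ACCESSED;
	return young;
}

/*
 * Walk npte simulated entries, clear every A bit that is set, and record
 * the indices of accessed entries in hot[] (capacity hot_cap).
 * Returns the number of indices recorded.
 */
static size_t sim_scan_range(uint64_t *ptes, size_t npte,
			     size_t *hot, size_t hot_cap)
{
	size_t nhot = 0;

	assert(ptes && hot);
	for (size_t i = 0; i < npte; i++) {
		/* The A bit is cleared even when hot[] is already full. */
		if (sim_test_and_clear_young(&ptes[i]) && nhot < hot_cap)
			hot[nhot++] = i;
	}
	return nhot;
}
```

A second call over the same range reports nothing unless the entries were
touched again in between, which is what lets a periodic scanner tell recent
accesses from stale ones.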

A separate migration thread migrates/promotes the pages to the top-tier
node based on a simple heuristic that uses top-tier scan/access information
of the mm.

Additionally, based on feedback, a prctl knob with a scalar value is
provided to control per-task scanning.

Changes since RFC V1:
=====================
- Address review comments from Jonathan (thank you for the close review).

- Per-mm migration list with a separate lock to resolve the race
 conditions/softlockups reported by Davidlohr.

- Add one more filter before migration for LRU_GEN case to check whether
 folio is still hot.

- Rename kmmscand ==> kscand, kmmmigrated ==> kmigrated (hopefully this
 gets merged into Bharata's upcoming migration thread).

Changes since RFC V0:
======================
- A separate migration thread is used for migration, alleviating the need
  for multi-threaded scanning (at least as per tracing).

- A simple heuristic for target node calculation is added.

- prctl (David R) interface with scalar value is added to control per task scanning.

- Steve's comment on tracing incorporated.

- Fix for the bug reported by Davidlohr.

- Initial scan delay similar to NUMAB1 mode added.

- Got rid of migration lock during mm_walk.

What is not addressed yet:
===========================
- The patchset could use a PFN-based list instead of folios for migration.
 This will be done in the next iteration or so, when integrating with
 kpromoted [3] / enhanced kmigrated APIs.

- Still using the old target node heuristic that chooses only a single
 target. Hillf had suggested promotion to next-tier nodes. We are not
 there yet.

- Jonathan and Davidlohr raised comments on the microbenchmark and on
 migration based on first access. Folios are now filtered by checking
 whether they are still hot using lru_gen_is_active() (for the LRU_GEN
 case). But migration is still based on first access for the !LRU_GEN
 config.

- Davidlohr suggested using NUMAB2 along with scanning. This needs
 more thought/implementation (without relying on NUMAB2
 timestamps??).

A note on per mm migration list using mm_slot:
=============================================
Using a per-mm migration list (mm_slot) has helped to reduce contention
 and thus eases mm teardown during process exit.

It also helps to tie the PFN/folio to its mm, which makes the heuristics
 work better, and it would further help to throttle migration per mm (or
 process) (TBD).
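
The locking idea can be sketched in userspace as follows. This is only an
illustration under assumptions, not the mm_slot code from the patches:
struct sim_mm_slot, struct sim_candidate and the sim_slot_*() helpers are
hypothetical names, and a toy C11 atomic_flag spinlock stands in for the
kernel lock. Each mm owns its list and its lock, so the scanner and the
migration thread contend only when they work on the same mm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical migration candidate; pfn stands in for a folio reference. */
struct sim_candidate {
	unsigned long pfn;
	struct sim_candidate *next;
};

/* Hypothetical per-mm slot: one list and one lock per mm. */
struct sim_mm_slot {
	atomic_flag lock;		/* toy spinlock guarding head and nr */
	struct sim_candidate *head;
	size_t nr;
};

static void sim_slot_init(struct sim_mm_slot *slot)
{
	atomic_flag_clear(&slot->lock);
	slot->head = NULL;
	slot->nr = 0;
}

static void sim_slot_lock(struct sim_mm_slot *slot)
{
	while (atomic_flag_test_and_set_explicit(&slot->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void sim_slot_unlock(struct sim_mm_slot *slot)
{
	atomic_flag_clear_explicit(&slot->lock, memory_order_release);
}

/* Scanner side: queue one candidate at the head of this mm's list. */
static void sim_slot_add(struct sim_mm_slot *slot, struct sim_candidate *c)
{
	assert(slot && c);
	sim_slot_lock(slot);
	c->next = slot->head;
	slot->head = c;
	slot->nr++;
	sim_slot_unlock(slot);
}

/* Migration side: detach the whole list under the lock, process it unlocked. */
static struct sim_candidate *sim_slot_detach_all(struct sim_mm_slot *slot,
						 size_t *nr)
{
	struct sim_candidate *list;

	assert(slot && nr);
	sim_slot_lock(slot);
	list = slot->head;
	*nr = slot->nr;
	slot->head = NULL;
	slot->nr = 0;
	sim_slot_unlock(slot);
	return list;
}
```

Detaching the whole list in one short critical section keeps the lock hold
time independent of how long migration takes, and dropping a slot at process
exit touches only that mm's list.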

A note on PTE A bit scanning:
============================
Major positive: Current patchset is able to cover all the process address
 space scanning effectively with simple algorithms to tune scan_size and
 scan_period.
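
For readers unfamiliar with this kind of tuning, one possible shape of a
scan_period adjustment is sketched below. It is an assumption for
illustration only: the rule (scan sooner when a pass finds more accessed
pages than the previous pass, back off otherwise), the bounds, and the name
sim_next_scan_period() are hypothetical and are not taken from the patches.

```c
#include <assert.h>

/* Hypothetical bounds in milliseconds; not taken from the patches. */
#define SIM_SCAN_PERIOD_MIN_MS   500u
#define SIM_SCAN_PERIOD_MAX_MS 16000u

/*
 * Return the next scan period for an mm: halve it when this pass found more
 * accessed pages than the previous one, double it otherwise, and clamp the
 * result to [SIM_SCAN_PERIOD_MIN_MS, SIM_SCAN_PERIOD_MAX_MS].
 */
static unsigned int sim_next_scan_period(unsigned int period_ms,
					 unsigned long prev_accessed,
					 unsigned long cur_accessed)
{
	assert(period_ms >= SIM_SCAN_PERIOD_MIN_MS &&
	       period_ms <= SIM_SCAN_PERIOD_MAX_MS);

	if (cur_accessed > prev_accessed)
		period_ms /= 2;
	else
		period_ms *= 2;

	if (period_ms < SIM_SCAN_PERIOD_MIN_MS)
		period_ms = SIM_SCAN_PERIOD_MIN_MS;
	if (period_ms > SIM_SCAN_PERIOD_MAX_MS)
		period_ms = SIM_SCAN_PERIOD_MAX_MS;
	return period_ms;
}
```

A scan_size rule could follow the same pattern with the direction reversed,
growing the amount scanned per pass while accesses keep increasing.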

Thanks to Jonathan, Davidlohr for review feedback on RFC V1.

Results:
=======
The microbenchmark showed improvements similar to RFC V1 (8%+). More
benchmarking with redis memtier etc. is still to be done (perhaps with
tuning based on the results).

The patchset is also available here:
https://github.com/RaghavendraKT80/linux-mm/tree/kmmscand_rfc_v2

Links:
[1] RFC V0: https://lore.kernel.org/all/20241201153818.2633616-1-raghavendra.kt@amd.com/
[2] RFC V1: https://lore.kernel.org/linux-mm/20250319193028.29514-1-raghavendra.kt@amd.com/
[3] Kpromoted: https://lore.kernel.org/linux-mm/20250306054532.221138-1-bharata@amd.com/

Patch organization:
patch 1-4: initial skeleton for scanning and migration
patch 5: migration
patch 6-8: scanning optimizations
patch 9: target_node heuristic
patch 10-12: sysfs, vmstat and tracing
patch 13: A basic prctl implementation.

Raghavendra K T (13):
  mm: Add kscand kthread for PTE A bit scan
  mm: Maintain mm_struct list in the system
  mm: Scan the mm and create a migration list
  mm: Create a separate kthread for migration
  mm/migration: Migrate accessed folios to toptier node
  mm: Add throttling of mm scanning using scan_period
  mm: Add throttling of mm scanning using scan_size
  mm: Add initial scan delay
  mm: Add a heuristic to calculate target node
  sysfs: Add sysfs support to tune scanning
  vmstat: Add vmstat counters
  trace/kscand: Add tracing of scanning and migration
  prctl: Introduce new prctl to control scanning

 Documentation/filesystems/proc.rst |    2 +
 fs/exec.c                          |    4 +
 fs/proc/task_mmu.c                 |    4 +
 include/linux/kscand.h             |   30 +
 include/linux/migrate.h            |    2 +
 include/linux/mm.h                 |   11 +
 include/linux/mm_types.h           |    7 +
 include/linux/vm_event_item.h      |   10 +
 include/trace/events/kmem.h        |   90 ++
 include/uapi/linux/prctl.h         |    7 +
 kernel/fork.c                      |    8 +
 kernel/sys.c                       |   25 +
 mm/Kconfig                         |    8 +
 mm/Makefile                        |    1 +
 mm/kscand.c                        | 1644 ++++++++++++++++++++++++++++
 mm/migrate.c                       |    2 +-
 mm/vmstat.c                        |   10 +
 17 files changed, 1864 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/kscand.h
 create mode 100644 mm/kscand.c


base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
-- 
2.34.1



Thread overview: 20+ messages
2025-06-24  5:56 [RFC PATCH V2 00/13] mm: slowtier page promotion based on PTE A bit Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 01/13] mm: Add kscand kthread for PTE A bit scan Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 02/13] mm: Maintain mm_struct list in the system Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 03/13] mm: Scan the mm and create a migration list Raghavendra K T
2025-06-25 22:07   ` Harry Yoo
2025-06-25 23:05     ` Harry Yoo
2025-06-26  6:27     ` Raghavendra K T
2025-07-08 11:17       ` Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 04/13] mm: Create a separate kthread for migration Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 05/13] mm/migration: Migrate accessed folios to toptier node Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 06/13] mm: Add throttling of mm scanning using scan_period Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 07/13] mm: Add throttling of mm scanning using scan_size Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 08/13] mm: Add initial scan delay Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 09/13] mm: Add a heuristic to calculate target node Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 10/13] sysfs: Add sysfs support to tune scanning Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 11/13] vmstat: Add vmstat counters Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 12/13] trace/kscand: Add tracing of scanning and migration Raghavendra K T
2025-06-24  7:09   ` Masami Hiramatsu
2025-06-24  7:50     ` Raghavendra K T
2025-06-24  5:56 ` [RFC PATCH V2 13/13] prctl: Introduce new prctl to control scanning Raghavendra K T