[PATCH v2 0/5] mm/damon: add mTHP collapse and split actions

DAMON development mailing list

* [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
@ 2026-07-01 11:47 wang lian
  2026-07-01 11:47 ` [PATCH v2 1/5] mm/damon: add target_order field for DAMOS_COLLAPSE wang lian
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: wang lian @ 2026-07-01 11:47 UTC (permalink / raw)
  To: damon, linux-mm
  Cc: linux-kernel, sj, gutierrez.asier, daichaobing, lianux.wang,
	Wang Lian

From: Wang Lian <lianux.mm@gmail.com>

This series gives DAMOS two order-aware folio actions so that an
access-aware policy can manage memory at mTHP granularity: a
target_order field for the existing DAMOS_COLLAPSE, and a new
DAMOS_SPLIT action.  The kernel provides the mechanism; deciding
which specific address ranges to act on is left to user space and
expressed through the existing DAMOS address filter.

v1: https://lore.kernel.org/linux-mm/20260618094838.32805-1-lianux.mm@gmail.com/

Changes since v1

 - Rename DAMOS_MTHP_SPLIT -> DAMOS_SPLIT for naming consistency with
   the existing actions (per SJ's review).
 - Drop the per-scheme hot_threshold field.  Hotness policy does not
   belong in the kernel; target selection now lives in user space and
   is expressed to DAMOS via the address filter (per SJ's review).
 - Drop the v1 SPE debugfs patch entirely.  debugfs is not the right
   interface for a feature, and the SPE profiler belongs in user space
   (see "User-space target selection" below).  v2 is kernel mechanism
   only: 5 patches.
 - Decouple T1 (a lab observation) from T2 (the production issue), and
   correct the architecture claim: ptep_test_and_clear_young() skips
   the TLB flush on both x86_64 and arm64, so the blind spot is
   architecture-independent rather than arm64-only.
 - Terminology: avoid "stale TLB".  A valid TLB entry is doing its
   job; the point is only that it lets the CPU satisfy a translation
   without a page-table walk, so the Accessed bit cleared by DAMON is
   not re-set.

Background

Two effects degrade DAMON's PTE-Accessed-bit (AF) signal once THP is
in play.  Both are described here as motivation only; this series does
not change the AF monitoring path.

T2 -- PMD-granularity inflation (production issue)

A 2MB THP is tracked by a single PMD-level Accessed bit.  One access
to any 4KB sub-page sets the AF for the whole 2MB, so DAMON reports
the entire THP as hot and cannot distinguish a genuinely hot 2MB
region from a 2MB region with a single hot 4KB page.  Cold memory
hides inside "hot" THPs, and access-driven pageout/migration becomes
coarse.

This is the workload that drove the work: Sangfor's Kunpeng 920 KVM
hosts running Oracle.  ARM SPE sampling of that workload shows 94.6%
of THPs have fewer than 10% of their sub-pages actually accessed.

T1 -- TLB-reach blind spot (lab observation)

When the working set fits within L2 TLB reach (Kunpeng 920: 2048
entries x 2MB = 4GB), the CPU keeps hitting the TLB and never walks
the page table.  Because ptep_test_and_clear_young() does not flush
the TLB, valid TLB entries continue to satisfy translations and the
AF that DAMON cleared is never re-set, so DAMON sees nr_accesses=0 for
memory that is in fact hot, and no scheme triggers.  This reproduces
in the lab with small workloads; it is not something we have seen
reported from production, where working sets exceed TLB reach.
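
As a rough illustration of the TLB-reach arithmetic above (not part of
the series), the sketch below computes reach as entries times page size
and checks whether a working set fits inside it.  The helper names are
hypothetical; the 2048-entry, 2MB figures are the Kunpeng 920 numbers
quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes covered by a TLB with `entries` entries of `page_size` bytes each. */
static uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_size)
{
	/* Guard against a zero page size and multiplication overflow. */
	assert(page_size != 0 && entries <= UINT64_MAX / page_size);
	return entries * page_size;
}

/*
 * Nonzero when the working set fits within TLB reach, which is the
 * condition under which the T1 blind spot described above can appear.
 */
static int fits_in_tlb_reach(uint64_t working_set, uint64_t entries,
			     uint64_t page_size)
{
	return working_set <= tlb_reach_bytes(entries, page_size);
}
```

With 2048 entries and 2MB pages, tlb_reach_bytes() gives 4GB, so a 1GB
lab workload fits inside the reach while an 8GB working set does not.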

What this series adds

Rather than change AF monitoring, this series adds two order-aware
DAMOS actions so a policy layer can act at mTHP granularity:

 - DAMOS_COLLAPSE + target_order (patches 1-3): collapse small folios
   up to a chosen mTHP order.  Patch 1 adds the target_order field and
   its sysfs file; patch 2 exports a khugepaged helper
   (damon_collapse_folio_range()); patch 3 wires the vaddr handler.

 - DAMOS_SPLIT + target_order (patches 4-5): split large folios down
   to a chosen mTHP order via split_folio_to_order(), for both
   anonymous and file-backed (tmpfs/shmem) folios.

The two are complementary, not competing:

   THP=never  + DAMOS_COLLAPSE: start at 4KB, grow hot regions up.
   THP=always + DAMOS_SPLIT:    start at 2MB, shrink cold regions down.

This dual-path design aligns with ideas discussed with Asier
Gutierrez; we plan to unify our mTHP automation and evaluation
roadmaps under this standard DAMOS_SPLIT action.

A deployment can pick either baseline, or run both, and let DAMOS
manage the placement.  THP is still wanted for the hot working set
(fewer TLB misses, shallower walks); the goal is not "no THP" but
"THP where it is hot, small pages where it is cold."

User-space target selection

The decision of *which* regions to collapse or split is left to user
space and fed to DAMOS through the existing DAMOS address filter
(DAMOS_FILTER_TYPE_ADDR) -- the interface suggested during v1 review.
The kernel provides the mechanism; user space provides the policy,
consistent with the perf/BPF "kernel samples, user space decides"
model and with the DAMON-X direction.

Because the AF signal is unreliable at PMD granularity (T1/T2), the
scheme is run with min_nr_accesses=0 so it does not gate on access
count, and the address filter selects targets.  min_nr_accesses=0 is
also what unblocks the T1 case, where nr_accesses is pinned at 0.

Why not just turn khugepaged off?  You can, but khugepaged is global
and usually left enabled because other workloads rely on it; it cannot
be disabled per region.  DAMOS_COLLAPSE gives per-region,
access-pattern-driven collapse -- a more precise, targeted complement
to khugepaged's global scan, not a replacement for it.  To handle the
runtime race where khugepaged might aggressively re-collapse what
DAMOS_SPLIT just split, we are evaluating a precise VMA-level handshake
or back-off mechanism to prevent ping-pong effects in mixed
environments.

Two user-space data sources produce the candidate address ranges:

 1. ARM SPE (ARMv8.2+): perf record (SPE) -> per-2MB hot-fraction
    histogram -> PA->VA via /proc/<pid>/pagemap -> sparse-THP VA
    ranges.  SPE reads physical addresses from the CPU pipeline,
    bypassing the TLB and page tables, so it is immune to T1 and T2.

 2. smaps fallback (no SPE): scan /proc/<pid>/smaps for THP-backed
    VMAs and treat the 2MB-aligned ranges as split candidates.
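
The per-2MB hot-fraction step in source 1 could be sketched roughly as
below.  This is an illustration only, not the profiler used for the
numbers in this cover letter: it assumes the samples are already
translated to virtual addresses, assumes 4KB base pages and 2MB THPs,
and uses a 10% cutoff purely as an example threshold.  All names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUBPAGE_SHIFT		12	/* assumption: 4KB base pages */
#define THP_SHIFT		21	/* assumption: 2MB PMD-size THP */
#define SUBPAGES_PER_THP	(1u << (THP_SHIFT - SUBPAGE_SHIFT))

/* Count distinct 4KB sub-pages of the 2MB range at thp_base hit by samples. */
static unsigned int touched_subpages(uint64_t thp_base,
				     const uint64_t *samples, size_t n)
{
	unsigned char seen[SUBPAGES_PER_THP];
	unsigned int count = 0;
	size_t i;

	/* The base must be 2MB-aligned. */
	assert((thp_base & ((1ULL << THP_SHIFT) - 1)) == 0);
	memset(seen, 0, sizeof(seen));

	for (i = 0; i < n; i++) {
		uint64_t idx;

		/* Skip samples that fall outside this 2MB range. */
		if ((samples[i] >> THP_SHIFT) != (thp_base >> THP_SHIFT))
			continue;
		idx = (samples[i] >> SUBPAGE_SHIFT) & (SUBPAGES_PER_THP - 1);
		if (!seen[idx]) {
			seen[idx] = 1;
			count++;
		}
	}
	return count;
}

/* Example policy: sparse when fewer than 10% of sub-pages were touched. */
static int is_sparse_thp(unsigned int touched)
{
	return touched * 10 < SUBPAGES_PER_THP;
}
```

Ranges flagged by is_sparse_thp() would then be handed to DAMOS as
address-filter ranges; how the threshold is chosen is a user-space
policy decision.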

The SPE profiler stays in user space deliberately: the SPE PMU is a
single-consumer resource, so a kernel consumer would lock out
user-space perf and tooling (x86 PEBS / AMD IBS have the same
property).  Keeping it in user space avoids that and keeps the metric
source pluggable, in line with DAMON-X.  This is why v2 drops the v1
SPE debugfs patch.

Testing

Tested on aarch64 with this series applied to 7.1.0-rc5, THP=always,
using a DAMOS_SPLIT scheme (target_order=2, min_nr_accesses=0) and a
single DAMOS address filter selecting one 2MB-aligned range:

 - Anonymous THP: the filter splits exactly that one THP --
   sz_applied=2MB and AnonHugePages drops by 2MB, the rest of the
   256MB mapping untouched.
 - File-backed THP (tmpfs/shmem mounted huge=always): the same setup
   splits exactly one 2MB shmem THP -- sz_applied=2MB and
   ShmemPmdMapped drops by 2MB.  This confirms split_folio_to_order()
   works for shmem folios (the KVM-guest-on-THP-tmpfs case).
 - The address filter is what bounds the action: sz_tried covers the
   whole ~2GB monitored region while sz_applied is exactly the 2MB the
   filter selected.
 - A smaps-based path (for hosts without SPE) enumerates THP-backed
   ranges and splits all THPs in the target workload.
 - checkpatch clean on all 5 patches.

Wang Lian (5):
  mm/damon: add target_order field for DAMOS_COLLAPSE
  mm/khugepaged: add damon_collapse_folio_range() for external callers
  mm/damon/vaddr: implement mTHP-aware DAMOS_COLLAPSE handler
  mm/damon: introduce DAMOS_SPLIT action
  mm/damon/vaddr: implement DAMOS_SPLIT handler

 include/linux/damon.h      |  10 ++++
 include/linux/khugepaged.h |   3 ++
 mm/damon/sysfs-schemes.c   |  57 ++++++++++++++++++++
 mm/damon/vaddr.c           | 106 +++++++++++++++++++++++++++++++++++++
 mm/khugepaged.c            |  39 ++++++++++++++
 5 files changed, 215 insertions(+)

base-commit: 01a87376d94249407343653a63e8ecfbe4c79cda
--
2.50.1 (Apple Git-155)


* [PATCH v2 1/5] mm/damon: add target_order field for DAMOS_COLLAPSE
  2026-07-01 11:47 [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions wang lian
@ 2026-07-01 11:47 ` wang lian
  2026-07-01 12:07   ` sashiko-bot
  2026-07-01 11:47 ` [PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers wang lian
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: wang lian @ 2026-07-01 11:47 UTC (permalink / raw)
  To: damon, linux-mm
  Cc: linux-kernel, sj, gutierrez.asier, daichaobing, lianux.wang,
	Wang Lian, Kunwu Chan

From: Wang Lian <lianux.mm@gmail.com>

DAMOS_COLLAPSE currently collapses into PMD-size THP only.  Add a
target_order field to express per-order mTHP collapse intent.  Zero
means system default (PMD order, same as current behavior).  Valid
values are 0 and 2..HPAGE_PMD_ORDER.

Wire up the sysfs interface: a per-scheme rw file "target_order".
Validate at store time that the value is in range, and warn at scheme
creation time if DAMOS_COLLAPSE is used with an unsupported non-PMD
order, resetting to 0.

The actual mTHP application via the khugepaged wrapper will be added
in subsequent patches.
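
For illustration, the two validation stages described above can be
mirrored in a small user-space sketch.  EXAMPLE_PMD_ORDER stands in for
HPAGE_PMD_ORDER and is assumed to be 9 (4KB base pages, 2MB PMD); the
function names are hypothetical and this is not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_PMD_ORDER 9u	/* assumption: stand-in for HPAGE_PMD_ORDER */

/* Store-time check: accept 0 or any order in 2..PMD order. */
static bool collapse_order_storable(unsigned int val)
{
	return val == 0 || (val >= 2 && val <= EXAMPLE_PMD_ORDER);
}

/*
 * Scheme-creation check: for DAMOS_COLLAPSE, a stored non-PMD order is
 * reset to 0 (system default) until mTHP collapse is supported.
 */
static unsigned int collapse_order_effective(unsigned int val)
{
	assert(collapse_order_storable(val));
	if (val != 0 && val != EXAMPLE_PMD_ORDER)
		return 0;
	return val;
}
```

Under these assumptions an order of 4 can be written to the sysfs file
but is reset to 0 when the scheme is created, while 0 and 9 pass through
unchanged.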

Co-developed-by: Kunwu Chan <kunwu.chan@gmail.com>
Signed-off-by: Kunwu Chan <kunwu.chan@gmail.com>
Signed-off-by: Wang Lian <lianux.mm@gmail.com>
Signed-off-by: Wang Lian <lianux.wang@processmission.com>
---
 include/linux/damon.h    |  5 +++++
 mm/damon/sysfs-schemes.c | 45 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 6f7edb3590ef..5a0587556573 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -572,6 +572,11 @@ struct damos_migrate_dests {
 struct damos {
 	struct damos_access_pattern pattern;
 	enum damos_action action;
+	/*
+	 * @target_order: target order for mTHP actions (DAMOS_COLLAPSE).
+	 * 0 means system default (PMD order).  Valid: 0, 2..HPAGE_PMD_ORDER.
+	 */
+	unsigned int target_order;
 	unsigned long apply_interval_us;
 /* private: internal use only */
 	/*
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 329cfd0bbe9f..735970717048 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -6,7 +6,9 @@
  */
 
 #include <linux/slab.h>
+#include <linux/mm.h>
 #include <linux/numa.h>
+#include <linux/huge_mm.h>
 
 #include "sysfs-common.h"
 
@@ -2257,6 +2259,7 @@ struct damon_sysfs_scheme {
 	struct damon_sysfs_stats *stats;
 	struct damon_sysfs_scheme_regions *tried_regions;
 	int target_nid;
+	unsigned int target_order;
 	struct damos_sysfs_dests *dests;
 };
 
@@ -2642,6 +2645,34 @@ static ssize_t target_nid_store(struct kobject *kobj,
 	return err ? err : count;
 }
 
+static ssize_t target_order_show(struct kobject *kobj,
+		struct kobj_attribute *attr, char *buf)
+{
+	struct damon_sysfs_scheme *scheme = container_of(kobj,
+			struct damon_sysfs_scheme, kobj);
+
+	return sysfs_emit(buf, "%u\n", scheme->target_order);
+}
+
+static ssize_t target_order_store(struct kobject *kobj,
+		struct kobj_attribute *attr, const char *buf, size_t count)
+{
+	struct damon_sysfs_scheme *scheme = container_of(kobj,
+			struct damon_sysfs_scheme, kobj);
+	unsigned int val;
+	int err;
+
+	err = kstrtouint(buf, 0, &val);
+	if (err)
+		return err;
+
+	if (val != 0 && (val < 2 || val > HPAGE_PMD_ORDER))
+		return -EINVAL;
+
+	scheme->target_order = val;
+	return count;
+}
+
 static void damon_sysfs_scheme_release(struct kobject *kobj)
 {
 	kfree(container_of(kobj, struct damon_sysfs_scheme, kobj));
@@ -2656,10 +2687,14 @@ static struct kobj_attribute damon_sysfs_scheme_apply_interval_us_attr =
 static struct kobj_attribute damon_sysfs_scheme_target_nid_attr =
 		__ATTR_RW_MODE(target_nid, 0600);
 
+static struct kobj_attribute damon_sysfs_scheme_target_order_attr =
+		__ATTR_RW_MODE(target_order, 0600);
+
 static struct attribute *damon_sysfs_scheme_attrs[] = {
 	&damon_sysfs_scheme_action_attr.attr,
 	&damon_sysfs_scheme_apply_interval_us_attr.attr,
 	&damon_sysfs_scheme_target_nid_attr.attr,
+	&damon_sysfs_scheme_target_order_attr.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(damon_sysfs_scheme);
@@ -3005,6 +3040,16 @@ static struct damos *damon_sysfs_mk_scheme(
 	if (!scheme)
 		return NULL;
 
+	if (sysfs_scheme->action == DAMOS_COLLAPSE &&
+	    sysfs_scheme->target_order != 0 &&
+	    sysfs_scheme->target_order != HPAGE_PMD_ORDER) {
+		pr_warn("DAMON collapse: target_order %u not supported, only PMD order (%u) is available. Use 0 or %u.\n",
+			sysfs_scheme->target_order,
+			HPAGE_PMD_ORDER, HPAGE_PMD_ORDER);
+		sysfs_scheme->target_order = 0;
+	}
+	scheme->target_order = sysfs_scheme->target_order;
+
 	err = damos_sysfs_add_quota_score(sysfs_quotas->goals, &scheme->quota);
 	if (err) {
 		damon_destroy_scheme(scheme);
-- 
2.50.1 (Apple Git-155)



* [PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers
  2026-07-01 11:47 [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions wang lian
  2026-07-01 11:47 ` [PATCH v2 1/5] mm/damon: add target_order field for DAMOS_COLLAPSE wang lian
@ 2026-07-01 11:47 ` wang lian
  2026-07-01 12:02   ` sashiko-bot
  2026-07-01 11:47 ` [PATCH v2 3/5] mm/damon/vaddr: implement mTHP-aware DAMOS_COLLAPSE handler wang lian
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: wang lian @ 2026-07-01 11:47 UTC (permalink / raw)
  To: damon, linux-mm
  Cc: linux-kernel, sj, gutierrez.asier, daichaobing, lianux.wang,
	Wang Lian, Kunwu Chan

From: Wang Lian <lianux.mm@gmail.com>

Export a thin wrapper around collapse_huge_page() that allows external
subsystems such as DAMON to trigger THP collapse on a target address
range.

Currently restricted to PMD order (HPAGE_PMD_ORDER), since
collapse_huge_page() does not yet support arbitrary mTHP orders.
The restriction can be relaxed when khugepaged gains mTHP support.

The caller must hold a reference to @mm.  Do not hold mmap lock:
collapse_huge_page() acquires mmap_read_lock for validation, releases
it, then acquires mmap_write_lock for the actual collapse.  Holding
an outer mmap_read_lock would cause a self-deadlock when the same
thread attempts the inner mmap_write_lock.

Co-developed-by: Kunwu Chan <kunwu.chan@gmail.com>
Signed-off-by: Kunwu Chan <kunwu.chan@gmail.com>
Signed-off-by: Wang Lian <lianux.mm@gmail.com>
Signed-off-by: Wang Lian <lianux.wang@processmission.com>
---
 include/linux/khugepaged.h |  3 +++
 mm/khugepaged.c            | 39 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index d7a9053ff4fe..6fb8a6857790 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -20,6 +20,9 @@ extern bool current_is_khugepaged(void);
 void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		bool install_pmd);
 
+int damon_collapse_folio_range(struct mm_struct *mm, unsigned long start_addr,
+			       unsigned int target_order);
+
 static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
 {
 	if (mm_flags_test(MMF_VM_HUGEPAGE, oldmm))
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 617bca76db49..0387841ba2e7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -3272,3 +3272,42 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	return thps == ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 0
 			: madvise_collapse_errno(last_fail);
 }
+
+/**
+ * damon_collapse_folio_range() - Collapse base pages in range into a THP
+ * @mm:         mm_struct of the target process
+ * @start_addr: start address (must be order-aligned)
+ * @target_order: page order of the collapse result (currently only
+ *                HPAGE_PMD_ORDER is supported)
+ *
+ * Thin wrapper around collapse_huge_page() for external callers such as
+ * DAMON.  The caller must hold a reference to @mm.  Do not hold mmap
+ * lock: collapse_huge_page() acquires mmap_read_lock for validation,
+ * releases it, then acquires mmap_write_lock for the collapse.  Holding
+ * an outer mmap_read_lock would self-deadlock.
+ *
+ * Return: 0 on success, -EINVAL on bad arguments, negative error from
+ *         madvise_collapse_errno() otherwise.
+ */
+int damon_collapse_folio_range(struct mm_struct *mm, unsigned long start_addr,
+			       unsigned int target_order)
+{
+	struct collapse_control cc = {
+		.is_khugepaged = false,
+	};
+	enum scan_result result;
+
+	if (target_order != HPAGE_PMD_ORDER) {
+		pr_warn_once("%s: only PMD order (%u) is supported, got %u\n",
+			     __func__, HPAGE_PMD_ORDER, target_order);
+		return -EINVAL;
+	}
+	if (start_addr & ((PAGE_SIZE << target_order) - 1))
+		return -EINVAL;
+
+	result = collapse_huge_page(mm, start_addr, 1, 0, &cc, target_order);
+	if (result == SCAN_SUCCEED)
+		return 0;
+	return madvise_collapse_errno(result);
+}
+EXPORT_SYMBOL_GPL(damon_collapse_folio_range);
-- 
2.50.1 (Apple Git-155)



* [PATCH v2 3/5] mm/damon/vaddr: implement mTHP-aware DAMOS_COLLAPSE handler
  2026-07-01 11:47 [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions wang lian
  2026-07-01 11:47 ` [PATCH v2 1/5] mm/damon: add target_order field for DAMOS_COLLAPSE wang lian
  2026-07-01 11:47 ` [PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers wang lian
@ 2026-07-01 11:47 ` wang lian
  2026-07-01 12:02   ` sashiko-bot
  2026-07-01 11:47 ` [PATCH v2 4/5] mm/damon: introduce DAMOS_SPLIT action wang lian
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: wang lian @ 2026-07-01 11:47 UTC (permalink / raw)
  To: damon, linux-mm
  Cc: linux-kernel, sj, gutierrez.asier, daichaobing, lianux.wang,
	Wang Lian, Kunwu Chan

From: Wang Lian <lianux.mm@gmail.com>

When target_order is set (non-zero), the DAMOS_COLLAPSE handler now calls
damon_collapse_folio_range() to collapse pages into the requested mTHP
size, iterating over the target region in order-aligned chunks.  When
target_order is 0 (default), the existing madvise(MADV_COLLAPSE) path is
used, preserving backwards compatibility.

Region boundaries are expanded outward to the covering aligned range
(ALIGN_DOWN start, ALIGN end) so that collapse works even after
kdamond_split_regions reduces region sizes below the chunk size.
collapse_huge_page() internally validates VMA bounds, so expanding
beyond the original region is safe.
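
As a rough user-space illustration of the outward expansion described
above (not the kernel code), the sketch below rounds a region to its
covering order-aligned range and counts the chunks the loop would
visit.  The struct and function names are hypothetical, and the page
size is passed in explicitly rather than taken from PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

struct chunk_span {
	uint64_t start;		/* region start rounded down to a chunk */
	uint64_t end;		/* region end rounded up to a chunk */
	uint64_t nr_chunks;	/* chunks in [start, end) */
};

/* Expand [start, end) outward to the covering (page_size << order) range. */
static struct chunk_span covering_chunks(uint64_t start, uint64_t end,
					 unsigned int order,
					 uint64_t page_size)
{
	uint64_t chunk = page_size << order;
	struct chunk_span s;

	/* The chunk size must be a nonzero power of two. */
	assert(chunk != 0 && (chunk & (chunk - 1)) == 0 && start <= end);

	s.start = start & ~(chunk - 1);			/* ALIGN_DOWN */
	s.end = (end + chunk - 1) & ~(chunk - 1);	/* ALIGN */
	s.nr_chunks = (s.end - s.start) / chunk;
	return s;
}
```

With 4KB pages and order 9, the 8KB region [0x3ff000, 0x401000)
straddles a 2MB boundary and expands to [0x200000, 0x600000), i.e. two
2MB chunks, which shows how far the expansion can reach beyond the
original region.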

No external mmap lock is held: collapse_huge_page() acquires
mmap_read_lock internally for validation, releases it, then acquires
mmap_write_lock for the actual collapse.  Holding an outer
mmap_read_lock would cause a self-deadlock when the same thread
attempts the inner mmap_write_lock.

Co-developed-by: Kunwu Chan <kunwu.chan@gmail.com>
Signed-off-by: Kunwu Chan <kunwu.chan@gmail.com>
Signed-off-by: Wang Lian <lianux.mm@gmail.com>
Signed-off-by: Wang Lian <lianux.wang@processmission.com>
---
 mm/damon/vaddr.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index d27147603564..2a3757c13bf0 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -14,6 +14,7 @@
 #include <linux/page_idle.h>
 #include <linux/pagewalk.h>
 #include <linux/sched/mm.h>
+#include <linux/khugepaged.h>
 
 #include "../internal.h"
 #include "ops-common.h"
@@ -899,6 +900,40 @@ static unsigned long damos_va_stat(struct damon_target *target,
 	return 0;
 }
 
+static unsigned long damos_va_collapse(struct damon_target *target,
+		struct damon_region *r, struct damos *s,
+		unsigned long *sz_filter_passed)
+{
+	unsigned long addr, end, chunk_sz;
+	unsigned int target_order = s->target_order;
+	unsigned long applied = 0;
+	struct mm_struct *mm;
+	int ret;
+
+	if (target_order < 2 || target_order > HPAGE_PMD_ORDER)
+		return 0;
+
+	chunk_sz = PAGE_SIZE << target_order;
+	addr = ALIGN_DOWN(r->ar.start, chunk_sz);
+	end = ALIGN(r->ar.end, chunk_sz);
+
+	mm = damon_get_mm(target);
+	if (!mm)
+		return 0;
+
+	while (addr < end) {
+		ret = damon_collapse_folio_range(mm, addr, target_order);
+		if (!ret)
+			applied += chunk_sz;
+		*sz_filter_passed += chunk_sz;
+		addr += chunk_sz;
+		cond_resched();
+	}
+
+	mmput(mm);
+	return applied;
+}
+
 static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
 		struct damon_target *t, struct damon_region *r,
 		struct damos *scheme, unsigned long *sz_filter_passed)
@@ -922,6 +957,9 @@ static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
 		madv_action = MADV_NOHUGEPAGE;
 		break;
 	case DAMOS_COLLAPSE:
+		if (scheme->target_order)
+			return damos_va_collapse(t, r, scheme,
+						 sz_filter_passed);
 		madv_action = MADV_COLLAPSE;
 		break;
 	case DAMOS_MIGRATE_HOT:
-- 
2.50.1 (Apple Git-155)



* [PATCH v2 4/5] mm/damon: introduce DAMOS_SPLIT action
  2026-07-01 11:47 [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions wang lian
                   ` (2 preceding siblings ...)
  2026-07-01 11:47 ` [PATCH v2 3/5] mm/damon/vaddr: implement mTHP-aware DAMOS_COLLAPSE handler wang lian
@ 2026-07-01 11:47 ` wang lian
  2026-07-01 12:04   ` sashiko-bot
  2026-07-01 11:47 ` [PATCH v2 5/5] mm/damon/vaddr: implement DAMOS_SPLIT handler wang lian
  2026-07-01 13:52 ` [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions SJ Park
  5 siblings, 1 reply; 15+ messages in thread
From: wang lian @ 2026-07-01 11:47 UTC (permalink / raw)
  To: damon, linux-mm
  Cc: linux-kernel, sj, gutierrez.asier, daichaobing, lianux.wang,
	Wang Lian

From: Wang Lian <lianux.mm@gmail.com>

Add DAMOS_SPLIT to the damos_action enum for splitting large folios
into smaller mTHP-order folios.  Extend the target_order field of
struct damos (added in patch 1) so that it also specifies the desired
split order.

Expose the action as "split" through the DAMON sysfs interface with
target_order validation (must be 2..HPAGE_PMD_ORDER-1).
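
For illustration, the scheme-creation handling of the split order can
be mirrored in user space as below.  EXAMPLE_PMD_ORDER stands in for
HPAGE_PMD_ORDER and is assumed to be 9; the function name is
hypothetical and this is a sketch, not the kernel code.

```c
#include <assert.h>

#define EXAMPLE_PMD_ORDER 9u	/* assumption: stand-in for HPAGE_PMD_ORDER */

/*
 * For DAMOS_SPLIT, 0 and anything at or above the PMD order are not
 * usable split targets; the scheme falls back to order 2 with a warning.
 */
static unsigned int split_order_effective(unsigned int val)
{
	/* Order 1 never reaches this point: the sysfs store rejects it. */
	assert(val != 1);
	if (val == 0 || val >= EXAMPLE_PMD_ORDER)
		return 2;
	return val;
}
```

Under these assumptions, orders 2 through 8 are kept as written, while 0
and 9 both become 2.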

Signed-off-by: Wang Lian <lianux.mm@gmail.com>
Signed-off-by: Wang Lian <lianux.wang@processmission.com>
---
 include/linux/damon.h    |  9 +++++++--
 mm/damon/sysfs-schemes.c | 12 ++++++++++++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 5a0587556573..30cf4afb212c 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -121,6 +121,7 @@ struct damon_target {
  * @DAMOS_HUGEPAGE:	Call ``madvise()`` for the region with MADV_HUGEPAGE.
  * @DAMOS_NOHUGEPAGE:	Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
  * @DAMOS_COLLAPSE:	Call ``madvise()`` for the region with MADV_COLLAPSE.
+ * @DAMOS_SPLIT:	Split large folios to the target mTHP order.
  * @DAMOS_LRU_PRIO:	Prioritize the region on its LRU lists.
  * @DAMOS_LRU_DEPRIO:	Deprioritize the region on its LRU lists.
  * @DAMOS_MIGRATE_HOT:  Migrate the regions prioritizing warmer regions.
@@ -141,6 +142,7 @@ enum damos_action {
 	DAMOS_HUGEPAGE,
 	DAMOS_NOHUGEPAGE,
 	DAMOS_COLLAPSE,
+	DAMOS_SPLIT,
 	DAMOS_LRU_PRIO,
 	DAMOS_LRU_DEPRIO,
 	DAMOS_MIGRATE_HOT,
@@ -573,8 +575,11 @@ struct damos {
 	struct damos_access_pattern pattern;
 	enum damos_action action;
 	/*
-	 * @target_order: target order for mTHP actions (DAMOS_COLLAPSE).
-	 * 0 means system default (PMD order).  Valid: 0, 2..HPAGE_PMD_ORDER.
+	 * @target_order: target mTHP order for DAMOS_COLLAPSE and
+	 * DAMOS_SPLIT.  For COLLAPSE, 0 means PMD order default,
+	 * valid values: 0, 2..HPAGE_PMD_ORDER.  For SPLIT,
+	 * valid values: 2..HPAGE_PMD_ORDER-1; 0 and HPAGE_PMD_ORDER
+	 * are rejected at scheme creation time (defaulting to 2).
 	 */
 	unsigned int target_order;
 	unsigned long apply_interval_us;
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 735970717048..547252fc8a20 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -2293,6 +2293,10 @@ static struct damos_sysfs_action_name damos_sysfs_action_names[] = {
 		.action = DAMOS_COLLAPSE,
 		.name = "collapse",
 	},
+	{
+		.action = DAMOS_SPLIT,
+		.name = "split",
+	},
 	{
 		.action = DAMOS_LRU_PRIO,
 		.name = "lru_prio",
@@ -3048,6 +3052,14 @@ static struct damos *damon_sysfs_mk_scheme(
 			HPAGE_PMD_ORDER, HPAGE_PMD_ORDER);
 		sysfs_scheme->target_order = 0;
 	}
+	if (sysfs_scheme->action == DAMOS_SPLIT &&
+	    (sysfs_scheme->target_order == 0 ||
+	     sysfs_scheme->target_order >= HPAGE_PMD_ORDER)) {
+		pr_warn("DAMON split: target_order %u invalid, need 2..%u. Defaulting to 2.\n",
+			sysfs_scheme->target_order,
+			HPAGE_PMD_ORDER - 1);
+		sysfs_scheme->target_order = 2;
+	}
 	scheme->target_order = sysfs_scheme->target_order;
 
 	err = damos_sysfs_add_quota_score(sysfs_quotas->goals, &scheme->quota);
-- 
2.50.1 (Apple Git-155)



* [PATCH v2 5/5] mm/damon/vaddr: implement DAMOS_SPLIT handler
  2026-07-01 11:47 [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions wang lian
                   ` (3 preceding siblings ...)
  2026-07-01 11:47 ` [PATCH v2 4/5] mm/damon: introduce DAMOS_SPLIT action wang lian
@ 2026-07-01 11:47 ` wang lian
  2026-07-01 11:57   ` sashiko-bot
  2026-07-01 13:52 ` [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions SJ Park
  5 siblings, 1 reply; 15+ messages in thread
From: wang lian @ 2026-07-01 11:47 UTC (permalink / raw)
  To: damon, linux-mm
  Cc: linux-kernel, sj, gutierrez.asier, daichaobing, lianux.wang,
	Wang Lian

From: Wang Lian <lianux.mm@gmail.com>

Implement the vaddr operations layer handler for DAMOS_SPLIT.
For each folio in the target region that is larger than the
scheme's target_order, split it via split_folio_to_order().

This supports both anonymous and file-backed (e.g. tmpfs/shmem)
folios, covering KVM guest memory backed by THP tmpfs.

Signed-off-by: Wang Lian <lianux.mm@gmail.com>
Signed-off-by: Wang Lian <lianux.wang@processmission.com>
---
 mm/damon/vaddr.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 2a3757c13bf0..3f2061b29ed8 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -934,6 +934,71 @@ static unsigned long damos_va_collapse(struct damon_target *target,
 	return applied;
 }
 
+static unsigned long damos_va_split(struct damon_target *target,
+		struct damon_region *r, struct damos *s,
+		unsigned long *sz_filter_passed)
+{
+	unsigned long addr, end, chunk_sz;
+	unsigned int target_order = s->target_order;
+	unsigned long applied = 0;
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	struct folio *folio;
+	struct folio_walk fw;
+
+	mm = damon_get_mm(target);
+	if (!mm)
+		return 0;
+
+	chunk_sz = PAGE_SIZE << HPAGE_PMD_ORDER;
+	addr = ALIGN_DOWN(r->ar.start, chunk_sz);
+	end = ALIGN(r->ar.end, chunk_sz);
+
+	while (addr < end) {
+		mmap_read_lock(mm);
+		vma = find_vma(mm, addr);
+		/*
+		 * split_folio_to_order() supports both anon and shmem
+		 * folios, so we accept any VMA that has a folio at @addr.
+		 * This covers important use cases like tmpfs THP-backed
+		 * KVM guest memory where cold and hot pages are bundled
+		 * together in a single PMD THP.
+		 */
+		if (!vma || addr < vma->vm_start)
+			goto unlock;
+
+		folio = folio_walk_start(&fw, vma, addr, 0);
+		if (!folio)
+			goto unlock;
+
+		if (folio_order(folio) > target_order) {
+			if (!folio_trylock(folio)) {
+				folio_walk_end(&fw, vma);
+				goto unlock;
+			}
+			folio_get(folio);
+			folio_walk_end(&fw, vma);
+
+			if (!split_folio_to_order(folio, target_order))
+				applied += chunk_sz;
+
+			folio_unlock(folio);
+			folio_put(folio);
+		} else {
+			folio_walk_end(&fw, vma);
+		}
+
+unlock:
+		*sz_filter_passed += chunk_sz;
+		addr += chunk_sz;
+		mmap_read_unlock(mm);
+		cond_resched();
+	}
+
+	mmput(mm);
+	return applied;
+}
+
 static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
 		struct damon_target *t, struct damon_region *r,
 		struct damos *scheme, unsigned long *sz_filter_passed)
@@ -967,6 +1032,9 @@ static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
 		return damos_va_migrate(t, r, scheme, sz_filter_passed);
 	case DAMOS_STAT:
 		return damos_va_stat(t, r, scheme, sz_filter_passed);
+	case DAMOS_SPLIT:
+		return damos_va_split(t, r, scheme,
+					  sz_filter_passed);
 	default:
 		/*
 		 * DAMOS actions that are not yet supported by 'vaddr'.
-- 
2.50.1 (Apple Git-155)



* Re: [PATCH v2 5/5] mm/damon/vaddr: implement DAMOS_SPLIT handler
  2026-07-01 11:47 ` [PATCH v2 5/5] mm/damon/vaddr: implement DAMOS_SPLIT handler wang lian
@ 2026-07-01 11:57   ` sashiko-bot
  0 siblings, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-07-01 11:57 UTC (permalink / raw)
  To: wang lian; +Cc: damon

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Hardcoded 2MB chunk stepping and alignment skips folios, evaluates out-of-bounds addresses, and inflates DAMON quota accounting.
- [High] Missing VMA validation allows `folio_walk_start` to interact with driver-allocated non-folio compound pages and hugetlb pages.
--

commit 5ba0b9e1345c425f2c1d50b6fcfb6b47ff6ed394
Author: Wang Lian <lianux.mm@gmail.com>

mm/damon/vaddr: implement DAMOS_SPLIT handler

This commit implements the vaddr operations layer handler for DAMOS_SPLIT.
It walks the target region and attempts to split any folios larger than
the scheme's target_order using split_folio_to_order().

> diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
> index eeaa403eb79e4..2dd268f037a4f 100644
> --- a/mm/damon/vaddr.c
> +++ b/mm/damon/vaddr.c

[ ... ]

> @@ -932,6 +932,71 @@ static unsigned long damos_va_collapse(struct damon_target *target,
>  	return applied;
>  }
>  
> +static unsigned long damos_va_split(struct damon_target *target,
> +		struct damon_region *r, struct damos *s,
> +		unsigned long *sz_filter_passed)
> +{
> +	unsigned long addr, end, chunk_sz;
> +	unsigned int target_order = s->target_order;
> +	unsigned long applied = 0;
> +	struct mm_struct *mm;
> +	struct vm_area_struct *vma;
> +	struct folio *folio;
> +	struct folio_walk fw;
> +
> +	mm = damon_get_mm(target);
> +	if (!mm)
> +		return 0;
> +
> +	chunk_sz = PAGE_SIZE << HPAGE_PMD_ORDER;
> +	addr = ALIGN_DOWN(r->ar.start, chunk_sz);
> +	end = ALIGN(r->ar.end, chunk_sz);

[Severity: High]
Does this alignment safely handle regions that do not start on 2MB
boundaries? By rounding down the start and rounding up the end to chunk_sz,
the loop in damos_va_split() may evaluate addresses outside the intended
damon_region bounds.

> +
> +	while (addr < end) {
> +		mmap_read_lock(mm);
> +		vma = find_vma(mm, addr);
> +		/*
> +		 * split_folio_to_order() supports both anon and shmem
> +		 * folios, so we accept any VMA that has a folio at @addr.
> +		 * This covers important use cases like tmpfs THP-backed
> +		 * KVM guest memory where cold and hot pages are bundled
> +		 * together in a single PMD THP.
> +		 */
> +		if (!vma || addr < vma->vm_start)
> +			goto unlock;

[Severity: High]
Are we missing VMA type validation here? This check accepts any VMA, which
can include VM_HUGETLB or driver-allocated VM_MIXEDMAP VMAs. 

If the VMA is VM_HUGETLB, folio_walk_start() will use standard page table
accessors which can return garbage or crash. If it is a driver-mapped
compound page, folio_walk_start() will return a non-folio page, and
subsequent calls to folio_trylock() and split_folio_to_order() can dereference
uninitialized fields and corrupt driver state.

> +
> +		folio = folio_walk_start(&fw, vma, addr, 0);
> +		if (!folio)
> +			goto unlock;
> +
> +		if (folio_order(folio) > target_order) {
> +			if (!folio_trylock(folio)) {
> +				folio_walk_end(&fw, vma);
> +				goto unlock;
> +			}
> +			folio_get(folio);
> +			folio_walk_end(&fw, vma);
> +
> +			if (!split_folio_to_order(folio, target_order))
> +				applied += chunk_sz;

[Severity: High]
Could this over-account the applied bytes? By unconditionally adding chunk_sz
(2MB) here, successfully splitting a smaller folio (like a 64KB mTHP) will
increment the applied counter by 2MB, artificially inflating the DAMON quota
accounting.

> +
> +			folio_unlock(folio);
> +			folio_put(folio);
> +		} else {
> +			folio_walk_end(&fw, vma);
> +		}
> +
> +unlock:
> +		*sz_filter_passed += chunk_sz;
> +		addr += chunk_sz;

[Severity: High]
Does this loop inadvertently skip folios? By unconditionally stepping addr by
chunk_sz (2MB) on every iteration, any smaller folios inside this 2MB span
will be completely skipped by the loop in damos_va_split(). Should this step
by the actual folio size instead to ensure we process all folios?
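
The two comments above (accounting and stepping) can be illustrated together with a small userspace model. This is only a sketch under stated assumptions: folios are represented by their order alone, they are laid out back to back, and each one is naturally aligned in the address space. All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Size in bytes of a folio of the given order (log2 of its page count). */
static unsigned long sketch_folio_bytes(unsigned int order)
{
	assert(order < 20);	/* keep the shift well defined */
	return SKETCH_PAGE_SIZE << order;
}

/*
 * Address of the first byte after the folio that contains addr.
 * Assumption: the folio is naturally aligned in the address space.
 */
static unsigned long sketch_next_addr(unsigned long addr, unsigned int order)
{
	unsigned long sz = sketch_folio_bytes(order);

	return (addr & ~(sz - 1)) + sz;
}

/*
 * Walk n folios laid out back to back from base. Every folio is visited;
 * one that is larger than target_order counts as split and adds its own
 * size, not a fixed 2 MiB, to *applied. Returns the address after the walk.
 */
static unsigned long sketch_walk(unsigned long base, const unsigned int *orders,
				 size_t n, unsigned int target_order,
				 unsigned long *applied)
{
	unsigned long addr = base;
	size_t i;

	*applied = 0;
	for (i = 0; i < n; i++) {
		if (orders[i] > target_order)
			*applied += sketch_folio_bytes(orders[i]);
		addr = sketch_next_addr(addr, orders[i]);
	}
	return addr;
}
```

With two 64 KiB folios followed by one 16 KiB folio and a target order of 2, the walk accounts 128 KiB and advances by 144 KiB, instead of accounting and advancing by 2 MiB per step.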

> +		mmap_read_unlock(mm);
> +		cond_resched();
> +	}
> +
> +	mmput(mm);
> +	return applied;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701114716.56503-1-lianux.mm@gmail.com?part=5

* Re: [PATCH v2 3/5] mm/damon/vaddr: implement mTHP-aware DAMOS_COLLAPSE handler
  2026-07-01 11:47 ` [PATCH v2 3/5] mm/damon/vaddr: implement mTHP-aware DAMOS_COLLAPSE handler wang lian
@ 2026-07-01 12:02   ` sashiko-bot
  0 siblings, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-07-01 12:02 UTC (permalink / raw)
  To: wang lian; +Cc: damon

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [High] DAMOS_COLLAPSE silently drops support for file-backed memory when target_order is explicitly set.
- [High] damos_va_collapse() spins uselessly for mTHP orders because the backend function explicitly rejects them.
- [High] Expanding sub-regions to chunk boundaries without deduplication causes massive redundant processing and quota inflation.
- [High] Stack exhaustion risk due to allocating a large `struct collapse_control` on the kernel stack.
- [Medium] ALIGN() boundary calculation can overflow to 0 for high memory regions, causing them to be silently skipped.
- [Low] Missing stub for damon_collapse_folio_range() causes build failures when CONFIG_TRANSPARENT_HUGEPAGE is disabled.
--

commit 54fd1422ea9f2a5e501241c8565e081399639772
Author: Wang Lian <lianux.mm@gmail.com>

mm/damon/vaddr: implement mTHP-aware DAMOS_COLLAPSE handler

This commit updates the DAMOS_COLLAPSE handler to support collapsing memory
into requested mTHP sizes. It iterates over target regions in chunk-aligned
ranges and falls back to the existing MADV_COLLAPSE path if no target order
is provided.

> diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
> index e73ec1ce016e5..eeaa403eb79e4 100644
> --- a/mm/damon/vaddr.c
> +++ b/mm/damon/vaddr.c
[ ... ]
> @@ -897,6 +898,40 @@ static unsigned long damos_va_stat(struct damon_target *target,
>  	return 0;
>  }
>  
> +static unsigned long damos_va_collapse(struct damon_target *target,
> +		struct damon_region *r, struct damos *s,
> +		unsigned long *sz_filter_passed)
> +{
> +	unsigned long addr, end, chunk_sz;
> +	unsigned int target_order = s->target_order;
> +	unsigned long applied = 0;
> +	struct mm_struct *mm;
> +	int ret;
> +
> +	if (target_order < 2 || target_order > HPAGE_PMD_ORDER)
> +		return 0;
> +
> +	chunk_sz = PAGE_SIZE << target_order;
> +	addr = ALIGN_DOWN(r->ar.start, chunk_sz);
> +	end = ALIGN(r->ar.end, chunk_sz);

[Severity: Medium]
Does this calculation overflow to 0 if the region ends near the top of the
address space?

If the address is close enough to ULONG_MAX, ALIGN() wrapping around to 0
would cause the subsequent while loop condition to immediately evaluate to
false, silently skipping those memory regions.
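
A minimal userspace sketch of the wraparound and one way to detect it. `align_up_checked()` is a hypothetical helper, not a kernel API; whether a monitored region can actually end this close to the top of the address space is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Round x up to a multiple of a (a power of two). Returns false instead
 * of wrapping when the rounded value would not fit in uintptr_t.
 */
static bool align_up_checked(uintptr_t x, uintptr_t a, uintptr_t *out)
{
	uintptr_t mask;

	assert(a && (a & (a - 1)) == 0);
	mask = a - 1;
	if (x > UINTPTR_MAX - mask)
		return false;	/* x + mask would wrap past zero */
	*out = (x + mask) & ~mask;
	return true;
}
```

A caller that gets `false` could clamp the end to the last aligned address rather than let the loop bound become 0.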

> +
> +	mm = damon_get_mm(target);
> +	if (!mm)
> +		return 0;
> +
> +	while (addr < end) {
> +		ret = damon_collapse_folio_range(mm, addr, target_order);

[Severity: Low]
Does this cause a build failure when CONFIG_TRANSPARENT_HUGEPAGE is disabled?

The declaration for damon_collapse_folio_range() in
include/linux/khugepaged.h appears to lack an empty stub for the disabled
case, which would break the unconditional call here.

[Severity: High]
Is there a risk of stack exhaustion in the backend?

Looking at damon_collapse_folio_range() in mm/khugepaged.c, it allocates
struct collapse_control directly on the stack:

mm/khugepaged.c:damon_collapse_folio_range() {
        struct collapse_control cc = {
                .is_khugepaged = false,
        };
        ...
}

Because this struct contains bitmaps sized by MAX_NUMNODES, it can easily
exceed safe kernel stack limits on typical configurations.

[Severity: High]
Can this loop waste CPU by repeatedly attempting operations that will always
fail for mTHP orders?

While this function allows target orders from 2 up to HPAGE_PMD_ORDER,
damon_collapse_folio_range() in mm/khugepaged.c appears to explicitly reject
anything other than HPAGE_PMD_ORDER:

mm/khugepaged.c:damon_collapse_folio_range() {
        if (target_order != HPAGE_PMD_ORDER) {
                pr_warn_once("... only PMD order (%u) is supported, got %u\n",
                             __func__, HPAGE_PMD_ORDER, target_order);
                return -EINVAL;
        }
        ...
}

This would cause immediate -EINVAL failures for valid mTHP target orders.

> +		if (!ret)
> +			applied += chunk_sz;
> +		*sz_filter_passed += chunk_sz;
> +		addr += chunk_sz;

[Severity: High]
Can multiple small DAMON sub-regions that fall within the same chunk cause
redundant processing and inflate the quota?

Since the boundaries are expanded earlier using ALIGN_DOWN() and ALIGN()
without deduplicating adjacent sub-regions, the same chunk might be
collapsed and accounted for multiple times.
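
One possible shape of the deduplication, as a userspace sketch only. It assumes sub-regions are visited in ascending address order and that a cursor is kept per target for one pass; `struct chunk_cursor` and `count_new_chunks()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Remembers where the previously handled chunks ended. */
struct chunk_cursor {
	uintptr_t next_unprocessed;
};

/*
 * Count the chunks of [start, end) that have not been handled yet, and
 * advance the cursor. chunk_sz must be a power of two. Regions are
 * expected in ascending address order.
 */
static unsigned int count_new_chunks(struct chunk_cursor *cur, uintptr_t start,
				     uintptr_t end, uintptr_t chunk_sz)
{
	uintptr_t addr, stop;
	unsigned int n = 0;

	assert(chunk_sz && (chunk_sz & (chunk_sz - 1)) == 0);
	addr = start & ~(chunk_sz - 1);
	stop = (end + chunk_sz - 1) & ~(chunk_sz - 1);

	/* Skip chunks an earlier sub-region already covered. */
	if (addr < cur->next_unprocessed)
		addr = cur->next_unprocessed;
	for (; addr < stop; addr += chunk_sz)
		n++;
	if (stop > cur->next_unprocessed)
		cur->next_unprocessed = stop;
	return n;
}
```

With a 64 KiB chunk, the sub-regions [0x10000, 0x14000) and [0x14000, 0x18000) share one chunk: the first call reports it and the second reports nothing new.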

> +		cond_resched();
> +	}
> +
> +	mmput(mm);
> +	return applied;
> +}
> +
>  static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
>  		struct damon_target *t, struct damon_region *r,
>  		struct damos *scheme, unsigned long *sz_filter_passed)
> @@ -920,6 +955,9 @@ static unsigned long damon_va_apply_scheme(struct damon_ctx *ctx,
>  		madv_action = MADV_NOHUGEPAGE;
>  		break;
>  	case DAMOS_COLLAPSE:
> +		if (scheme->target_order)
> +			return damos_va_collapse(t, r, scheme,
> +						 sz_filter_passed);

[Severity: High]
Does setting the target order silently break support for file-backed memory?

The original MADV_COLLAPSE path supports file-backed regions, but
damon_collapse_folio_range() delegates to collapse_huge_page() in
mm/khugepaged.c, which unconditionally enforces expect_anon=true.

>  		madv_action = MADV_COLLAPSE;
>  		break;
>  	case DAMOS_MIGRATE_HOT:

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701114716.56503-1-lianux.mm@gmail.com?part=3

* Re: [PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers
  2026-07-01 11:47 ` [PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers wang lian
@ 2026-07-01 12:02   ` sashiko-bot
  0 siblings, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-07-01 12:02 UTC (permalink / raw)
  To: wang lian; +Cc: damon

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Allocating the large `struct collapse_control` directly on the kernel stack risks stack overflow.
- [High] Bypassing the PMD pre-scan phase violates khugepaged invariants, causing logic errors, performance degradation, and potential crashes.
- [Medium] Missing `lru_add_drain_all()` before collapsing causes spurious collapse failures due to inflated page refcounts.
- [Medium] Successful pre-existing THP state (`SCAN_PMD_MAPPED`) is incorrectly translated into a hard `-EINVAL` error.
--

commit bb75d3df2a4ada9ebcc7237ae7417a21c9fbc4f8
Author: Wang Lian <lianux.mm@gmail.com>

mm/khugepaged: add damon_collapse_folio_range() for external callers

Export a wrapper around collapse_huge_page() that allows external
subsystems such as DAMON to trigger THP collapse on a target address range.

> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 617bca76db49b..0387841ba2e7d 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -3272,3 +3272,42 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>  	return thps == ((hend - hstart) >> HPAGE_PMD_SHIFT) ? 0
>  			: madvise_collapse_errno(last_fail);
>  }
> +
> +/**
> + * damon_collapse_folio_range() - Collapse base pages in range into a THP
> + * @mm:         mm_struct of the target process
> + * @start_addr: start address (must be order-aligned)
> + * @target_order: page order of the collapse result (currently only
> + *                HPAGE_PMD_ORDER is supported)
> + *
> + * Thin wrapper around collapse_huge_page() for external callers such as
> + * DAMON.  The caller must hold a reference to @mm.  Do not hold mmap
> + * lock: collapse_huge_page() acquires mmap_read_lock for validation,
> + * releases it, then acquires mmap_write_lock for the collapse.  Holding
> + * an outer mmap_read_lock would self-deadlock.
> + *
> + * Return: 0 on success, -EINVAL on bad arguments, negative error from
> + *         madvise_collapse_errno() otherwise.
> + */
> +int damon_collapse_folio_range(struct mm_struct *mm, unsigned long start_addr,
> +			       unsigned int target_order)
> +{
> +	struct collapse_control cc = {
> +		.is_khugepaged = false,
> +	};

[Severity: High]
Does allocating struct collapse_control directly on the stack risk a stack
overflow?

The collapse_control structure contains a large array:

struct collapse_control {
	...
	u32 node_load[MAX_NUMNODES];
	...
};

With CONFIG_NODES_SHIFT=10 on x86_64, MAX_NUMNODES is 1024, making the struct
larger than 4KB. Allocating this directly on the stack could overflow the
kernel stack.

Should this be heap-allocated with kmalloc_obj() as done in
madvise_collapse()?
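
The size arithmetic can be checked with a userspace model. `struct sketch_collapse_control` is a hypothetical stand-in that keeps only the two members visible in this thread; the real structure has more fields, and `calloc()` here merely stands in for a kernel heap allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_NUMNODES 1024	/* 1 << 10, as with CONFIG_NODES_SHIFT=10 */

/* Hypothetical model: only the members mentioned in the review. */
struct sketch_collapse_control {
	int is_khugepaged;
	uint32_t node_load[SKETCH_MAX_NUMNODES];
};

/* 1024 * 4 bytes for node_load alone already fills a 4 KiB page. */
static_assert(sizeof(struct sketch_collapse_control) > 4096,
	      "model is larger than one 4 KiB page");

/* Allocate the control block on the heap, zeroed, instead of the stack. */
static struct sketch_collapse_control *sketch_cc_alloc(void)
{
	return calloc(1, sizeof(struct sketch_collapse_control));
}
```

The caller owns the returned pointer and releases it with `free()` once the collapse attempt is finished.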

> +	enum scan_result result;
> +
> +	if (target_order != HPAGE_PMD_ORDER) {
> +		pr_warn_once("%s: only PMD order (%u) is supported, got %u\n",
> +			     __func__, HPAGE_PMD_ORDER, target_order);
> +		return -EINVAL;
> +	}
> +	if (start_addr & ((PAGE_SIZE << target_order) - 1))
> +		return -EINVAL;
> +
> +	result = collapse_huge_page(mm, start_addr, 1, 0, &cc, target_order);

[Severity: High]
Is it safe to bypass the collapse_scan_pmd() pre-scan phase and directly call
collapse_huge_page()?

By skipping the pre-scan, several issues appear to occur:

First, collapse_huge_page() unconditionally allocates and charges a 2MB folio
before validating the VMA:

static enum scan_result collapse_huge_page(...)
{
	...
	result = alloc_charge_folio(&folio, mm, cc, order);
	...
}

Second, because cc.node_load is never populated by the scan,
collapse_find_target_node() always defaults to node 0, breaking NUMA locality.

Finally, if a non-anonymous page is legitimately mapped into an anonymous VMA,
__collapse_huge_page_isolate() could trigger a kernel panic via:

	VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);

Does this sequence need to invoke the pre-scan phase before collapsing?

[Severity: Medium]
Will this reliably fail to collapse pages that are currently in per-CPU LRU
batches?

Without calling lru_add_drain_all() before collapse_huge_page(), transient
references held by the LRU batches inflate the page reference counts. The
strict exclusivity check in __collapse_huge_page_isolate() will then return
SCAN_PAGE_COUNT, resulting in a spurious -EAGAIN failure.

> +	if (result == SCAN_SUCCEED)
> +		return 0;
> +	return madvise_collapse_errno(result);

[Severity: Medium]
Does this code handle SCAN_PMD_MAPPED correctly when memory is already backed
by a THP?

If the range is already PMD-mapped, collapse_huge_page() detects this in
find_pmd_or_thp_or_none() and returns SCAN_PMD_MAPPED. Because this function
only checks for SCAN_SUCCEED, SCAN_PMD_MAPPED falls through to
madvise_collapse_errno(), which lacks a case for it and returns -EINVAL.

Should SCAN_PMD_MAPPED be treated as a success case?
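
A sketch of the suggested mapping, modelled in userspace. The enum below is a hypothetical, reduced stand-in for `enum scan_result`; only the two outcomes named above and the `-EAGAIN` and `-EINVAL` cases mentioned in this review are modelled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, reduced model of the scan results discussed here. */
enum sketch_scan_result {
	SKETCH_SCAN_SUCCEED,
	SKETCH_SCAN_PMD_MAPPED,	/* range is already backed by a PMD-mapped THP */
	SKETCH_SCAN_PAGE_COUNT,	/* transient extra references on a page */
	SKETCH_SCAN_FAIL,
};

/* Map a scan result to 0 or a negative errno. */
static int sketch_collapse_errno(enum sketch_scan_result r)
{
	assert(r <= SKETCH_SCAN_FAIL);
	switch (r) {
	case SKETCH_SCAN_SUCCEED:
	case SKETCH_SCAN_PMD_MAPPED:
		return 0;	/* the desired end state already holds */
	case SKETCH_SCAN_PAGE_COUNT:
		return -EAGAIN;	/* worth retrying later */
	default:
		return -EINVAL;
	}
}
```

Treating the already-mapped case as success keeps the caller's `applied` accounting from counting an existing THP as a failure.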

> +}
> +EXPORT_SYMBOL_GPL(damon_collapse_folio_range);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701114716.56503-1-lianux.mm@gmail.com?part=2

* Re: [PATCH v2 4/5] mm/damon: introduce DAMOS_SPLIT action
  2026-07-01 11:47 ` [PATCH v2 4/5] mm/damon: introduce DAMOS_SPLIT action wang lian
@ 2026-07-01 12:04   ` sashiko-bot
  0 siblings, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-07-01 12:04 UTC (permalink / raw)
  To: wang lian; +Cc: damon

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Medium] TOCTOU race in damon_sysfs_mk_scheme() allows bypassing bounds validation for DAMOS_SPLIT target_order.
- [Medium] Uninitialized target_order field in damon_sysfs_scheme_alloc() leaks kernel memory to userspace.
- [Low] Unconditional use of HPAGE_PMD_ORDER causes a build failure when huge pages are disabled.
--

commit 3d569aafbbd14d480f96311f5d7fa6a898ce6974
Author: Wang Lian <lianux.mm@gmail.com>

mm/damon: introduce DAMOS_SPLIT action

Add DAMOS_SPLIT to the damos_action enum for splitting large folios
into smaller mTHP-order folios.  Add a target_order field to struct
damos to specify the desired split order.

> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 35b90b47e35a0..766984a1e63c5 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
[ ... ]
> @@ -571,8 +573,11 @@ struct damos {
>  	struct damos_access_pattern pattern;
>  	enum damos_action action;
>  	/*
> -	 * @target_order: target order for mTHP actions (DAMOS_COLLAPSE).
> -	 * 0 means system default (PMD order).  Valid: 0, 2..HPAGE_PMD_ORDER.
> +	 * @target_order: target mTHP order for DAMOS_COLLAPSE and
> +	 * DAMOS_SPLIT.  For COLLAPSE, 0 means PMD order default,
> +	 * valid values: 0, 2..HPAGE_PMD_ORDER.  For SPLIT,
> +	 * valid values: 2..HPAGE_PMD_ORDER-1; 0 and HPAGE_PMD_ORDER
> +	 * are rejected at scheme creation time (defaulting to 2).
>  	 */
>  	unsigned int target_order;
>  	unsigned long apply_interval_us;
> diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> index 085b8042d1250..00d502119c3a5 100644
> --- a/mm/damon/sysfs-schemes.c
> +++ b/mm/damon/sysfs-schemes.c
[ ... ]
> @@ -3053,6 +3057,14 @@ static struct damos *damon_sysfs_mk_scheme(
>  			HPAGE_PMD_ORDER, HPAGE_PMD_ORDER);
>  		sysfs_scheme->target_order = 0;
>  	}
> +	if (sysfs_scheme->action == DAMOS_SPLIT &&
> +	    (sysfs_scheme->target_order == 0 ||
> +	     sysfs_scheme->target_order >= HPAGE_PMD_ORDER)) {

[Severity: Low]
When huge pages are disabled, HPAGE_PMD_ORDER can evaluate to a BUILD_BUG()
expression on some architectures. Will using it unconditionally here cause a
build failure during compile time?

> +		pr_warn("DAMON split: target_order %u invalid, need 2..%u. Defaulting to 2.\n",
> +			sysfs_scheme->target_order,
> +			HPAGE_PMD_ORDER - 1);
> +		sysfs_scheme->target_order = 2;
> +	}
>  	scheme->target_order = sysfs_scheme->target_order;

[Severity: Medium]
Could a concurrent sysfs write change sysfs_scheme->target_order between
the validation above and this assignment?

If sysfs_scheme->target_order is modified locklessly, it might bypass
the boundary checks and assign an invalid value to scheme->target_order.

[Severity: Medium]
It appears sysfs_scheme->target_order might be uninitialized when a scheme
is first allocated. In damon_sysfs_scheme_alloc(), the scheme structure is
allocated without zeroing:

mm/damon/sysfs-schemes.c:damon_sysfs_scheme_alloc() {
	...
	struct damon_sysfs_scheme *scheme = kmalloc_obj(*scheme);
	...
}

If target_order_show() reads this field before it is explicitly set:

mm/damon/sysfs-schemes.c:target_order_show() {
	...
	return sysfs_emit(buf, "%u\n", scheme->target_order);
}

Could this leak uninitialized kernel memory to userspace?

>  
>  	err = damos_sysfs_add_quota_score(sysfs_quotas->goals, &scheme->quota);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701114716.56503-1-lianux.mm@gmail.com?part=4

* Re: [PATCH v2 1/5] mm/damon: add target_order field for DAMOS_COLLAPSE
  2026-07-01 11:47 ` [PATCH v2 1/5] mm/damon: add target_order field for DAMOS_COLLAPSE wang lian
@ 2026-07-01 12:07   ` sashiko-bot
  0 siblings, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-07-01 12:07 UTC (permalink / raw)
  To: wang lian; +Cc: damon

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] The newly added `target_order` field is not initialized in `damon_sysfs_scheme_alloc()` or `damon_new_scheme()`, leading to kernel heap memory leaks to userspace via sysfs.
- [High] The `target_order` field is not copied when committing scheme updates, causing configuration changes to be ignored.
- [Medium] TOCTOU race in `damon_sysfs_mk_scheme` allows configuring invalid `target_order` for DAMOS_COLLAPSE and DAMOS_SPLIT schemes.
- [Low] Unconditional use of `HPAGE_PMD_ORDER` causes a build failure (`BUILD_BUG()`) on architectures or configurations without huge page support.
--

commit 5b7feca9accb6a6f2fee1a76e45a6ee03f08c17e
Author: Wang Lian <lianux.mm@gmail.com>

mm/damon: add target_order field for DAMOS_COLLAPSE

DAMOS_COLLAPSE currently collapses into PMD-size THP only.  Add a
target_order field to express per-order mTHP collapse intent.

> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index cfbbf8ba28f63..35b90b47e35a0 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -570,6 +570,11 @@ struct damos_migrate_dests {
>  struct damos {
>  	struct damos_access_pattern pattern;
>  	enum damos_action action;
> +	/*
> +	 * @target_order: target order for mTHP actions (DAMOS_COLLAPSE).
> +	 * 0 means system default (PMD order).  Valid: 0, 2..HPAGE_PMD_ORDER.
> +	 */
> +	unsigned int target_order;

[Severity: High]
Does adding this field require updating damos_commit() in mm/damon/core.c to
copy the new value?

When a user updates the target_order via sysfs and commits the changes,
damos_commit() copies parameters from the pending sysfs scheme to the running
scheme. Since target_order is not copied there, it looks like dynamic
configuration changes to this field will be silently ignored.

[Severity: High]
Is this field properly initialized when allocating a new scheme?

Looking at damon_new_scheme() in mm/damon/core.c, the core struct damos is
allocated using kmalloc_obj() which does not zero-initialize memory. Since
target_order is not explicitly initialized there, it will contain
uninitialized heap data.

[ ... ]
> diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> index 41f93a1823bf1..085b8042d1250 100644
> --- a/mm/damon/sysfs-schemes.c
> +++ b/mm/damon/sysfs-schemes.c
[ ... ]
> @@ -2644,6 +2647,34 @@ static ssize_t target_nid_store(struct kobject *kobj,
>  	return err ? err : count;
>  }
>  
> +static ssize_t target_order_show(struct kobject *kobj,
> +		struct kobj_attribute *attr, char *buf)
> +{
> +	struct damon_sysfs_scheme *scheme = container_of(kobj,
> +			struct damon_sysfs_scheme, kobj);
> +
> +	return sysfs_emit(buf, "%u\n", scheme->target_order);
> +}

[Severity: High]
Can this leak kernel heap memory to userspace?

The struct damon_sysfs_scheme is allocated in damon_sysfs_scheme_alloc()
using kmalloc_obj(), which does not zero the memory. The new target_order
field is not initialized during allocation.

If a user reads the sysfs file via target_order_show() before writing to it,
uninitialized kernel heap memory will be exposed.

> +
> +static ssize_t target_order_store(struct kobject *kobj,
> +		struct kobj_attribute *attr, const char *buf, size_t count)
> +{
> +	struct damon_sysfs_scheme *scheme = container_of(kobj,
> +			struct damon_sysfs_scheme, kobj);
> +	unsigned int val;
> +	int err;
> +
> +	err = kstrtouint(buf, 0, &val);
> +	if (err)
> +		return err;
> +
> +	if (val != 0 && (val < 2 || val > HPAGE_PMD_ORDER))

[Severity: Low]
Will this cause a build failure on configurations without huge page support?

When CONFIG_PGTABLE_HAS_HUGE_LEAVES is not set (e.g. both THP and Hugetlb
are disabled), HPAGE_PMD_SHIFT is defined as a statement expression
triggering a BUILD_BUG().

Since CONFIG_DAMON_SYSFS does not depend on huge page support, evaluating
HPAGE_PMD_ORDER unconditionally here will trigger that build error.
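
One way to avoid referencing a huge-page-only constant in such a build, shown as a userspace sketch. `SKETCH_HAS_THP` and `SKETCH_MAX_ORDER` are hypothetical stand-ins for the kernel's config symbol and `HPAGE_PMD_ORDER`; how the real fix should be structured is up to the patch author.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * SKETCH_HAS_THP is a hypothetical stand-in for a kernel config symbol.
 * When it is not defined, the maximum order is 0 and no huge-page-only
 * constant is referenced at all.
 */
#ifdef SKETCH_HAS_THP
#define SKETCH_MAX_ORDER 9u	/* e.g. 4 KiB pages with 2 MiB PMD mappings */
#else
#define SKETCH_MAX_ORDER 0u
#endif

static_assert(SKETCH_MAX_ORDER == 0u || SKETCH_MAX_ORDER >= 2u,
	      "a non-zero maximum must allow at least order 2");

/* Accept 0 (default) always; accept 2..max only when huge pages exist. */
static bool sketch_order_is_valid(unsigned int val)
{
	if (val == 0)
		return true;
	return val >= 2 && val <= SKETCH_MAX_ORDER;
}
```

Built without `SKETCH_HAS_THP`, every non-zero order is rejected and the code still compiles.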

[ ... ]
> @@ -3010,6 +3045,16 @@ static struct damos *damon_sysfs_mk_scheme(
>  	if (!scheme)
>  		return NULL;
>  
> +	if (sysfs_scheme->action == DAMOS_COLLAPSE &&
> +	    sysfs_scheme->target_order != 0 &&
> +	    sysfs_scheme->target_order != HPAGE_PMD_ORDER) {

[Severity: Medium]
Is there a Time-of-Check to Time-of-Use race when reading target_order?

damon_sysfs_mk_scheme() builds the kernel internal struct damos object from
the sysfs state while holding damon_sysfs_lock. However, sysfs property
writers like target_order_store() do not acquire damon_sysfs_lock when
modifying sysfs_scheme->target_order.

Because sysfs_scheme->target_order is read multiple times here:

1. To check if it is valid for DAMOS_COLLAPSE.
2. To log a warning if invalid.
3. For final assignment to scheme->target_order below.

A concurrent write could change the value between the checks and the final
assignment, completely bypassing the validation rules.
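
The usual remedy for this pattern is to read the shared field once and then validate and assign only that private copy. A userspace sketch, in which the `volatile` access stands in for the kernel's READ_ONCE() and `sketch_collapse_order()` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Read the shared value exactly once, then validate and return only that
 * private copy, so a concurrent writer cannot change it between the
 * check and the use.
 */
static unsigned int sketch_collapse_order(const volatile unsigned int *shared,
					  unsigned int pmd_order)
{
	unsigned int snap;

	assert(shared != NULL);
	snap = *shared;			/* single read of the shared field */
	if (snap != 0 && snap != pmd_order)
		return 0;		/* fall back to the PMD-order default */
	return snap;			/* the value that was checked */
}
```

The caller would log and assign the returned value rather than reading the shared field again.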

> +		pr_warn("DAMON collapse: target_order %u not supported, only PMD order (%u) is available. Use 0 or %u.\n",
> +			sysfs_scheme->target_order,
> +			HPAGE_PMD_ORDER, HPAGE_PMD_ORDER);
> +		sysfs_scheme->target_order = 0;
> +	}
> +	scheme->target_order = sysfs_scheme->target_order;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260701114716.56503-1-lianux.mm@gmail.com?part=1

* Re: [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
  2026-07-01 11:47 [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions wang lian
                   ` (4 preceding siblings ...)
  2026-07-01 11:47 ` [PATCH v2 5/5] mm/damon/vaddr: implement DAMOS_SPLIT handler wang lian
@ 2026-07-01 13:52 ` SJ Park
  2026-07-02  6:52   ` wang lian
  2026-07-02  7:02   ` wang lian
  5 siblings, 2 replies; 15+ messages in thread
From: SJ Park @ 2026-07-01 13:52 UTC (permalink / raw)
  To: wang lian
  Cc: SJ Park, damon, linux-mm, linux-kernel, gutierrez.asier,
	daichaobing, lianux.wang

This is the first version of this series that has dropped the RFC tag.  From
the next time, please reset the version number when you drop RFC.  E.g.,

    RFC PATCH -> RFC PATCH v2 -> RFC PATCH v3 -> PATCH v1 -> PATCH v2

Also, dropping RFC means you think this patch is ready to be merged as-is.  It
is important because the level of review should also be different for RFC and
non-RFC patches.  I'm unsure if this is your intention or you just mistakenly
dropped the tag, because I was expecting RFC v2 for this series.  Could you
please clarify?

I will hold review of this series before the answer to the above question is
clear.

Thanks,
SJ

[...]

* Re: [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
  2026-07-01 13:52 ` [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions SJ Park
@ 2026-07-02  6:52   ` wang lian
  2026-07-02 16:10     ` SJ Park
  2026-07-02  7:02   ` wang lian
  1 sibling, 1 reply; 15+ messages in thread
From: wang lian @ 2026-07-02  6:52 UTC (permalink / raw)
  To: sj; +Cc: damon, linux-mm, daichaobing

From: Wang Lian <lianux.mm@gmail.com>

Hi SJ,

Thank you for correcting my misunderstanding of the versioning rules.
This was indeed a mistake on my part, and I apologize for the confusion.

I did not intend to claim that this series is ready to be merged as-is in its current state.
I fully expected to send this iteration as an "RFC v2" to continue our high-level architectural
discussion, especially given the major structural pivots we made from v1
(dropping the debugfs pipeline and shifting the telemetry to user space).
I mistakenly incremented the version number to [PATCH v2] while
prematurely dropping the RFC tag.

Since my true intention is to gather feedback on this new user-space control
plane approach before any final merge integration, I would like to fix this immediately.

I will re-send this exact series right away with the correct tag:
[RFC PATCH v2 0/5], following the versioning pipeline you suggested.

Please hold off on reviewing this "PATCH v2" thread;
I will get the proper "RFC v2" into your inbox in a moment.

Thanks again for guiding me through the community conventions!

Best regards,
Wang Lian

* Re: [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions
  2026-07-02  6:52   ` wang lian
@ 2026-07-02 16:10     ` SJ Park
  0 siblings, 0 replies; 15+ messages in thread
From: SJ Park @ 2026-07-02 16:10 UTC (permalink / raw)
  To: wang lian; +Cc: SJ Park, damon, linux-mm, daichaobing

On Thu,  2 Jul 2026 14:52:44 +0800 wang lian <lianux.mm@gmail.com> wrote:

> From: Wang Lian <lianux.mm@gmail.com>
> 
> Hi SJ,
> 
> Thank you for correcting my misunderstanding of the versioning rules.
> This was indeed a mistake on my part, and I apologize for the confusion.
> 
> I did not intend to claim that this series is ready to be merged as-is in its current state.
> I fully expected to send this iteration as an "RFC v2" to continue our high-level architectural
> discussion, especially given the major structural pivots we made from v1
> (dropping the debugfs pipeline and shifting the telemetry to user space).
> I mistakenly incremented the version number to [PATCH v2] while
> prematurely dropping the RFC tag.

No worry, thank you for clarifying!

> 
> Since my true intention is to gather feedback on this new user-space control
> plane approach before any final merge integration, I would like to fix this immediately.
> 
> I will re-send this exact series right away with the correct tag:
> [RFC PATCH v2 0/5], following the versioning pipeline you suggested.
> 
> Please hold off on reviewing this "PATCH v2" thread;
> I will get the proper "RFC v2" into your inbox in a moment.

I got it.  I will review it soon.


Thanks,
SJ

[...]

Thread overview: 15+ messages
2026-07-01 11:47 [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions wang lian
2026-07-01 11:47 ` [PATCH v2 1/5] mm/damon: add target_order field for DAMOS_COLLAPSE wang lian
2026-07-01 12:07   ` sashiko-bot
2026-07-01 11:47 ` [PATCH v2 2/5] mm/khugepaged: add damon_collapse_folio_range() for external callers wang lian
2026-07-01 12:02   ` sashiko-bot
2026-07-01 11:47 ` [PATCH v2 3/5] mm/damon/vaddr: implement mTHP-aware DAMOS_COLLAPSE handler wang lian
2026-07-01 12:02   ` sashiko-bot
2026-07-01 11:47 ` [PATCH v2 4/5] mm/damon: introduce DAMOS_SPLIT action wang lian
2026-07-01 12:04   ` sashiko-bot
2026-07-01 11:47 ` [PATCH v2 5/5] mm/damon/vaddr: implement DAMOS_SPLIT handler wang lian
2026-07-01 11:57   ` sashiko-bot
2026-07-01 13:52 ` [PATCH v2 0/5] mm/damon: add mTHP collapse and split actions SJ Park
2026-07-02  6:52   ` wang lian
2026-07-02 16:10     ` SJ Park
2026-07-02  7:02   ` wang lian
