Linux Documentation
* [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes
@ 2026-05-16 21:03 Ravi Jonnalagadda
  2026-05-16 21:03 ` [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum Ravi Jonnalagadda
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-16 21:03 UTC (permalink / raw)
  To: sj, damon, linux-mm, linux-kernel, linux-doc
  Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
	ravis.opensrc

Hi,

This series carries five fixes for the DAMOS quota controller and the
paddr migration walk.  All five were surfaced during closed-loop tiering
testing on a heterogeneous-memory system (DRAM + CXL on a separate
NUMA node), but each fix is in code paths that benefit any caller --
not scoped to closed-loop tiering or to any specific goal metric.

Test envelope: AMD EPYC dual-socket host with CXL.mem on a separate
NUMA node, two-scheme migrate_hot PULL+PUSH setup driven by
node_eligible_mem_bp (now in linux-next)[1].

What each patch does
====================

Patch 1 - mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum

  damon_moving_sum() can underflow when a region's access rate drops
  to zero faster than the moving-sum window length.  The internal
  accumulator subtracts an outgoing sample without a lower bound, so
  the unsigned result wraps to a very large nr_accesses_bp that
  mis-classifies a cold region as hot.  Affects every DAMOS scheme,
  since nr_accesses_bp is updated on every access-rate update for
  every region regardless of which scheme or goal metric is active.
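
  The wrap can be reproduced outside the kernel.  The following is a
  minimal userspace sketch, not the kernel source: the two function
  names are illustrative, and the guarded variant mirrors the clamp
  this patch proposes.

```c
#include <assert.h>
#include <limits.h>

/* Pre-patch expression, copied into userspace for illustration. */
static unsigned int moving_sum_unguarded(unsigned int mvsum,
		unsigned int nomvsum, unsigned int len_window,
		unsigned int new_value)
{
	assert(len_window != 0);
	/* Unsigned arithmetic wraps modulo UINT_MAX + 1 on underflow. */
	return mvsum - nomvsum / len_window + new_value;
}

/* Guarded form: fall back to new_value instead of wrapping. */
static unsigned int moving_sum_guarded(unsigned int mvsum,
		unsigned int nomvsum, unsigned int len_window,
		unsigned int new_value)
{
	unsigned int subtrahend;

	assert(len_window != 0);
	subtrahend = nomvsum / len_window;
	if (subtrahend > mvsum)
		return new_value;
	return mvsum - subtrahend + new_value;
}
```

  With mvsum = 5, nomvsum = 200, len_window = 10 and new_value = 0,
  the unguarded form returns UINT_MAX - 14 while the guarded form
  returns 0.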

Patch 2 - mm/damon/core: cap effective quota size to total monitored memory

  The DAMOS quota tuner can compute an effective size (esz) larger
  than the total monitored memory, because the tuner integrates over
  cumulative deltas without bounding by the actual workload size.
  Once esz exceeds total monitored memory the per-tick "remaining
  quota" arithmetic stops being meaningful: any scheme can apply to
  the entire monitored space and "remaining" stays positive
  indefinitely.  Cap esz at total monitored memory so the controller
  remains within physically realisable bounds.  Tuner-shape and
  goal-metric agnostic.
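
  A minimal userspace sketch of the cap, assuming a simplified
  region type.  struct sketch_region and all sketch_* names are
  hypothetical stand-ins, not kernel symbols; the zero check mirrors
  the "total_sz &&" guard in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a monitored region: [start, end). */
struct sketch_region {
	unsigned long start;
	unsigned long end;
};

/* Sum the sizes of all regions in the array. */
static unsigned long sketch_total_sz(const struct sketch_region *regions,
		size_t nr)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		assert(regions[i].end >= regions[i].start);
		total += regions[i].end - regions[i].start;
	}
	return total;
}

/* Cap esz at total_sz; a zero total (nothing monitored) leaves esz alone. */
static unsigned long sketch_cap_esz(unsigned long esz, unsigned long total_sz)
{
	if (total_sz && esz > total_sz)
		return total_sz;
	return esz;
}

/* Two example regions of 4096 and 8192 bytes. */
static unsigned long sketch_demo_total(void)
{
	static const struct sketch_region regions[] = {
		{ 0, 4096 }, { 8192, 16384 },
	};

	return sketch_total_sz(regions, sizeof(regions) / sizeof(regions[0]));
}
```

  In this example an unbounded esz (ULONG_MAX) is reduced to the
  12288 bytes that are actually monitored.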

Patch 3 - mm/damon/core: floor effective quota size at minimum region size

  Symmetric to patch 2: the tuner can also compute esz < min_region_sz,
  causing schemes to attempt zero-byte migrations for many ticks before
  the tuner ramps esz back up.  Observed under the CONSIST tuner with
  a node_eligible_mem_bp goal: ftrace traced esz stuck at 1 byte for
  96 seconds before the first region was tried; the first acted-on
  region appeared at t=113s when esz crossed the min_region_sz
  threshold.  Floor esz at min_region_sz so schemes always have at
  least one region's worth of quota when the tuner asks them to act.
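
  A minimal userspace sketch of the floor, including the esz != 0
  guard described in the patch 3 changelog.  sketch_floor_esz() is a
  hypothetical name used only for illustration.

```c
#include <assert.h>

/* Raise a non-zero esz to at least min_region_sz; keep 0 as a stop signal. */
static unsigned long sketch_floor_esz(unsigned long esz,
		unsigned long min_region_sz)
{
	assert(min_region_sz != 0);
	if (!esz)
		return 0;
	return esz < min_region_sz ? min_region_sz : esz;
}
```

  With min_region_sz = 4096, an esz of 1 byte becomes 4096, an esz
  of 8192 is unchanged, and an esz of 0 stays 0.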

Patch 4 - mm/damon/paddr: skip free pageblocks in migration walk

  damon_pa_migrate() walks every 4KB PFN in a region.  On
  sparsely-populated lower-tier extents (e.g., a 549GB CXL region
  with only ~8GB populated), this is ~144M PFN iterations per scheme
  tick at ~40ns each = ~5.8 seconds of walk per tick.  Use
  pageblock-level free-page detection to skip unpopulated runs of
  pages: only enter the per-page loop for pageblocks that contain at
  least one allocated page.  This brings the walk to
  O(region_size / pageblock_size) skip-check cost plus
  O(populated_pages) per-page work.  On x86 pageblocks are 2MB, so
  the same 549GB/8GB example becomes ~281K pageblock skip-checks
  (microseconds total) plus ~2M per-page visits for the populated
  pages -- ~80ms expected.  Helps any migrate_hot/migrate_cold scheme
  on paddr ops, regardless of what drives them.
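
  The estimates above can be re-derived with a few lines of
  arithmetic.  This sketch assumes binary units (1GB = 2^30 bytes)
  and a flat 40ns per visit; the sketch_* helpers and macros are
  illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_KIB	(1ULL << 10)
#define SKETCH_MIB	(1ULL << 20)
#define SKETCH_GIB	(1ULL << 30)

/* Number of fixed-size units (pages or pageblocks) in a byte range. */
static uint64_t sketch_nr_units(uint64_t bytes, uint64_t unit_bytes)
{
	assert(unit_bytes != 0);
	return bytes / unit_bytes;
}

/* Total walk time for a number of visits at a flat per-visit cost. */
static uint64_t sketch_walk_ns(uint64_t nr_visits, uint64_t ns_per_visit)
{
	return nr_visits * ns_per_visit;
}
```

  For the 549GB region this gives 143917056 4KB PFNs (about 5.76s
  at 40ns each) and 281088 2MB pageblocks; the 8GB of populated
  memory is 2097152 pages, or about 84ms at 40ns each.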

Patch 5 - mm/damon/paddr: add time budget to migration page walk

  Densely populated regions (e.g., a busy DRAM range where most
  pageblocks contain at least one allocated page) can still consume
  full ticks even with patch 4 applied.  Add a 100ms wall-clock
  budget with a ktime_get() check every 4096 pages walked
  (~16MB worth).  When the budget expires before reaching the end of
  a region, kdamond returns control; subsequent ticks re-walk the
  region from the start.  Folios already on the target node are
  dropped at migration time, so re-walks only re-do collection work,
  not the migrate itself.  Together with the per-scheme quota cap,
  per-tick work is bounded and the workload converges over multiple
  ticks for dense regions.

  Worst-case migration walk contribution to a tick is bounded at
  100ms per scheme regardless of region size or population density,
  preserving kdamond's ability to service other DAMOS schemes and
  user-space sysfs operations during heavy migration phases.
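
  The amortized deadline check can be sketched in userspace as
  follows.  To keep the sketch deterministic it uses an injectable
  fake clock instead of ktime_get(); struct sketch_clock and the
  sketch_* functions are hypothetical and exist only for this
  illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CHECK_PAGES	4096ULL
#define SKETCH_CHECK_MASK	(SKETCH_CHECK_PAGES - 1)

/* Hypothetical fake clock so the sketch runs deterministically. */
struct sketch_clock {
	uint64_t now_ns;
	uint64_t step_ns;	/* time that "passes" between two reads */
};

/* Return the current time, then advance the clock by one step. */
static uint64_t sketch_clock_read(struct sketch_clock *clk)
{
	uint64_t now = clk->now_ns;

	clk->now_ns += clk->step_ns;
	return now;
}

/* Walk [start_pfn, end_pfn); return the first PFN that was not visited. */
static uint64_t sketch_budgeted_walk(uint64_t start_pfn, uint64_t end_pfn,
		uint64_t budget_ns, struct sketch_clock *clk)
{
	uint64_t deadline, pfn;

	assert(start_pfn <= end_pfn);
	deadline = sketch_clock_read(clk) + budget_ns;
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		/* Read the clock only on every 4096th PFN. */
		if (!(pfn & SKETCH_CHECK_MASK) &&
		    sketch_clock_read(clk) > deadline)
			break;
		/* Per-page collection work would go here. */
	}
	return pfn;
}

/* Walk 65536 PFNs with a 100ns budget and the given clock step. */
static uint64_t sketch_demo_stop_pfn(uint64_t step_ns)
{
	struct sketch_clock clk = { .now_ns = 0, .step_ns = step_ns };

	return sketch_budgeted_walk(0, 65536, 100, &clk);
}
```

  With a 60ns step the walk stops at PFN 4096, the first check
  point after the budget has expired; with a 0ns step the budget
  never expires and the whole range is walked.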


Testing context
===============

  Hardware:  AMD EPYC dual-socket, CXL.mem on a separate NUMA node.
  Workload: 32GB hot working set across DRAM and CXL nodes.
  DAMON config: paddr ops, two migrate_hot schemes (PULL CXL->DRAM,
                PUSH DRAM->CXL) with complementary address filters,
                node_eligible_mem_bp goal per scheme, temporal
                quota tuner, 1s reset interval.

Each fix in this series was reproduced under the above setup, then
verified via ftrace and per-scheme stats after the fix landed.

References
==========

[1] mm/damon: add node_eligible_mem_bp goal metric
https://lore.kernel.org/damon/20260428030520.701-1-ravis.opensrc@gmail.com/

Ravi Jonnalagadda (5):
  mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum
  mm/damon/core: cap effective quota size to total monitored memory
  mm/damon/core: floor effective quota size at minimum region size
  mm/damon/paddr: skip free pageblocks in migration walk
  mm/damon/paddr: add time budget to migration page walk

 mm/damon/core.c  | 29 ++++++++++++++++++++++++++++-
 mm/damon/paddr.c | 40 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 65 insertions(+), 4 deletions(-)


base-commit: 0cec77cfd5314c0b3b03530abe1a4b32e991f639
-- 
2.43.0



* [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum
  2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
@ 2026-05-16 21:03 ` Ravi Jonnalagadda
  2026-05-17 18:16   ` SeongJae Park
  2026-05-16 21:03 ` [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory Ravi Jonnalagadda
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-16 21:03 UTC (permalink / raw)
  To: sj, damon, linux-mm, linux-kernel, linux-doc
  Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
	ravis.opensrc

Guard against unsigned integer underflow when nomvsum/len_window
exceeds mvsum.  When that subtraction wraps, the moving sum returns a
near-UINT_MAX value and corrupts nr_accesses_bp.

If subtrahend > mvsum, return new_value: this clamps the moving-sum
estimate to the current observation rather than wrapping.

Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
 mm/damon/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/damon/core.c b/mm/damon/core.c
index 3a8725e400c6b..9975f3d9ebfe9 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -3449,7 +3449,11 @@ int damon_set_region_system_rams_default(struct damon_target *t,
 static unsigned int damon_moving_sum(unsigned int mvsum, unsigned int nomvsum,
 		unsigned int len_window, unsigned int new_value)
 {
-	return mvsum - nomvsum / len_window + new_value;
+	unsigned int subtrahend = nomvsum / len_window;
+
+	if (subtrahend > mvsum)
+		return new_value;
+	return mvsum - subtrahend + new_value;
 }
 
 /**
-- 
2.43.0



* [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory
  2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
  2026-05-16 21:03 ` [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum Ravi Jonnalagadda
@ 2026-05-16 21:03 ` Ravi Jonnalagadda
  2026-05-17 18:36   ` SeongJae Park
  2026-05-16 21:03 ` [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size Ravi Jonnalagadda
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-16 21:03 UTC (permalink / raw)
  To: sj, damon, linux-mm, linux-kernel, linux-doc
  Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
	ravis.opensrc

The DAMOS quota goal tuner can compute an effective size (esz) larger
than the total monitored memory because it integrates over cumulative
deltas without bounding by the actual workload size.  Once esz exceeds
total monitored memory, the per-tick "remaining quota" arithmetic
stops being meaningful: any scheme can apply to the entire monitored
space and "remaining" stays positive indefinitely.

Cap esz to the total size of all currently monitored regions as a
final bound after all other quota calculations.  Add
damon_ctx_total_monitored_sz() helper that sums region sizes across
all targets.

The helper runs only inside damos_set_effective_quota(), which is
called at most once per quota reset_interval (default 1s) per scheme,
not per kdamond tick.  Walk cost is O(nr_regions) at that frequency
and is dominated by the enclosing tuner work.

This bound is tuner-shape and goal-metric agnostic: it constrains the
quota controller to physically realisable values regardless of which
tuner or goal metric drives it.

Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
 mm/damon/core.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/mm/damon/core.c b/mm/damon/core.c
index 9975f3d9ebfe9..fd1db234ca304 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2614,6 +2614,19 @@ static void damos_goal_tune_esz_bp_temporal(struct damon_ctx *c,
 		quota->esz_bp = ULONG_MAX;
 }
 
+/* Sum of all monitored region sizes across all targets in @ctx. */
+static unsigned long damon_ctx_total_monitored_sz(struct damon_ctx *ctx)
+{
+	struct damon_target *t;
+	struct damon_region *r;
+	unsigned long total = 0;
+
+	damon_for_each_target(t, ctx)
+		damon_for_each_region(r, t)
+			total += damon_sz_region(r);
+	return total;
+}
+
 /*
  * Called only if quota->ms, or quota->sz are set, or quota->goals is not empty
  */
@@ -2621,6 +2634,7 @@ static void damos_set_effective_quota(struct damon_ctx *ctx, struct damos *s)
 {
 	struct damos_quota *quota = &s->quota;
 	unsigned long throughput;
+	unsigned long total_sz;
 	unsigned long esz = ULONG_MAX;
 
 	if (!quota->ms && list_empty(&quota->goals)) {
@@ -2649,6 +2663,11 @@ static void damos_set_effective_quota(struct damon_ctx *ctx, struct damos *s)
 	if (quota->sz && quota->sz < esz)
 		esz = quota->sz;
 
+	/* Safety cap: never migrate more than total monitored memory */
+	total_sz = damon_ctx_total_monitored_sz(ctx);
+	if (total_sz && esz > total_sz)
+		esz = total_sz;
+
 	quota->esz = esz;
 }
 
-- 
2.43.0



* [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size
  2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
  2026-05-16 21:03 ` [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum Ravi Jonnalagadda
  2026-05-16 21:03 ` [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory Ravi Jonnalagadda
@ 2026-05-16 21:03 ` Ravi Jonnalagadda
  2026-05-17 18:47   ` SeongJae Park
  2026-05-16 21:03 ` [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk Ravi Jonnalagadda
  2026-05-16 21:03 ` [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk Ravi Jonnalagadda
  4 siblings, 1 reply; 17+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-16 21:03 UTC (permalink / raw)
  To: sj, damon, linux-mm, linux-kernel, linux-doc
  Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
	ravis.opensrc

The CONSIST quota goal tuner initializes esz_bp to 0, producing an
effective quota size (esz) of 1 byte on the first tick.
damos_quota_is_full() rejects all regions when esz < min_region_sz
(default PAGE_SIZE = 4096), so no regions can be tried and no
feedback reaches the tuner — a bootstrapping deadlock.

Floor esz at ctx->min_region_sz after the tuner computes it, guarded
by an esz != 0 check.  The guard preserves the temporal tuner's
intentional stop behavior: when score >= 10000 (goal met), temporal
sets esz_bp = 0 to halt migration; the floor must not override that.

Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
 mm/damon/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/damon/core.c b/mm/damon/core.c
index fd1db234ca304..d33c4360cbd60 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2650,6 +2650,10 @@ static void damos_set_effective_quota(struct damon_ctx *ctx, struct damos *s)
 		esz = quota->esz_bp / 10000;
 	}
 
+	/* avoid cold-start deadlock, but respect tuner stop signal (esz=0) */
+	if (esz)
+		esz = max_t(unsigned long, esz, ctx->min_region_sz);
+
 	if (quota->ms) {
 		if (quota->total_charged_ns)
 			throughput = mult_frac(quota->total_charged_sz,
-- 
2.43.0



* [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk
  2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
                   ` (2 preceding siblings ...)
  2026-05-16 21:03 ` [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size Ravi Jonnalagadda
@ 2026-05-16 21:03 ` Ravi Jonnalagadda
  2026-05-17 23:37   ` SeongJae Park
  2026-05-16 21:03 ` [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk Ravi Jonnalagadda
  4 siblings, 1 reply; 17+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-16 21:03 UTC (permalink / raw)
  To: sj, damon, linux-mm, linux-kernel, linux-doc
  Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
	ravis.opensrc

damon_pa_migrate() walks every PFN in a region linearly, calling
damon_get_folio() for each one.  On sparse physical address spaces
(e.g., CXL-attached memory), a single DAMON region can span hundreds
of gigabytes where most memory is free and sitting in the buddy
allocator.  Most page lookups are fruitless and dominate kdamond
tick time.

Check at pageblock boundaries (2MB on x86_64) whether the block is
entirely free.  If the first page of a pageblock is a buddy page at
pageblock_order or higher, the entire block is free and can be
skipped.  Similarly skip pageblocks where pfn_to_online_page() returns
NULL.

This reduces the iteration from O(region_sz / PAGE_SIZE) to
O(region_sz / pageblock_sz) + O(populated_pages).

buddy_order_unsafe() is used without zone->lock.  A transient false
positive (block becomes non-free between the PageBuddy and order
checks) costs at most one tick of missed candidates on that block;
the next tick re-scans.  No correctness consequence as DAMON walks
are best-effort.
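
The complexity claim can be illustrated with a simplified userspace
model.  This is only a sketch under a strong assumption: every
pageblock is either entirely free or walked page by page, and the
per-block free state is given as a boolean array rather than read from
struct page.  All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGES_PER_BLOCK	512UL	/* 2MB pageblock / 4KB page */

struct sketch_walk_stats {
	unsigned long skip_checks;	/* one per pageblock */
	unsigned long page_visits;	/* only in non-free pageblocks */
};

/* Count the work done when free pageblocks are skipped as a whole. */
static struct sketch_walk_stats sketch_walk(const bool *block_is_free,
		size_t nr_blocks)
{
	struct sketch_walk_stats st = { 0, 0 };
	size_t b;

	assert(block_is_free != NULL);
	for (b = 0; b < nr_blocks; b++) {
		st.skip_checks++;
		if (block_is_free[b])
			continue;
		st.page_visits += SKETCH_PAGES_PER_BLOCK;
	}
	return st;
}

/* Four pageblocks, of which only the second one is populated. */
static struct sketch_walk_stats sketch_walk_demo(void)
{
	static const bool free_map[] = { true, false, true, true };

	return sketch_walk(free_map, sizeof(free_map) / sizeof(free_map[0]));
}
```

In the four-block example the walk performs 4 skip checks and 512 page
visits, instead of the 2048 page visits a purely linear walk would
need.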

Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
 mm/damon/paddr.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index c4738cd5e221e..e844c990987b9 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -258,13 +258,32 @@ static unsigned long damon_pa_migrate(struct damon_region *r,
 		unsigned long addr_unit, struct damos *s,
 		unsigned long *sz_filter_passed)
 {
-	phys_addr_t addr, applied;
+	phys_addr_t addr, end, applied;
 	LIST_HEAD(folio_list);
 	struct folio *folio = NULL;
+	unsigned long pfn;
 
 	addr = damon_pa_phys_addr(r->ar.start, addr_unit);
-	while (addr < damon_pa_phys_addr(r->ar.end, addr_unit)) {
-		folio = damon_get_folio(PHYS_PFN(addr));
+	end = damon_pa_phys_addr(r->ar.end, addr_unit);
+	while (addr < end) {
+		pfn = PHYS_PFN(addr);
+
+		/* Skip pageblocks that are entirely free. */
+		if (IS_ALIGNED(pfn, pageblock_nr_pages)) {
+			struct page *page = pfn_to_online_page(pfn);
+
+			if (!page) {
+				addr += pageblock_nr_pages * PAGE_SIZE;
+				continue;
+			}
+			if (PageBuddy(page) &&
+			    buddy_order_unsafe(page) >= pageblock_order) {
+				addr += pageblock_nr_pages * PAGE_SIZE;
+				continue;
+			}
+		}
+
+		folio = damon_get_folio(pfn);
 		if (damon_pa_invalid_damos_folio(folio, s)) {
 			addr += PAGE_SIZE;
 			continue;
-- 
2.43.0



* [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk
  2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
                   ` (3 preceding siblings ...)
  2026-05-16 21:03 ` [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk Ravi Jonnalagadda
@ 2026-05-16 21:03 ` Ravi Jonnalagadda
  2026-05-17 23:43   ` SeongJae Park
  4 siblings, 1 reply; 17+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-16 21:03 UTC (permalink / raw)
  To: sj, damon, linux-mm, linux-kernel, linux-doc
  Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
	ravis.opensrc

On populated physical address ranges the pageblock skip optimization
alone is insufficient — most pageblocks contain at least one allocated
page, so the walk still iterates millions of PFNs.

Add a 100ms wall-clock time budget to damon_pa_migrate().  Once the
deadline is reached, the walk breaks out and migrates whatever folios
have been collected so far.

The time check is amortized by only calling ktime_get() every 4096
pages (~16MB of address space), adding negligible overhead to the
fast path.

Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
 mm/damon/paddr.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index e844c990987b9..a2565287bc10f 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -14,6 +14,7 @@
 #include <linux/swap.h>
 #include <linux/memory-tiers.h>
 #include <linux/mm_inline.h>
+#include <linux/ktime.h>
 
 #include "../internal.h"
 #include "ops-common.h"
@@ -254,6 +255,14 @@ static unsigned long damon_pa_deactivate_pages(struct damon_region *r,
 	return damon_pa_de_activate(r, addr_unit, s, false, sz_filter_passed);
 }
 
+/* Maximum wall-clock time to spend in a single migration walk (ns) */
+#define DAMON_PA_MIGRATE_BUDGET_NS	(100 * NSEC_PER_MSEC)
+
+/* Check the time budget every 4096 pages (~16MB) to amortize ktime_get(). */
+#define DAMON_PA_MIGRATE_TIME_CHECK_PAGES	4096
+#define DAMON_PA_MIGRATE_TIME_CHECK_MASK	\
+	(DAMON_PA_MIGRATE_TIME_CHECK_PAGES - 1)
+
 static unsigned long damon_pa_migrate(struct damon_region *r,
 		unsigned long addr_unit, struct damos *s,
 		unsigned long *sz_filter_passed)
@@ -262,6 +271,7 @@ static unsigned long damon_pa_migrate(struct damon_region *r,
 	LIST_HEAD(folio_list);
 	struct folio *folio = NULL;
 	unsigned long pfn;
+	ktime_t deadline = ktime_add_ns(ktime_get(), DAMON_PA_MIGRATE_BUDGET_NS);
 
 	addr = damon_pa_phys_addr(r->ar.start, addr_unit);
 	end = damon_pa_phys_addr(r->ar.end, addr_unit);
@@ -283,6 +293,11 @@ static unsigned long damon_pa_migrate(struct damon_region *r,
 			}
 		}
 
+		/* Time budget: keep kdamond responsive on long migration walks. */
+		if (!(pfn & DAMON_PA_MIGRATE_TIME_CHECK_MASK) &&
+		    ktime_after(ktime_get(), deadline))
+			break;
+
 		folio = damon_get_folio(pfn);
 		if (damon_pa_invalid_damos_folio(folio, s)) {
 			addr += PAGE_SIZE;
-- 
2.43.0



* Re: [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum
  2026-05-16 21:03 ` [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum Ravi Jonnalagadda
@ 2026-05-17 18:16   ` SeongJae Park
  0 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2026-05-17 18:16 UTC (permalink / raw)
  To: Ravi Jonnalagadda
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
	corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun

Hello Ravi,

On Sat, 16 May 2026 14:03:53 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:

> Guard against unsigned integer underflow when nomvsum/len_window
> exceeds mvsum.

How could this happen?  mvsum is assumed to be the same as nomvsum at the
beginning of the window.  Hence, even if every new_value is zero, mvsum should
be exactly zero at the end of the window.  Of course there could be a bug that
breaks the assumption.
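
The invariant can be checked with a small userspace sketch (not kernel
code; sketch_window_residual() is a hypothetical helper).  It starts
with mvsum equal to nomvsum and applies len_window updates that all
have new_value = 0.  Because the division rounds down, what remains is
nomvsum % len_window, which is zero when nomvsum is a multiple of
len_window and never negative otherwise.

```c
#include <assert.h>

/* Apply len_window zero-valued updates, starting from mvsum == nomvsum. */
static unsigned int sketch_window_residual(unsigned int nomvsum,
		unsigned int len_window)
{
	unsigned int mvsum = nomvsum;
	unsigned int i;

	assert(len_window != 0);
	for (i = 0; i < len_window; i++)
		mvsum = mvsum - nomvsum / len_window + 0;
	return mvsum;
}
```

For example, nomvsum = 200 with len_window = 10 ends at 0, and
nomvsum = 205 ends at 5.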

> When that subtraction wraps, the moving sum returns a
> near-UINT_MAX value and corrupts nr_accesses_bp.
> 
> If subtrahend > mvsum, return new_value: this clamps the moving-sum
> estimate to the current observation rather than wrapping.

I guess you saw this issue in practice, and this change should fix it.  But
I think we should know why and how mvsum < nomvsum / len_window can
unexpectedly happen, and fix that.

Could you share more details about when and how the situation happens?


Thanks,
SJ

[...]


* Re: [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory
  2026-05-16 21:03 ` [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory Ravi Jonnalagadda
@ 2026-05-17 18:36   ` SeongJae Park
  2026-05-18  5:22     ` Ravi Jonnalagadda
  0 siblings, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2026-05-17 18:36 UTC (permalink / raw)
  To: Ravi Jonnalagadda
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
	corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun

Hello Ravi,

On Sat, 16 May 2026 14:03:54 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:

> The DAMOS quota goal tuner can compute an effective size (esz) larger
> than the total monitored memory because it integrates over cumulative
> deltas without bounding by the actual workload size.  Once esz exceeds
> total monitored memory, the per-tick "remaining quota" arithmetic
> stops being meaningful: any scheme can apply to the entire monitored
> space and "remaining" stays positive indefinitely.

Nice finding!

> 
> Cap esz to the total size of all currently monitored regions as a
> final bound after all other quota calculations.  Add
> damon_ctx_total_monitored_sz() helper that sums region sizes across
> all targets.

You could also make an arbitrary cap by setting the static size quota.  That
is, if there is not only a quota goal but also a size quota and/or a time
quota, and the different types of quotas disagree about the real quota, DAMOS
uses the smallest one.  You could read the damos_set_effective_quota() code and
the kernel-doc comment of 'struct damos_quota' for more details.

So you could apply the total monitoring region size cap by setting the size
quota to the total monitoring region size.  Could that work for you?

Adding the total monitoring region size cap makes sense to me, and I think it
will make the user experience better.  But if the size quota based cap works,
this could also be handled in user space in an easier and even better way.  If
so, I'd prefer that direction, to reduce kernel code complexity.  What do you
think?
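
The smallest-quota-wins rule can be sketched in userspace like this.
It is a simplified illustration, not the kernel function:
sketch_min_quota() is a hypothetical name, and the convention that an
unconfigured quota type is passed as ULONG_MAX is an assumption of the
sketch.

```c
#include <assert.h>
#include <limits.h>

/* Pick the smallest of the goal-based, time-based and size quotas. */
static unsigned long sketch_min_quota(unsigned long goal_esz,
		unsigned long time_esz, unsigned long sz_quota)
{
	unsigned long esz = goal_esz;

	if (time_esz < esz)
		esz = time_esz;
	if (sz_quota < esz)
		esz = sz_quota;
	/* The result never exceeds any of the three inputs. */
	assert(esz <= goal_esz && esz <= time_esz && esz <= sz_quota);
	return esz;
}
```

Setting the size quota to the total monitored size (12288 bytes here)
caps a goal-based value of 1GB at 12288, while a goal-based value of
4096 passes through unchanged.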


Thanks,
SJ

[...]


* Re: [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size
  2026-05-16 21:03 ` [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size Ravi Jonnalagadda
@ 2026-05-17 18:47   ` SeongJae Park
  0 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2026-05-17 18:47 UTC (permalink / raw)
  To: Ravi Jonnalagadda
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
	corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun

On Sat, 16 May 2026 14:03:55 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:

> The CONSIST quota goal tuner initializes esz_bp to 0, producing an
> effective quota size (esz) of 1 byte on the first tick.
> damos_quota_is_full() rejects all regions when esz < min_region_sz
> (default PAGE_SIZE = 4096), so no regions can be tried and no
> feedback reaches the tuner — a bootstrapping deadlock.

That depends on whether the goal is already [over]-achieved.  If the goal is
achieved, the tuner will think no change is needed, so it keeps the
effectively-zero quota.  If the goal is over-achieved, the tuner will think the
DAMOS scheme should be less aggressive, but the quota is already effectively
zero, so it stays effectively zero.

If the goal is under-achieved, the logic will iteratively increase the internal
esz (esz_bp) until it exceeds min_region_sz, and finally starts having some
effect.

So, unless the goal is already [over]-achieved, there is no deadlock.  If the
goal is already [over]-achieved, why would we want to make DAMOS do something?

Am I missing something?

I'd like to discuss this high level thing first, before digging deep into the
details.


Thanks,
SJ

[...]


* Re: [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk
  2026-05-16 21:03 ` [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk Ravi Jonnalagadda
@ 2026-05-17 23:37   ` SeongJae Park
  2026-05-18  5:38     ` Ravi Jonnalagadda
  0 siblings, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2026-05-17 23:37 UTC (permalink / raw)
  To: Ravi Jonnalagadda
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
	corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun

On Sat, 16 May 2026 14:03:56 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:

> damon_pa_migrate() walks every PFN in a region linearly, calling
> damon_get_folio() for each one.  On sparse physical address spaces
> (e.g., CXL-attached memory), a single DAMON region can span hundreds
> of gigabytes where most memory is free and sitting in the buddy
> allocator.  Most page lookups are fruitless and dominate kdamond
> tick time.

On sparse address spaces, the problem would be large DAMON regions of offlined
memory.  Large DAMON regions that are nearly all free memory are another
problem, one that doesn't require sparse address spaces.  If I'm not wrong, the
above paragraph could be better clarified, in my opinion.

> 
> Check at pageblock boundaries (2MB on x86_64) whether the block is
> entirely free.  If the first page of a pageblock is a buddy page at
> pageblock_order or higher, the entire block is free and can be
> skipped.
> Similarly skip pageblocks where pfn_to_online_page() returns
> NULL.
> 
> This reduces the iteration from O(region_sz / PAGE_SIZE) to
> O(region_sz / pageblock_sz) + O(populated_pages).
> 
> buddy_order_unsafe() is used without zone->lock.  A transient false
> positive (block becomes non-free between the PageBuddy and order
> checks) costs at most one tick of missed candidates on that block;
> the next tick re-scans.  No correctness consequence as DAMON walks
> are best-effort.

I was initially thinking this is a good and reasonable optimization approach.
But on second thought I have the questions below.

For the large offlined memory space problem, couldn't we simply tune DAMON's
monitoring region boundaries to ignore the holes?

For large free memory areas, is it reasonable to assume such situations?  In
production, users will try to utilize as much of the system's memory as
possible.  Then, would there really be such problematically large free memory
areas?

Could you please enlighten me?

I will hold off on digging deeper until these high-level questions are answered.


Thanks,
SJ

[...]


* Re: [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk
  2026-05-16 21:03 ` [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk Ravi Jonnalagadda
@ 2026-05-17 23:43   ` SeongJae Park
  2026-05-18  5:54     ` Ravi Jonnalagadda
  0 siblings, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2026-05-17 23:43 UTC (permalink / raw)
  To: Ravi Jonnalagadda
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
	corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun

On Sat, 16 May 2026 14:03:57 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:

> On populated physical address ranges the pageblock skip optimization
> alone is insufficient — most pageblocks contain at least one allocated
> page, so the walk still iterates millions of PFNs.

So my questions on the fourth patch of this series also apply here,
especially the one about the assumption of systems having most memory free.  I
will hold off on digging deeper here until the high-level discussion is
completed.


Thanks,
SJ

[...]


* Re: [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory
  2026-05-17 18:36   ` SeongJae Park
@ 2026-05-18  5:22     ` Ravi Jonnalagadda
  2026-05-19  0:38       ` SeongJae Park
  0 siblings, 1 reply; 17+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-18  5:22 UTC (permalink / raw)
  To: SeongJae Park
  Cc: damon, linux-mm, linux-kernel, linux-doc, akpm, corbet, bijan311,
	ajayjoshi, honggyu.kim, yunjeong.mun

On Sun, May 17, 2026 at 11:37 AM SeongJae Park <sj@kernel.org> wrote:
>
> Hello Ravi,
>
> On Sat, 16 May 2026 14:03:54 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
>
> > The DAMOS quota goal tuner can compute an effective size (esz) larger
> > than the total monitored memory because it integrates over cumulative
> > deltas without bounding by the actual workload size.  Once esz exceeds
> > total monitored memory, the per-tick "remaining quota" arithmetic
> > stops being meaningful: any scheme can apply to the entire monitored
> > space and "remaining" stays positive indefinitely.
>
> Nice finding!
>
> >
> > Cap esz to the total size of all currently monitored regions as a
> > final bound after all other quota calculations.  Add
> > damon_ctx_total_monitored_sz() helper that sums region sizes across
> > all targets.
>
> You could also make an arbitrary cap by setting the static size quota.  That
> is, if there is not only a quota goal but also a size quota and/or a time
> quota, and the different types of quotas disagree about the real quota, DAMOS
> uses the smallest one.  You could read the damos_set_effective_quota() code and
> the kernel-doc comment of 'struct damos_quota' for more details.
>
> So you could apply the total monitoring region size cap by setting the size
> quota to the total monitoring region size.  Could that work for you?
>
> Adding the total monitoring region size cap makes sense to me, and I think it
> will make the user experience better.  But if the size quota based cap works,
> this could also be handled in user space in an easier and even better way.  If
> so, I'd prefer that direction, to reduce kernel code complexity.  What do you
> think?

Hello SJ,

Agreed.  quota->sz combined with the smallest-quota-wins rule in
damos_set_effective_quota does express this cap from userspace
without kernel changes, and keeping the kernel side clean is the
right call.

If the UX argument carries weight later, I'm happy to respin v2
with sashiko fixes addressed.

Thanks,
Ravi

>
>
> Thanks,
> SJ
>
> [...]


* Re: [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk
  2026-05-17 23:37   ` SeongJae Park
@ 2026-05-18  5:38     ` Ravi Jonnalagadda
  2026-05-19  1:14       ` SeongJae Park
  0 siblings, 1 reply; 17+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-18  5:38 UTC (permalink / raw)
  To: SeongJae Park
  Cc: damon, linux-mm, linux-kernel, linux-doc, akpm, corbet, bijan311,
	ajayjoshi, honggyu.kim, yunjeong.mun

On Sun, May 17, 2026 at 4:38 PM SeongJae Park <sj@kernel.org> wrote:
>
> On Sat, 16 May 2026 14:03:56 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
>
> > damon_pa_migrate() walks every PFN in a region linearly, calling
> > damon_get_folio() for each one.  On sparse physical address spaces
> > (e.g., CXL-attached memory), a single DAMON region can span hundreds
> > of gigabytes where most memory is free and sitting in the buddy
> > allocator.  Most page lookups are fruitless and dominate kdamond
> > tick time.
>
> On sparse address spaces, the problem would be large DAMON regions of offlined
> memory.  Large DAMON regions that are nearly all free memory are a separate
> problem that doesn't require sparse address spaces.  If I'm not wrong, the
> above paragraph could be clarified better, in my opinion.
>
> >
> > Check at pageblock boundaries (2MB on x86_64) whether the block is
> > entirely free.  If the first page of a pageblock is a buddy page at
> > pageblock_order or higher, the entire block is free and can be
> > skipped.
> > Similarly skip pageblocks where pfn_to_online_page() returns
> > NULL.
> >
> > This reduces the iteration from O(region_sz / PAGE_SIZE) to
> > O(region_sz / pageblock_sz) + O(populated_pages).
> >
> > buddy_order_unsafe() is used without zone->lock.  A transient false
> > positive (block becomes non-free between the PageBuddy and order
> > checks) costs at most one tick of missed candidates on that block;
> > the next tick re-scans.  No correctness consequence as DAMON walks
> > are best-effort.
>
> I was initially thinking this is a good and reasonable optimization approach.
> But on second thought, I have the questions below.
>
> For the large offlined memory space problem, couldn't we simply tune DAMON's
> monitoring region boundaries to skip the holes?
>
> For the large free memory area, is it reasonable to assume such situations?  In
> production, users will try to utilize as much of the system's memory as possible.
> Then, would there really be such a problematically large free memory area?
>
> Could you please enlighten me?
>

Hi SJ,

You're right on the first point.  For static offlined memory
holes (memory hotplug gaps, partial socket population, etc.) the
right answer is configuring the monitoring region boundaries to
exclude them upfront, not making the walk skip them at runtime.
The changelog is clearer if I narrow the patch to the
free-but-online case.

On the free-online case: I agree large free memory areas are
not the steady state on a fully-utilized system.  The cases I
had in mind are more limited:

  - A workload using a small part of a much larger range, with
    the rest left as headroom (e.g. 64 GB used of a 512 GB
    range).

  - Shared tiers where workloads are allocated and freed on their own
    timelines.  Any single piece of free memory doesn't last
    long, but on a busy system there's typically a meaningful
    free fraction in the range at any point -- especially on a
    slower tier, where workloads prefer faster memory first
    when it's available.

The patch as written is a narrow optimization for those cases:
the pageblock-aligned check is one extra read per
pageblock_nr_pages PFNs (about 1 per 512 on x86_64), so it's
effectively a no-op when the region is fully populated.
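
To put rough numbers on that, here is a back-of-the-envelope sketch
for the 64 GB of 512 GB example above.  It assumes 4 KiB pages, 2 MiB
pageblocks, and the best case where the used memory fills whole
pageblocks and every other pageblock is entirely free; walk_cost() is
a hypothetical helper, not code from the patch.

```python
from typing import Tuple

PAGE_SIZE = 4 << 10          # 4 KiB base pages (x86_64 assumption)
PAGEBLOCK_SIZE = 2 << 20     # 2 MiB pageblocks (x86_64 assumption)
GIB = 1 << 30


def walk_cost(region_sz: int, populated_sz: int) -> Tuple[int, int]:
    """Return (lookups without skip, checks plus lookups with skip)."""
    full_walk = region_sz // PAGE_SIZE
    # One free-pageblock check per pageblock in the region.
    pageblock_checks = region_sz // PAGEBLOCK_SIZE
    # Per-page lookups happen only inside populated pageblocks.
    populated_lookups = populated_sz // PAGE_SIZE
    return full_walk, pageblock_checks + populated_lookups


full, skipped = walk_cost(512 * GIB, 64 * GIB)
print(full, skipped)   # 134217728 17039360
```

For a fully populated region the second number is the first plus one
check per 512 pages, which is the "effectively a no-op" case.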

If you don't see those workloads as warranting the change, I'm
happy to drop the patch.  If the framing is the issue more than
the change itself, I can respin a v2 with:

  - the changelog narrowed to the free-but-online case (no
    offlined-memory framing);
  - any suggestions from you on sashiko's review comments.

Thanks,
Ravi

> I will hold off on digging deeper until these high-level questions are answered.
>
>
> Thanks,
> SJ
>
> [...]


* Re: [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk
  2026-05-17 23:43   ` SeongJae Park
@ 2026-05-18  5:54     ` Ravi Jonnalagadda
  2026-05-19  1:27       ` SeongJae Park
  0 siblings, 1 reply; 17+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-18  5:54 UTC (permalink / raw)
  To: SeongJae Park
  Cc: damon, linux-mm, linux-kernel, linux-doc, akpm, corbet, bijan311,
	ajayjoshi, honggyu.kim, yunjeong.mun

On Sun, May 17, 2026 at 4:43 PM SeongJae Park <sj@kernel.org> wrote:
>
> On Sat, 16 May 2026 14:03:57 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
>
> > On populated physical address ranges the pageblock skip optimization
> > alone is insufficient — most pageblocks contain at least one allocated
> > page, so the walk still iterates millions of PFNs.
>
> So my questions on the fourth patch of this series also apply here,
> especially about the assumption of systems having most memory free.  I will
> hold off on digging deeper here until the high-level discussion is completed.
>
Hello SJ,

Stepping back to look at this with fresh eyes, I think this
patch is in the same bucket as patches 1 and 3 (full background
on the patch 3 thread): it came out of the same parallel debug
effort, where I was seeing long walks during the startup
transient on a multi-hundred-GB monitored target -- before
kdamond_split_regions() and damon_apply_min_nr_regions() had
trimmed the initial regions down -- and was unsure whether
those long walks were contributing to the NMI-side
responsiveness issues I was chasing.

Once the actual NMI problem was fixed and the per-region work
in steady state is bounded by DAMON's region splitting (and by
the scheme's quota when one is set), the per-call cost in
damon_pa_migrate() is already small enough that the budget
isn't doing useful work.  cond_resched() after damon_migrate_pages()
covers the preemption case.

If a real workload later shows a per-region walk long
enough to matter, I'll re-evaluate then with concrete numbers.

Thanks,
Ravi

>
> Thanks,
> SJ
>
> [...]


* Re: [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory
  2026-05-18  5:22     ` Ravi Jonnalagadda
@ 2026-05-19  0:38       ` SeongJae Park
  0 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2026-05-19  0:38 UTC (permalink / raw)
  To: Ravi Jonnalagadda
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
	corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun

On Sun, 17 May 2026 22:22:34 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:

> On Sun, May 17, 2026 at 11:37 AM SeongJae Park <sj@kernel.org> wrote:
> >
> > Hello Ravi,
> >
> > On Sat, 16 May 2026 14:03:54 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
> >
> > > The DAMOS quota goal tuner can compute an effective size (esz) larger
> > > than the total monitored memory because it integrates over cumulative
> > > deltas without bounding by the actual workload size.  Once esz exceeds
> > > total monitored memory, the per-tick "remaining quota" arithmetic
> > > stops being meaningful: any scheme can apply to the entire monitored
> > > space and "remaining" stays positive indefinitely.
> >
> > Nice finding!
> >
> > >
> > > Cap esz to the total size of all currently monitored regions as a
> > > final bound after all other quota calculations.  Add
> > > damon_ctx_total_monitored_sz() helper that sums region sizes across
> > > all targets.
> >
> > You could also set an arbitrary cap with the static size quota.  That is, if
> > a scheme has not only a quota goal but also a size quota and/or a time quota,
> > and the different quota types disagree about the real quota, DAMOS uses the
> > smallest one.  See the damos_set_effective_quota() code and the kernel-doc
> > comment of 'struct damos_quota' for more details.
> >
> > So you could apply the total monitoring region size cap by setting the size
> > quota to the total monitoring region size.  Could that work for you?
> >
> > Adding the total monitoring region size cap makes sense to me, and I think it
> > would improve the user experience.  But if the size quota based cap works, this
> > could also be handled in user space in an easier and even better way.  If so,
> > I'd prefer that direction, to reduce kernel code complexity.  What do you think?
> 
> Hello SJ,
> 
> Agreed.  quota->sz combined with the smallest-quota-wins rule in
> damos_set_effective_quota() does express this cap from userspace
> without kernel changes, and keeping the kernel side clean is the
> right call.
> 
> If the UX argument carries weight later, I'm happy to respin v2
> with sashiko fixes addressed.

Makes sense.  I see no change in that weight for now.  If someone, including
me or you, raises it again in the future, we could revisit.


Thanks,
SJ

[...]


* Re: [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk
  2026-05-18  5:38     ` Ravi Jonnalagadda
@ 2026-05-19  1:14       ` SeongJae Park
  0 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2026-05-19  1:14 UTC (permalink / raw)
  To: Ravi Jonnalagadda
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
	corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun

On Sun, 17 May 2026 22:38:51 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:

> On Sun, May 17, 2026 at 4:38 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > On Sat, 16 May 2026 14:03:56 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
> >
> > > damon_pa_migrate() walks every PFN in a region linearly, calling
> > > damon_get_folio() for each one.  On sparse physical address spaces
> > > (e.g., CXL-attached memory), a single DAMON region can span hundreds
> > > of gigabytes where most memory is free and sitting in the buddy
> > > allocator.  Most page lookups are fruitless and dominate kdamond
> > > tick time.
> >
> > On sparse address spaces, the problem would be large DAMON regions of offlined
> > memory.  Large DAMON regions that are nearly all free memory are a separate
> > problem that doesn't require sparse address spaces.  If I'm not wrong, the
> > above paragraph could be clarified better, in my opinion.
> >
> > >
> > > Check at pageblock boundaries (2MB on x86_64) whether the block is
> > > entirely free.  If the first page of a pageblock is a buddy page at
> > > pageblock_order or higher, the entire block is free and can be
> > > skipped.
> > > Similarly skip pageblocks where pfn_to_online_page() returns
> > > NULL.
> > >
> > > This reduces the iteration from O(region_sz / PAGE_SIZE) to
> > > O(region_sz / pageblock_sz) + O(populated_pages).
> > >
> > > buddy_order_unsafe() is used without zone->lock.  A transient false
> > > positive (block becomes non-free between the PageBuddy and order
> > > checks) costs at most one tick of missed candidates on that block;
> > > the next tick re-scans.  No correctness consequence as DAMON walks
> > > are best-effort.
> >
> > I was initially thinking this is a good and reasonable optimization approach.
> > But on second thought, I have the questions below.
> >
> > For the large offlined memory space problem, couldn't we simply tune DAMON's
> > monitoring region boundaries to skip the holes?
> >
> > For the large free memory area, is it reasonable to assume such situations?  In
> > production, users will try to utilize as much of the system's memory as possible.
> > Then, would there really be such a problematically large free memory area?
> >
> > Could you please enlighten me?
> >
> 
> Hi SJ,
> 
> You're right on the first point.  For static offlined memory
> holes (memory hotplug gaps, partial socket population, etc.) the
> right answer is configuring the monitoring region boundaries to
> exclude them upfront, not making the walk skip them at runtime.
> The changelog is clearer if I narrow the patch to the
> free-but-online case.

Thank you for clarifying, Ravi.

> 
> On the free-online case: I agree large free memory areas are
> not the steady state on a fully-utilized system.  The cases I
> had in mind are more limited:
> 
>   - A workload using a small part of a much larger range, with
>     the rest left as headroom (e.g. 64 GB used of a 512 GB
>     range).

Why would the user have such a large amount of headroom?

> 
>   - Shared tiers where workloads are allocated and freed on their own
>     timelines.  Any single piece of free memory doesn't last
>     long, but on a busy system there's typically a meaningful
>     free fraction in the range at any point -- especially on a
>     slower tier, where workloads prefer faster memory first
>     when it's available.

I agree there could be a reasonable amount of free memory.  But I still find it
difficult to know whether that would be big enough to cause the issue in DAMOS.

> 
> The patch as written is a narrow optimization for those cases:
> the pageblock-aligned check is one extra read per
> pageblock_nr_pages PFNs (about 1 per 512 on x86_64), so it's
> effectively a no-op when the region is fully populated.
> 
> If you don't see those workloads as warranting the change, I'm
> happy to drop the patch.  If the framing is the issue more than
> the change itself, I can respin a v2 with:
> 
>   - the changelog narrowed to the free-but-online case (no
>     offlined-memory framing);
>   - any suggestions from you on sashiko's review comments.

I think your arguments make sense in general.  But I'm still not quite sure
what the realistic size of the problem is, so it is difficult to judge.  Having
a clearer and more detailed use case and backing data would be nice.

I also have a small concern about this approach.  The DAMOS quota system
assumes the cost of applying a DAMOS action is proportional to the size of the
memory it is applied to.  After this patch is applied, the cost will also depend
on the amount of free or offline memory in the region.  That might make it
difficult for users to predict the overhead of DAMOS.  I might be too picky or
mistaken, but to be honest I'm not feeling 100% comfortable with this change.

For the long term, we are working on extending DAMON for general data attributes
monitoring.  I'm pretty sure you are also aware of that.  The v1 [1] was just
added to mm-new for more testing.  It currently supports the anon page and
belonging memory cgroup attributes.  I'm planning to extend that a lot.  In the
future, DAMOS might be able to target and filter memory based on the attributes
monitoring results.  Then, we may be able to extend it to monitor whether memory
is online or free, and ask DAMOS to filter out or de-prioritize memory regions
having a high proportion of free or offline memory.

So, long story short, I'd suggest revisiting this after a clear use case and
a real problem are found, unless we have them right now.

[1] https://lore.kernel.org/20260518234119.97569-1-sj@kernel.org


Thanks,
SJ

[...]


* Re: [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk
  2026-05-18  5:54     ` Ravi Jonnalagadda
@ 2026-05-19  1:27       ` SeongJae Park
  0 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2026-05-19  1:27 UTC (permalink / raw)
  To: Ravi Jonnalagadda
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
	corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun

On Sun, 17 May 2026 22:54:18 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:

> On Sun, May 17, 2026 at 4:43 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > On Sat, 16 May 2026 14:03:57 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
> >
> > > On populated physical address ranges the pageblock skip optimization
> > > alone is insufficient — most pageblocks contain at least one allocated
> > > page, so the walk still iterates millions of PFNs.
> >
> > So my questions on the fourth patch of this series also apply here,
> > especially about the assumption of systems having most memory free.  I will
> > hold off on digging deeper here until the high-level discussion is completed.
> >
> Hello SJ,
> 
> Stepping back to look at this with fresh eyes, I think this
> patch is in the same bucket as patches 1 and 3 (full background
> on the patch 3 thread): it came out of the same parallel debug
> effort, where I was seeing long walks during the startup
> transient on a multi-hundred-GB monitored target -- before
> kdamond_split_regions() and damon_apply_min_nr_regions() had
> trimmed the initial regions down -- and was unsure whether
> those long walks were contributing to the NMI-side
> responsiveness issues I was chasing.
> 
> Once the actual NMI problem was fixed and the per-region work
> in steady state is bounded by DAMON's region splitting (and by
> the scheme's quota when one is set), the per-call cost in
> damon_pa_migrate() is already small enough that the budget
> isn't doing useful work.  cond_resched() after damon_migrate_pages()
> covers the preemption case.
> 
> If a real workload later shows a per-region walk long
> enough to matter, I'll re-evaluate then with concrete numbers.

Sounds good!

FYI, many parts of DAMON are designed assuming it will be used in production
environments that have long-running workloads and prefer stability.  That helps
produce good results in the long run, but also makes DAMON difficult to
understand in the short term, especially in lab environments.

I learned that thanks to users including you, to whom I'm grateful, and
therefore recently developed the multiple quota tuning logics and the failed
regions charge ratio.  I feel this DAMON limitation may have contributed to the
confusion in this case.  Sorry if that was the case, and please feel free to
share your pain points and improvement ideas.  Every user's use case, including
yours, does matter!


Thanks,
SJ

[...]


end of thread, other threads:[~2026-05-19  1:28 UTC | newest]

Thread overview: 17+ messages
2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
2026-05-16 21:03 ` [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum Ravi Jonnalagadda
2026-05-17 18:16   ` SeongJae Park
2026-05-16 21:03 ` [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory Ravi Jonnalagadda
2026-05-17 18:36   ` SeongJae Park
2026-05-18  5:22     ` Ravi Jonnalagadda
2026-05-19  0:38       ` SeongJae Park
2026-05-16 21:03 ` [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size Ravi Jonnalagadda
2026-05-17 18:47   ` SeongJae Park
2026-05-16 21:03 ` [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk Ravi Jonnalagadda
2026-05-17 23:37   ` SeongJae Park
2026-05-18  5:38     ` Ravi Jonnalagadda
2026-05-19  1:14       ` SeongJae Park
2026-05-16 21:03 ` [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk Ravi Jonnalagadda
2026-05-17 23:43   ` SeongJae Park
2026-05-18  5:54     ` Ravi Jonnalagadda
2026-05-19  1:27       ` SeongJae Park
