From: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
To: sj@kernel.org, damon@lists.linux.dev, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Cc: akpm@linux-foundation.org, corbet@lwn.net, bijan311@gmail.com,
ajayjoshi@micron.com, honggyu.kim@sk.com, yunjeong.mun@sk.com,
ravis.opensrc@gmail.com
Subject: [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes
Date: Sat, 16 May 2026 14:03:52 -0700
Message-ID: <20260516210357.2247-1-ravis.opensrc@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Hi,
This series carries five fixes for the DAMOS quota controller and the
paddr migration walk. All five were surfaced during closed-loop tiering
testing on a heterogeneous-memory system (DRAM + CXL on a separate
NUMA node), but each fix is in a code path shared by all callers;
none is scoped to closed-loop tiering or to any specific goal metric.
Test envelope: AMD EPYC dual-socket host with CXL.mem on a separate
NUMA node, two-scheme migrate_hot PULL+PUSH setup driven by
node_eligible_mem_bp (now in linux-next)[1].
What each patch does
====================
Patch 1 - mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum
damon_moving_sum() can underflow when a region's access rate drops
to zero in less time than the moving-sum window covers. The internal
accumulator subtracts the outgoing sample without a lower bound, so
the unsigned result wraps to a very large nr_accesses_bp that
mis-classifies a cold region as hot. This affects every DAMOS
scheme, since nr_accesses_bp is used on every access-rate update
for every region regardless of which scheme or goal metric is active.
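
For illustration only, a minimal userspace sketch of the wrap described
above. It assumes a pseudo moving sum of the form
mvsum - nomvsum / len_window + new_value on unsigned int. The function
names and the clamp are hypothetical and are not taken from the patch;
the actual fix may bound the subtraction differently.

```c
/*
 * Unclamped pseudo moving sum (assumed form, see lead-in).
 * If nomvsum / len_window exceeds mvsum, the unsigned subtraction wraps.
 */
static unsigned int moving_sum_unclamped(unsigned int mvsum,
		unsigned int nomvsum, unsigned int len_window,
		unsigned int new_value)
{
	return mvsum - nomvsum / len_window + new_value;
}

/*
 * Hypothetical guarded variant: never subtract more than the running
 * sum currently holds, so the result cannot wrap below zero.
 */
static unsigned int moving_sum_clamped(unsigned int mvsum,
		unsigned int nomvsum, unsigned int len_window,
		unsigned int new_value)
{
	unsigned int outgoing = nomvsum / len_window;

	if (outgoing > mvsum)
		outgoing = mvsum;
	return mvsum - outgoing + new_value;
}
```

With mvsum = 10, nomvsum = 1000, len_window = 10 and new_value = 0, the
outgoing share (100) exceeds the running sum, so the unclamped form
wraps to a very large value while the clamped form returns 0.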
Patch 2 - mm/damon/core: cap effective quota size to total monitored memory
The DAMOS quota tuner can compute an effective size (esz) larger
than the total monitored memory, because the tuner integrates over
cumulative deltas without bounding by the actual workload size.
Once esz exceeds total monitored memory the per-tick "remaining
quota" arithmetic stops being meaningful: any scheme can apply to
the entire monitored space and "remaining" stays positive
indefinitely. Cap esz at total monitored memory so the controller
remains within physically realisable bounds. The cap is independent
of tuner shape and goal metric.
Patch 3 - mm/damon/core: floor effective quota size at minimum region size
Symmetric to patch 2: the tuner can also compute esz < min_region_sz,
causing schemes to attempt zero-byte migrations for many ticks before
the tuner ramps esz back up. Observed under the CONSIST tuner with
a node_eligible_mem_bp goal: ftrace showed esz stuck at 1 byte for
96 seconds before the first region was tried; the first acted-on
region appeared at t=113s when esz crossed the min_region_sz
threshold. Floor esz at min_region_sz so schemes always have at
least one region's worth of quota when the tuner asks them to act.
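
The cap from patch 2 and the floor from patch 3 together amount to
clamping esz into [min_region_sz, total monitored memory]. A minimal
sketch of that bound follows. The helper name clamp_esz() and its
parameters are hypothetical; how the patches obtain the total monitored
size and where they apply the bound is not shown here.

```c
/*
 * Clamp an effective quota size into a usable range (hypothetical
 * helper, not the patch code).
 *
 * The cap is applied first and the floor second, so in the degenerate
 * case total_monitored_sz < min_region_sz the floor wins.
 */
static unsigned long clamp_esz(unsigned long esz,
		unsigned long min_region_sz,
		unsigned long total_monitored_sz)
{
	if (esz > total_monitored_sz)
		esz = total_monitored_sz;	/* patch 2: upper bound */
	if (esz < min_region_sz)
		esz = min_region_sz;		/* patch 3: lower bound */
	return esz;
}
```

For example, with min_region_sz = 4096 and 1GiB monitored, an esz of
1 byte becomes 4096 and an esz far above 1GiB becomes exactly 1GiB;
values already inside the range pass through unchanged.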
Patch 4 - mm/damon/paddr: skip free pageblocks in migration walk
damon_pa_migrate() walks every 4KB PFN in a region. On
sparsely-populated lower-tier extents (e.g., a 549GB CXL region
with only ~8GB populated), this is ~144M PFN iterations per scheme
tick at ~40ns each = ~5.8 seconds of walk per tick. Use
pageblock-level free-page detection to skip unpopulated runs of
pages: only enter the per-page loop for pageblocks that contain at
least one allocated page. This brings the walk to
O(region_size / pageblock_size) skip-check cost plus
O(populated_pages) per-page work. On x86 pageblocks are 2MB, so
the same 549GB/8GB example becomes ~281K pageblock skip-checks
(microseconds total) plus ~2M per-page visits for the populated
pages -- ~80ms expected. Helps any migrate_hot/migrate_cold scheme
on paddr ops, regardless of what drives them.
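
A simplified userspace model of the skip logic, to make the cost split
concrete. The block_is_free[] array is a hypothetical stand-in for
whatever pageblock-level free test the patch uses, and
PAGES_PER_BLOCK assumes 2MB pageblocks with 4KB pages as in the x86
example above. This is a sketch of the idea, not the paddr code.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_BLOCK	512UL	/* 2MB pageblock / 4KB page (assumed) */

struct walk_stats {
	unsigned long skip_checks;	/* one per pageblock in the region */
	unsigned long page_visits;	/* one per page in non-free pageblocks */
};

/*
 * Walk a region pageblock by pageblock. Fully free pageblocks cost one
 * check; any other pageblock falls through to the per-page loop.
 */
static struct walk_stats walk_region(const bool *block_is_free,
		size_t nr_blocks)
{
	struct walk_stats st = { 0, 0 };
	size_t b;
	unsigned long p;

	for (b = 0; b < nr_blocks; b++) {
		st.skip_checks++;
		if (block_is_free[b])
			continue;	/* skip the whole unpopulated run */
		for (p = 0; p < PAGES_PER_BLOCK; p++)
			st.page_visits++;	/* per-page work goes here */
	}
	return st;
}
```

Reading GB as GiB, the 549GB region in the example is 281,088
pageblocks. If the ~8GB of populated memory were packed into fully
populated pageblocks, this model would visit 2,097,152 pages; a more
scattered population touches more pageblocks and visits more pages.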
Patch 5 - mm/damon/paddr: add time budget to migration page walk
Densely populated regions (e.g., a busy DRAM range where most
pageblocks contain at least one allocated page) can still consume
full ticks even with patch 4 applied. Add a 100ms wall-clock
budget with a ktime_get() check every 4096 pages walked
(~16MB worth). When the budget expires before reaching the end of
a region, kdamond returns control; subsequent ticks re-walk the
region from the start. Folios already on the target node are
dropped at migration time, so re-walks only repeat the collection
work, not the migration itself. Together with the per-scheme quota cap,
per-tick work is bounded and the workload converges over multiple
ticks for dense regions.
Worst-case migration walk contribution to a tick is bounded at
100ms per scheme regardless of region size or population density,
preserving kdamond's ability to service other DAMOS schemes and
user-space sysfs operations during heavy migration phases.
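
The budget check in patch 5 can be sketched in userspace as below. The
100ms budget and the 4096-page check interval come from the description
above; everything else is an assumption for illustration. In
particular, the injected clock callback stands in for ktime_get(), and
struct fake_clock exists only so the behaviour can be exercised
deterministically.

```c
#include <stdint.h>

#define BUDGET_NS	(100ULL * 1000 * 1000)	/* 100ms wall-clock budget */
#define CHECK_INTERVAL	4096UL			/* pages between clock reads */

typedef uint64_t (*clock_fn)(void *ctx);

/*
 * Walk up to nr_pages pages, reading the clock once every
 * CHECK_INTERVAL pages. Returns how many pages were walked before the
 * region ended or the budget expired.
 */
static unsigned long walk_with_budget(unsigned long nr_pages,
		clock_fn now, void *ctx)
{
	uint64_t deadline = now(ctx) + BUDGET_NS;
	unsigned long walked;

	for (walked = 0; walked < nr_pages; walked++) {
		if (walked && walked % CHECK_INTERVAL == 0 &&
		    now(ctx) >= deadline)
			break;	/* budget spent; caller retries next tick */
		/* per-page collection work would go here */
	}
	return walked;
}

/* Demo clock: every read advances simulated time by a fixed step. */
struct fake_clock {
	uint64_t now_ns;
	uint64_t step_ns;
};

static uint64_t fake_clock_read(void *ctx)
{
	struct fake_clock *c = ctx;
	uint64_t t = c->now_ns;

	c->now_ns += c->step_ns;
	return t;
}
```

Checking the clock only every 4096 pages keeps the clock read off the
per-page path, at the price of overshooting the budget by at most one
interval's worth of work. With a fake clock that advances 30ms per
read, a one-million-page walk stops after 16384 pages, which is the
first check at which 100ms has elapsed.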
Testing context
===============
Hardware:     AMD EPYC dual-socket, CXL.mem on a separate NUMA node.
Workload:     32GB hot working set across DRAM and CXL nodes.
DAMON config: paddr ops, two migrate_hot schemes (PULL CXL->DRAM,
              PUSH DRAM->CXL) with complementary address filters,
              node_eligible_mem_bp goal per scheme, temporal
              quota tuner, 1s reset interval.
Each issue fixed in this series was reproduced under the above setup,
then verified via ftrace and per-scheme stats with the fix applied.
References
==========
[1] mm/damon: add node_eligible_mem_bp goal metric
https://lore.kernel.org/damon/20260428030520.701-1-ravis.opensrc@gmail.com/
Ravi Jonnalagadda (5):
  mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum
  mm/damon/core: cap effective quota size to total monitored memory
  mm/damon/core: floor effective quota size at minimum region size
  mm/damon/paddr: skip free pageblocks in migration walk
  mm/damon/paddr: add time budget to migration page walk

 mm/damon/core.c  | 29 ++++++++++++++++++++++++++++-
 mm/damon/paddr.c | 40 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 65 insertions(+), 4 deletions(-)

base-commit: 0cec77cfd5314c0b3b03530abe1a4b32e991f639
--
2.43.0