[RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes


From: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
To: sj@kernel.org, damon@lists.linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Cc: akpm@linux-foundation.org, corbet@lwn.net, bijan311@gmail.com,
	ajayjoshi@micron.com, honggyu.kim@sk.com, yunjeong.mun@sk.com,
	ravis.opensrc@gmail.com
Subject: [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes
Date: Sat, 16 May 2026 14:03:52 -0700
Message-ID: <20260516210357.2247-1-ravis.opensrc@gmail.com>

MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hi,

This series carries five fixes for the DAMOS quota controller and the
paddr migration walk.  All five were found during closed-loop tiering
testing on a heterogeneous-memory system (DRAM + CXL on a separate
NUMA node), but each fix sits in a code path shared by all callers, so
none is specific to closed-loop tiering or to any particular goal
metric.

Test envelope: AMD EPYC dual-socket host with CXL.mem on a separate
NUMA node, two-scheme migrate_hot PULL+PUSH setup driven by
node_eligible_mem_bp (now in linux-next)[1].

What each patch does
====================

Patch 1 - mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum

  damon_moving_sum() can underflow when a region's access rate drops
  to zero within fewer samples than the moving-sum window length.  The
  internal accumulator subtracts the outgoing sample without a lower
  bound, so the result wraps to a very large nr_accesses_bp that
  misclassifies a cold region as hot.  This affects every DAMOS
  scheme, since nr_accesses_bp is used on every access-rate update
  for every region regardless of which scheme or goal metric is
  active.
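
  To illustrate the failure mode, here is a minimal userspace sketch.
  It assumes the accumulator has the shape
  mvsum - nomvsum / len_window + new_value; the function and parameter
  names are hypothetical, and the floored variant shows one possible
  lower bound, not necessarily the one the patch applies.

```c
#include <assert.h>

/*
 * Unguarded form (assumed shape): subtract the outgoing share of the
 * previous window, then add the incoming sample.  With unsigned
 * arithmetic the result wraps when the outgoing share exceeds
 * mvsum + new_value.
 */
static unsigned int moving_sum_unguarded(unsigned int mvsum,
		unsigned int nomvsum, unsigned int len_window,
		unsigned int new_value)
{
	assert(len_window != 0);
	return mvsum - nomvsum / len_window + new_value;
}

/*
 * Floored form (hypothetical): clamp the subtraction at zero so a
 * fast drop in access rate cannot wrap the sum.
 */
static unsigned int moving_sum_floored(unsigned int mvsum,
		unsigned int nomvsum, unsigned int len_window,
		unsigned int new_value)
{
	unsigned int outgoing;

	assert(len_window != 0);
	outgoing = nomvsum / len_window;
	if (outgoing > mvsum)
		return new_value;	/* old contribution fully drained */
	return mvsum - outgoing + new_value;
}
```

  With mvsum=10, nomvsum=200, len_window=10 and new_value=0, the
  unguarded form wraps to a value near UINT_MAX while the floored form
  returns 0.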

Patch 2 - mm/damon/core: cap effective quota size to total monitored memory

  The DAMOS quota tuner can compute an effective size (esz) larger
  than the total monitored memory, because the tuner integrates over
  cumulative deltas without bounding by the actual workload size.
  Once esz exceeds total monitored memory the per-tick "remaining
  quota" arithmetic stops being meaningful: any scheme can apply to
  the entire monitored space and "remaining" stays positive
  indefinitely.  Cap esz at total monitored memory so the controller
  remains within physically realisable bounds.  The cap is independent
  of the tuner and of the goal metric.

Patch 3 - mm/damon/core: floor effective quota size at minimum region size

  Symmetric to patch 2: the tuner can also compute esz < min_region_sz,
  causing schemes to attempt zero-byte migrations for many ticks before
  the tuner ramps esz back up.  Observed under the CONSIST tuner with
  a node_eligible_mem_bp goal: ftrace showed esz stuck at 1 byte for
  96 seconds before the first region was tried; the first acted-on
  region appeared at t=113s when esz crossed the min_region_sz
  threshold.  Floor esz at min_region_sz so schemes always have at
  least one region's worth of quota when the tuner asks them to act.
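
  The cap from patch 2 and the floor from patch 3 together amount to
  clamping esz into a valid range.  A minimal userspace sketch follows;
  clamp_esz() and its parameter names are hypothetical and do not
  reflect the actual call sites or helpers used in the patches.

```c
#include <assert.h>

/*
 * Clamp a tuner-computed effective quota size (in bytes) into
 * [min_region_sz, total_monitored_sz].  All names here are
 * illustrative assumptions.
 */
static unsigned long clamp_esz(unsigned long esz,
		unsigned long min_region_sz,
		unsigned long total_monitored_sz)
{
	assert(min_region_sz <= total_monitored_sz);
	if (esz < min_region_sz)
		return min_region_sz;		/* floor: one region's worth */
	if (esz > total_monitored_sz)
		return total_monitored_sz;	/* cap: no more than exists */
	return esz;
}
```

  For example, with min_region_sz=4096 an esz of 1 byte becomes 4096,
  so the scheme can act on at least one region in that tick.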

Patch 4 - mm/damon/paddr: skip free pageblocks in migration walk

  damon_pa_migrate() walks every 4KB PFN in a region.  On
  sparsely-populated lower-tier extents (e.g., a 549GB CXL region
  with only ~8GB populated), this is ~144M PFN iterations per scheme
  tick at ~40ns each = ~5.8 seconds of walk per tick.  Use
  pageblock-level free-page detection to skip unpopulated runs of
  pages: only enter the per-page loop for pageblocks that contain at
  least one allocated page.  This brings the walk to
  O(region_size / pageblock_size) skip-check cost plus
  O(populated_pages) per-page work.  On x86 pageblocks are 2MB, so
  the same 549GB/8GB example becomes ~281K pageblock skip-checks
  (microseconds total) plus ~2M per-page visits for the populated
  pages -- ~80ms expected.  Helps any migrate_hot/migrate_cold scheme
  on paddr ops, regardless of what drives them.
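
  The cost model above can be sketched in userspace as follows.  The
  block_populated array is a hypothetical stand-in for the kernel's
  free-pageblock detection, and PAGES_PER_BLOCK assumes 2MB pageblocks
  with 4KB pages as on x86; none of these names come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_BLOCK 512UL	/* 2MB pageblock / 4KB page (assumed) */

struct walk_stats {
	unsigned long skip_checks;	/* one check per pageblock */
	unsigned long page_visits;	/* per-page loop iterations */
};

/*
 * block_populated[i] is true when pageblock i holds at least one
 * allocated page.  Free pageblocks cost a single check; populated
 * ones are walked page by page.
 */
static struct walk_stats walk_region(const bool *block_populated,
		size_t nr_blocks)
{
	struct walk_stats st = { 0, 0 };
	size_t i;

	assert(block_populated != NULL || nr_blocks == 0);
	for (i = 0; i < nr_blocks; i++) {
		st.skip_checks++;
		if (!block_populated[i])
			continue;	/* skip the whole free pageblock */
		st.page_visits += PAGES_PER_BLOCK;
	}
	return st;
}
```

  For the 549GB example, nr_blocks is 549 * 512 = 281,088, which is the
  ~281K skip-check figure quoted above.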

Patch 5 - mm/damon/paddr: add time budget to migration page walk

  Densely populated regions (e.g., a busy DRAM range where most
  pageblocks contain at least one allocated page) can still consume
  full ticks even with patch 4 applied.  Add a 100ms wall-clock
  budget with a ktime_get() check every 4096 pages walked
  (~16MB worth).  When the budget expires before the walk reaches the
  end of a region, the walk returns control to kdamond; subsequent
  ticks re-walk the region from the start.  Folios already on the
  target node are
  dropped at migration time, so re-walks only re-do collection work,
  not the migrate itself.  Together with the per-scheme quota cap,
  per-tick work is bounded and the workload converges over multiple
  ticks for dense regions.

  Worst-case migration walk contribution to a tick is bounded at
  100ms per scheme regardless of region size or population density,
  preserving kdamond's ability to service other DAMOS schemes and
  user-space sysfs operations during heavy migration phases.
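
  The budget check can be sketched in userspace as below.  This is an
  illustration only: clock_gettime() stands in for ktime_get(), the
  clock is passed as a function pointer so the behaviour can be
  exercised deterministically, and all names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define BUDGET_NS	(100ULL * 1000 * 1000)	/* 100ms */
#define CHECK_INTERVAL	4096UL	/* pages between clock reads */

typedef uint64_t (*clock_fn)(void);

/* Real monotonic clock, the userspace analogue of ktime_get(). */
static uint64_t monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Deterministic stand-in clock: every read advances time by 60ms. */
static uint64_t fake_ns;
static uint64_t stepping_clock(void)
{
	fake_ns += 60ULL * 1000 * 1000;
	return fake_ns;
}

/* Walk up to nr_pages; return how many were walked within budget. */
static unsigned long walk_with_budget(unsigned long nr_pages, clock_fn now)
{
	uint64_t deadline;
	unsigned long walked;

	assert(now != NULL);
	deadline = now() + BUDGET_NS;
	for (walked = 0; walked < nr_pages; walked++) {
		/* Read the clock only once per CHECK_INTERVAL pages. */
		if (walked && walked % CHECK_INTERVAL == 0 &&
		    now() >= deadline)
			break;
		/* per-page collection work would go here */
	}
	return walked;
}
```

  Reading the clock once per 4096 pages keeps the timing overhead
  small relative to the per-page work, at the price of overshooting
  the budget by up to one check interval.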

Testing context
===============

  Hardware:  AMD EPYC dual-socket, CXL.mem on a separate NUMA node.
  Workload: 32GB hot working set across DRAM and CXL nodes.
  DAMON config: paddr ops, two migrate_hot schemes (PULL CXL->DRAM,
                PUSH DRAM->CXL) with complementary address filters,
                node_eligible_mem_bp goal per scheme, temporal
                quota tuner, 1s reset interval.

Each bug addressed by this series was reproduced under the above
setup, and each fix was then verified via ftrace and per-scheme stats
with the patch applied.

References
==========

[1] mm/damon: add node_eligible_mem_bp goal metric
https://lore.kernel.org/damon/20260428030520.701-1-ravis.opensrc@gmail.com/

Ravi Jonnalagadda (5):
  mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum
  mm/damon/core: cap effective quota size to total monitored memory
  mm/damon/core: floor effective quota size at minimum region size
  mm/damon/paddr: skip free pageblocks in migration walk
  mm/damon/paddr: add time budget to migration page walk

 mm/damon/core.c  | 29 ++++++++++++++++++++++++++++-
 mm/damon/paddr.c | 40 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 65 insertions(+), 4 deletions(-)

base-commit: 0cec77cfd5314c0b3b03530abe1a4b32e991f639
-- 
2.43.0
