* [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum
2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
@ 2026-05-16 21:03 ` Ravi Jonnalagadda
2026-05-17 18:16 ` SeongJae Park
2026-05-16 21:03 ` [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory Ravi Jonnalagadda
` (3 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-16 21:03 UTC (permalink / raw)
To: sj, damon, linux-mm, linux-kernel, linux-doc
Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
ravis.opensrc
Guard against unsigned integer underflow when nomvsum/len_window
exceeds mvsum. When that subtraction wraps, the moving sum returns a
near-ULONG_MAX value and corrupts nr_accesses_bp.
If subtrahend > mvsum, return new_value: this clamps the moving-sum
estimate to the current observation rather than wrapping.
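
For illustration, the wrap and the proposed guard can be reproduced in user space. This is a minimal sketch, not the kernel code: moving_sum_unguarded(), moving_sum_guarded() and moving_sum_demo() are hypothetical names that mirror the arithmetic before and after this patch, and the input values are made up. Because the function works on unsigned int, the wrapped result lands near UINT_MAX.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical user-space copy of the pre-patch arithmetic. */
static unsigned int moving_sum_unguarded(unsigned int mvsum,
		unsigned int nomvsum, unsigned int len_window,
		unsigned int new_value)
{
	return mvsum - nomvsum / len_window + new_value;
}

/* Hypothetical user-space copy of the guarded arithmetic in this patch. */
static unsigned int moving_sum_guarded(unsigned int mvsum,
		unsigned int nomvsum, unsigned int len_window,
		unsigned int new_value)
{
	unsigned int subtrahend = nomvsum / len_window;

	if (subtrahend > mvsum)
		return new_value;
	return mvsum - subtrahend + new_value;
}

/* Made-up inputs where nomvsum / len_window (100) exceeds mvsum (40). */
static void moving_sum_demo(void)
{
	/* 40 - 100 + 7 wraps modulo UINT_MAX + 1. */
	assert(moving_sum_unguarded(40, 1000, 10, 7) == UINT_MAX - 52);
	/* The guard falls back to the current observation. */
	assert(moving_sum_guarded(40, 1000, 10, 7) == 7);
}
```

When the subtrahend does not exceed mvsum, both variants return the same value, so the guard only changes behavior in the wrapping case.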
Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
mm/damon/core.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 3a8725e400c6b..9975f3d9ebfe9 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -3449,7 +3449,11 @@ int damon_set_region_system_rams_default(struct damon_target *t,
static unsigned int damon_moving_sum(unsigned int mvsum, unsigned int nomvsum,
unsigned int len_window, unsigned int new_value)
{
- return mvsum - nomvsum / len_window + new_value;
+ unsigned int subtrahend = nomvsum / len_window;
+
+ if (subtrahend > mvsum)
+ return new_value;
+ return mvsum - subtrahend + new_value;
}
/**
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum
2026-05-16 21:03 ` [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum Ravi Jonnalagadda
@ 2026-05-17 18:16 ` SeongJae Park
0 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2026-05-17 18:16 UTC (permalink / raw)
To: Ravi Jonnalagadda
Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun
Hello Ravi,
On Sat, 16 May 2026 14:03:53 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
> Guard against unsigned integer underflow when nomvsum/len_window
> exceeds mvsum.
How could this happen? mvsum is assumed to be the same as nomvsum at the
beginning of the window. Hence, even if every new_value is zero, mvsum should
be exactly zero at the end of the window. Of course, there could be a bug that
breaks the assumption.
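
A minimal user-space sketch of that reasoning, assuming mvsum equals nomvsum at the start of the window and the update runs once per step for len_window steps. window_end_mvsum() is a hypothetical helper, not a kernel function. With every new_value zero, the per-step subtrahend never exceeds mvsum, and the sum ends at nomvsum % len_window, which is zero when the division is exact.

```c
#include <assert.h>

/*
 * Hypothetical model: apply the moving-sum update len_window times with
 * new_value == 0, starting from mvsum == nomvsum. Returns the final mvsum.
 */
static unsigned int window_end_mvsum(unsigned int nomvsum,
		unsigned int len_window)
{
	unsigned int subtrahend = nomvsum / len_window;
	unsigned int mvsum = nomvsum;
	unsigned int i;

	for (i = 0; i < len_window; i++) {
		/* Holds as long as the starting assumption holds. */
		assert(subtrahend <= mvsum);
		mvsum = mvsum - subtrahend + 0;
	}
	return mvsum;
}
```

In this model an underflow needs one of the assumptions to be broken, for example mvsum starting the window below nomvsum.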
> When that subtraction wraps, the moving sum returns a
> near-ULONG_MAX value and corrupts nr_accesses_bp.
>
> If subtrahend > mvsum, return new_value: this clamps the moving-sum
> estimate to the current observation rather than wrapping.
I guess you saw this issue in practice, and this change should fix it. But
I think we should know why and how mvsum < nomvsum / len_window can unexpectedly
happen, and fix that.
Could you share more details about when and how the situation happens?
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory
2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
2026-05-16 21:03 ` [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum Ravi Jonnalagadda
@ 2026-05-16 21:03 ` Ravi Jonnalagadda
2026-05-17 18:36 ` SeongJae Park
2026-05-16 21:03 ` [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size Ravi Jonnalagadda
` (2 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-16 21:03 UTC (permalink / raw)
To: sj, damon, linux-mm, linux-kernel, linux-doc
Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
ravis.opensrc
The DAMOS quota goal tuner can compute an effective size (esz) larger
than the total monitored memory because it integrates over cumulative
deltas without bounding by the actual workload size. Once esz exceeds
total monitored memory, the per-tick "remaining quota" arithmetic
stops being meaningful: any scheme can apply to the entire monitored
space and "remaining" stays positive indefinitely.
Cap esz to the total size of all currently monitored regions as a
final bound after all other quota calculations. Add
damon_ctx_total_monitored_sz() helper that sums region sizes across
all targets.
The helper runs only inside damos_set_effective_quota(), which is
called at most once per quota reset_interval (default 1s) per scheme,
not per kdamond tick. Walk cost is O(nr_regions) at that frequency
and is dominated by the enclosing tuner work.
This bound is tuner-shape and goal-metric agnostic: it constrains the
quota controller to physically realisable values regardless of which
tuner or goal metric drives it.
Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
mm/damon/core.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 9975f3d9ebfe9..fd1db234ca304 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2614,6 +2614,19 @@ static void damos_goal_tune_esz_bp_temporal(struct damon_ctx *c,
quota->esz_bp = ULONG_MAX;
}
+/* Sum of all monitored region sizes across all targets in @ctx. */
+static unsigned long damon_ctx_total_monitored_sz(struct damon_ctx *ctx)
+{
+ struct damon_target *t;
+ struct damon_region *r;
+ unsigned long total = 0;
+
+ damon_for_each_target(t, ctx)
+ damon_for_each_region(r, t)
+ total += damon_sz_region(r);
+ return total;
+}
+
/*
* Called only if quota->ms, or quota->sz are set, or quota->goals is not empty
*/
@@ -2621,6 +2634,7 @@ static void damos_set_effective_quota(struct damon_ctx *ctx, struct damos *s)
{
struct damos_quota *quota = &s->quota;
unsigned long throughput;
+ unsigned long total_sz;
unsigned long esz = ULONG_MAX;
if (!quota->ms && list_empty(&quota->goals)) {
@@ -2649,6 +2663,11 @@ static void damos_set_effective_quota(struct damon_ctx *ctx, struct damos *s)
if (quota->sz && quota->sz < esz)
esz = quota->sz;
+ /* Safety cap: never migrate more than total monitored memory */
+ total_sz = damon_ctx_total_monitored_sz(ctx);
+ if (total_sz && esz > total_sz)
+ esz = total_sz;
+
quota->esz = esz;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory
2026-05-16 21:03 ` [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory Ravi Jonnalagadda
@ 2026-05-17 18:36 ` SeongJae Park
0 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2026-05-17 18:36 UTC (permalink / raw)
To: Ravi Jonnalagadda
Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun
Hello Ravi,
On Sat, 16 May 2026 14:03:54 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
> The DAMOS quota goal tuner can compute an effective size (esz) larger
> than the total monitored memory because it integrates over cumulative
> deltas without bounding by the actual workload size. Once esz exceeds
> total monitored memory, the per-tick "remaining quota" arithmetic
> stops being meaningful: any scheme can apply to the entire monitored
> space and "remaining" stays positive indefinitely.
Nice finding!
>
> Cap esz to the total size of all currently monitored regions as a
> final bound after all other quota calculations. Add
> damon_ctx_total_monitored_sz() helper that sums region sizes across
> all targets.
You could also set an arbitrary cap with the static size quota. That
is, if there is not only a quota goal but also a size quota and/or a time quota,
and the different types of quotas disagree about the real quota, DAMOS uses
the smallest one. You could read the damos_set_effective_quota() code and the
kernel-doc comment of 'struct damos_quota' for more details.
So you could apply the total monitoring region size cap by setting the size
quota to the total monitoring region size. Could that work for you?
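
As a rough user-space model of that suggestion (not the kernel implementation): if the static size quota is applied as an upper bound on the goal-tuned size, then setting it to the total monitored size bounds the result. effective_quota(), effective_quota_demo(), goal_esz and sz_quota are hypothetical names; a size quota of zero is taken to mean "unset", following the quota->sz check in damos_set_effective_quota().

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical model: take the goal-tuned size and apply the static size
 * quota as an upper bound when it is set (non-zero).
 */
static unsigned long effective_quota(unsigned long goal_esz,
		unsigned long sz_quota)
{
	unsigned long esz = goal_esz;

	if (sz_quota && sz_quota < esz)
		esz = sz_quota;
	return esz;
}

/* Made-up example: 1 GiB monitored in total, tuner saturated at ULONG_MAX. */
static void effective_quota_demo(void)
{
	assert(effective_quota(ULONG_MAX, 1UL << 30) == (1UL << 30));
}
```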
Adding the total monitoring region size cap makes sense to me, and I think it
will make the user experience better. But if the size-quota-based cap works, this
could also be handled in user space in an easier and even better way. If so,
I'd prefer that direction, to reduce kernel code complexity. What do you think?
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size
2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
2026-05-16 21:03 ` [RFC PATCH 1/5] mm/damon/core: fix nr_accesses_bp underflow in damon_moving_sum Ravi Jonnalagadda
2026-05-16 21:03 ` [RFC PATCH 2/5] mm/damon/core: cap effective quota size to total monitored memory Ravi Jonnalagadda
@ 2026-05-16 21:03 ` Ravi Jonnalagadda
2026-05-17 18:47 ` SeongJae Park
2026-05-16 21:03 ` [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk Ravi Jonnalagadda
2026-05-16 21:03 ` [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk Ravi Jonnalagadda
4 siblings, 1 reply; 9+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-16 21:03 UTC (permalink / raw)
To: sj, damon, linux-mm, linux-kernel, linux-doc
Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
ravis.opensrc
The CONSIST quota goal tuner initializes esz_bp to 0, producing an
effective quota size (esz) of 1 byte on the first tick.
damos_quota_is_full() rejects all regions when esz < min_region_sz
(default PAGE_SIZE = 4096), so no regions can be tried and no
feedback reaches the tuner — a bootstrapping deadlock.
Floor esz at ctx->min_region_sz after the tuner computes it, guarded
by an esz != 0 check. The guard preserves the temporal tuner's
intentional stop behavior: when score >= 10000 (goal met), temporal
sets esz_bp = 0 to halt migration; the floor must not override that.
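
The intended behavior of the floor can be summarized with a small user-space sketch. floor_esz() is a hypothetical stand-in for the max_t() expression in the diff below, and the 4096-byte value is the default min_region_sz quoted above.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the proposed floor: raise a non-zero
 * esz to at least min_region_sz, and leave esz == 0 (stop signal) alone.
 */
static unsigned long floor_esz(unsigned long esz, unsigned long min_region_sz)
{
	if (esz && esz < min_region_sz)
		return min_region_sz;
	return esz;
}

/* The three cases the commit message distinguishes. */
static void floor_esz_demo(void)
{
	assert(floor_esz(1, 4096) == 4096);	/* cold start: 1 byte is raised */
	assert(floor_esz(0, 4096) == 0);	/* stop signal is preserved */
	assert(floor_esz(8192, 4096) == 8192);	/* larger values are untouched */
}
```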
Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
mm/damon/core.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index fd1db234ca304..d33c4360cbd60 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2650,6 +2650,10 @@ static void damos_set_effective_quota(struct damon_ctx *ctx, struct damos *s)
esz = quota->esz_bp / 10000;
}
+ /* avoid cold-start deadlock, but respect tuner stop signal (esz=0) */
+ if (esz)
+ esz = max_t(unsigned long, esz, ctx->min_region_sz);
+
if (quota->ms) {
if (quota->total_charged_ns)
throughput = mult_frac(quota->total_charged_sz,
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size
2026-05-16 21:03 ` [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size Ravi Jonnalagadda
@ 2026-05-17 18:47 ` SeongJae Park
0 siblings, 0 replies; 9+ messages in thread
From: SeongJae Park @ 2026-05-17 18:47 UTC (permalink / raw)
To: Ravi Jonnalagadda
Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun
On Sat, 16 May 2026 14:03:55 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
> The CONSIST quota goal tuner initializes esz_bp to 0, producing an
> effective quota size (esz) of 1 byte on the first tick.
> damos_quota_is_full() rejects all regions when esz < min_region_sz
> (default PAGE_SIZE = 4096), so no regions can be tried and no
> feedback reaches the tuner — a bootstrapping deadlock.
That depends on whether the goal is already [over]-achieved. If the goal is
achieved, the tuner will think no change is needed, so it keeps the
effectively-zero quota. If the goal is over-achieved, the tuner will think the
DAMOS scheme should be less aggressive, but the quota is already effectively
zero, so it stays that way.
If the goal is under-achieved, the logic will iteratively increase the internal
esz (esz_bp) until it exceeds min_region_sz, and finally start having some
effect.
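
A toy user-space model of that ramp-up, under loudly stated assumptions: the tuner is approximated by a proportional feedback step, next = last + last * (goal - score) / goal with goal = 10000, esz_bp starts at 10000 (an esz of 1 byte), and the score stays constant and below the goal. next_esz_bp() and steps_until_effective() are illustrative only and are not claimed to match the kernel's tuner.

```c
#include <assert.h>

#define GOAL_BP 10000ULL

/* Hypothetical proportional step for an under-achieved goal (score < goal). */
static unsigned long long next_esz_bp(unsigned long long last,
		unsigned long long score)
{
	return last + last * (GOAL_BP - score) / GOAL_BP;
}

/*
 * Count feedback steps until esz (esz_bp / 10000) reaches min_region_sz.
 * Only meaningful for score < GOAL_BP; otherwise the model never grows.
 */
static unsigned int steps_until_effective(unsigned long long score,
		unsigned long long min_region_sz)
{
	unsigned long long esz_bp = GOAL_BP;	/* esz == 1 byte */
	unsigned int steps = 0;

	assert(score < GOAL_BP);
	while (esz_bp / GOAL_BP < min_region_sz) {
		esz_bp = next_esz_bp(esz_bp, score);
		steps++;
	}
	return steps;
}
```

In this model a score of 0 doubles the quota on every step, so reaching a 4096-byte min_region_sz from 1 byte takes 12 steps; scores closer to the goal take longer but still terminate.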
So, unless the goal is already [over]-achieved, there is no deadlock. If the
goal is already [over]-achieved, why would we want to make DAMOS do something?
Am I missing something?
I'd like to discuss this high level thing first, before digging deep into the
details.
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk
2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
` (2 preceding siblings ...)
2026-05-16 21:03 ` [RFC PATCH 3/5] mm/damon/core: floor effective quota size at minimum region size Ravi Jonnalagadda
@ 2026-05-16 21:03 ` Ravi Jonnalagadda
2026-05-16 21:03 ` [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk Ravi Jonnalagadda
4 siblings, 0 replies; 9+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-16 21:03 UTC (permalink / raw)
To: sj, damon, linux-mm, linux-kernel, linux-doc
Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
ravis.opensrc
damon_pa_migrate() walks every PFN in a region linearly, calling
damon_get_folio() for each one. On sparse physical address spaces
(e.g., CXL-attached memory), a single DAMON region can span hundreds
of gigabytes where most memory is free and sitting in the buddy
allocator. Most page lookups are fruitless and dominate kdamond
tick time.
Check at pageblock boundaries (2MB on x86_64) whether the block is
entirely free. If the first page of a pageblock is a buddy page at
pageblock_order or higher, the entire block is free and can be
skipped. Similarly skip pageblocks where pfn_to_online_page() returns
NULL.
This reduces the iteration from O(region_sz / PAGE_SIZE) to
O(region_sz / pageblock_sz) + O(populated_pages).
buddy_order_unsafe() is used without zone->lock. A transient false
positive (block becomes non-free between the PageBuddy and order
checks) costs at most one tick of missed candidates on that block;
the next tick re-scans. No correctness consequence as DAMON walks
are best-effort.
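
The effect on the iteration count can be sketched with a user-space model. Everything here is hypothetical: model_walk() stands in for the loop in damon_pa_migrate(), block_free[] replaces the PageBuddy/buddy_order_unsafe() test, the region is assumed to start on a pageblock boundary, and 512 pages per pageblock corresponds to 2MB blocks with 4KB pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGEBLOCK_NR_PAGES 512UL	/* 2MB / 4KB */

struct walk_stats {
	unsigned long block_checks;	/* checks at pageblock boundaries */
	unsigned long page_lookups;	/* per-PFN lookups actually done */
};

/*
 * Hypothetical model of the walk: block_free[i] says whether pageblock i
 * is entirely free. Free blocks are skipped with one check; every page of
 * the other blocks is looked up.
 */
static struct walk_stats model_walk(const bool *block_free, size_t nr_blocks)
{
	struct walk_stats st = { 0, 0 };
	unsigned long pfn = 0;
	unsigned long end = nr_blocks * MODEL_PAGEBLOCK_NR_PAGES;

	while (pfn < end) {
		if (pfn % MODEL_PAGEBLOCK_NR_PAGES == 0) {
			st.block_checks++;
			if (block_free[pfn / MODEL_PAGEBLOCK_NR_PAGES]) {
				pfn += MODEL_PAGEBLOCK_NR_PAGES;
				continue;
			}
		}
		st.page_lookups++;
		pfn++;
	}
	return st;
}

/* Made-up region: 8 pageblocks, only block 2 has allocated pages. */
static struct walk_stats model_walk_example(void)
{
	const bool block_free[8] = {
		true, true, false, true, true, true, true, true
	};

	return model_walk(block_free, 8);
}
```

For the made-up region, the model does 8 boundary checks and 512 page lookups, where a purely linear walk would do 4096 lookups.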
Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
mm/damon/paddr.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index c4738cd5e221e..e844c990987b9 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -258,13 +258,32 @@ static unsigned long damon_pa_migrate(struct damon_region *r,
unsigned long addr_unit, struct damos *s,
unsigned long *sz_filter_passed)
{
- phys_addr_t addr, applied;
+ phys_addr_t addr, end, applied;
LIST_HEAD(folio_list);
struct folio *folio = NULL;
+ unsigned long pfn;
addr = damon_pa_phys_addr(r->ar.start, addr_unit);
- while (addr < damon_pa_phys_addr(r->ar.end, addr_unit)) {
- folio = damon_get_folio(PHYS_PFN(addr));
+ end = damon_pa_phys_addr(r->ar.end, addr_unit);
+ while (addr < end) {
+ pfn = PHYS_PFN(addr);
+
+ /* Skip pageblocks that are entirely free. */
+ if (IS_ALIGNED(pfn, pageblock_nr_pages)) {
+ struct page *page = pfn_to_online_page(pfn);
+
+ if (!page) {
+ addr += pageblock_nr_pages * PAGE_SIZE;
+ continue;
+ }
+ if (PageBuddy(page) &&
+ buddy_order_unsafe(page) >= pageblock_order) {
+ addr += pageblock_nr_pages * PAGE_SIZE;
+ continue;
+ }
+ }
+
+ folio = damon_get_folio(pfn);
if (damon_pa_invalid_damos_folio(folio, s)) {
addr += PAGE_SIZE;
continue;
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [RFC PATCH 5/5] mm/damon/paddr: add time budget to migration page walk
2026-05-16 21:03 [RFC PATCH 0/5] mm/damon: DAMOS quota controller and paddr migration walk fixes Ravi Jonnalagadda
` (3 preceding siblings ...)
2026-05-16 21:03 ` [RFC PATCH 4/5] mm/damon/paddr: skip free pageblocks in migration walk Ravi Jonnalagadda
@ 2026-05-16 21:03 ` Ravi Jonnalagadda
4 siblings, 0 replies; 9+ messages in thread
From: Ravi Jonnalagadda @ 2026-05-16 21:03 UTC (permalink / raw)
To: sj, damon, linux-mm, linux-kernel, linux-doc
Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
ravis.opensrc
On populated physical address ranges the pageblock skip optimization
alone is insufficient — most pageblocks contain at least one allocated
page, so the walk still iterates millions of PFNs.
Add a 100ms wall-clock time budget to damon_pa_migrate(). Once the
deadline is reached, the walk breaks out and migrates whatever folios
have been collected so far.
The time check is amortized by only calling ktime_get() every 4096
pages (~16MB of address space), adding negligible overhead to the
fast path.
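
The amortized check can be sketched in user space as follows. This is an illustration under assumptions, not the patch itself: budget_walk(), the now_ns_fn callback and fake_now() are hypothetical, and the fake clock exists only to make the example deterministic. The clock is read only when the low 12 bits of the PFN are zero, i.e. once per 4096 pages.

```c
#include <assert.h>
#include <stdint.h>

#define CHECK_PAGES 4096UL
#define CHECK_MASK (CHECK_PAGES - 1)

/* Hypothetical clock callback so the sketch can run with a fake clock. */
typedef uint64_t (*now_ns_fn)(void *arg);

struct budget_walk_result {
	unsigned long visited;		/* PFNs processed before stopping */
	unsigned long clock_reads;	/* clock reads inside the loop */
};

static struct budget_walk_result budget_walk(unsigned long start_pfn,
		unsigned long end_pfn, uint64_t budget_ns,
		now_ns_fn now, void *arg)
{
	struct budget_walk_result res = { 0, 0 };
	uint64_t deadline = now(arg) + budget_ns;
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		/* Read the clock only on 4096-page boundaries. */
		if (!(pfn & CHECK_MASK)) {
			res.clock_reads++;
			if (now(arg) > deadline)
				break;
		}
		res.visited++;
	}
	return res;
}

/* Fake clock for the example: advances by 40 on every read. */
static uint64_t fake_now(void *arg)
{
	uint64_t *t = arg;

	*t += 40;
	return *t;
}

/* Budget of 100 "ns" against the fake clock, over 10 check intervals. */
static struct budget_walk_result budget_walk_example(void)
{
	uint64_t t = 0;

	return budget_walk(0, 10 * CHECK_PAGES, 100, fake_now, &t);
}
```

With the fake clock, the walk stops at the third boundary check after visiting 8192 PFNs. The trade-off of amortizing is that the walk can overshoot the deadline by up to the time it takes to process one check interval.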
Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
mm/damon/paddr.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index e844c990987b9..a2565287bc10f 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -14,6 +14,7 @@
#include <linux/swap.h>
#include <linux/memory-tiers.h>
#include <linux/mm_inline.h>
+#include <linux/ktime.h>
#include "../internal.h"
#include "ops-common.h"
@@ -254,6 +255,14 @@ static unsigned long damon_pa_deactivate_pages(struct damon_region *r,
return damon_pa_de_activate(r, addr_unit, s, false, sz_filter_passed);
}
+/* Maximum wall-clock time to spend in a single migration walk (ns) */
+#define DAMON_PA_MIGRATE_BUDGET_NS (100 * NSEC_PER_MSEC)
+
+/* Check the time budget every 4096 pages (~16MB) to amortize ktime_get(). */
+#define DAMON_PA_MIGRATE_TIME_CHECK_PAGES 4096
+#define DAMON_PA_MIGRATE_TIME_CHECK_MASK \
+ (DAMON_PA_MIGRATE_TIME_CHECK_PAGES - 1)
+
static unsigned long damon_pa_migrate(struct damon_region *r,
unsigned long addr_unit, struct damos *s,
unsigned long *sz_filter_passed)
@@ -262,6 +271,7 @@ static unsigned long damon_pa_migrate(struct damon_region *r,
LIST_HEAD(folio_list);
struct folio *folio = NULL;
unsigned long pfn;
+ ktime_t deadline = ktime_add_ns(ktime_get(), DAMON_PA_MIGRATE_BUDGET_NS);
addr = damon_pa_phys_addr(r->ar.start, addr_unit);
end = damon_pa_phys_addr(r->ar.end, addr_unit);
@@ -283,6 +293,11 @@ static unsigned long damon_pa_migrate(struct damon_region *r,
}
}
+ /* Time budget: keep kdamond responsive on long migration walks. */
+ if (!(pfn & DAMON_PA_MIGRATE_TIME_CHECK_MASK) &&
+ ktime_after(ktime_get(), deadline))
+ break;
+
folio = damon_get_folio(pfn);
if (damon_pa_invalid_damos_folio(folio, s)) {
addr += PAGE_SIZE;
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread