* [PATCH v7] mm/damon: add node_eligible_mem_bp goal metric
@ 2026-04-24 20:34 Ravi Jonnalagadda
2026-04-24 20:59 ` sashiko-bot
2026-04-25 0:39 ` SeongJae Park
0 siblings, 2 replies; 6+ messages in thread
From: Ravi Jonnalagadda @ 2026-04-24 20:34 UTC (permalink / raw)
To: sj, damon, linux-mm, linux-kernel, linux-doc
Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
ravis.opensrc
Background and Motivation
=========================
In heterogeneous memory systems, controlling memory distribution across
NUMA nodes is essential for performance optimization. This patch enables
system-wide page distribution with target-state goals such as "maintain
60% of scheme-eligible memory on DRAM" using PA-mode DAMON schemes.
Rather than using absolute thresholds, this metric tracks the ratio of
memory that matches each scheme's access pattern filters on a target
node, enabling the quota system to automatically adjust migration
aggressiveness to maintain the desired distribution.
What This Metric Measures
=========================
node_eligible_mem_bp:
scheme_eligible_bytes_on_node / total_scheme_eligible_bytes * 10000
The metric iterates through each scheme's eligible regions and uses
damon_get_folio() to determine NUMA node placement of each folio.
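
For illustration only, the basis-point conversion can be modeled in
userspace (the kernel computes the equivalent with mult_frac(); the byte
counts below are hypothetical):

```python
def node_eligible_mem_bp(node_eligible_bytes, total_eligible_bytes):
    """Userspace model of mult_frac(node_eligible, 10000, total_eligible):
    the node's share of scheme-eligible bytes in basis points
    (1 bp = 0.01%)."""
    if total_eligible_bytes == 0:
        return 0  # the kernel also reports 0 when nothing is eligible
    return node_eligible_bytes * 10000 // total_eligible_bytes

GIB = 1 << 30
# Hypothetical snapshot: 12 GiB of the 20 GiB eligible memory is on DRAM.
print(node_eligible_mem_bp(12 * GIB, 20 * GIB))  # 6000, i.e. 60%
```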
Two-Scheme Setup for Hot Page Distribution
==========================================
For maintaining hot memory on DRAM (node 0) and CXL (node 1) in a 6:4
ratio (per SeongJae Park's suggestion):
PULL scheme: migrate_hot to node 0
goal: node_eligible_mem_bp, nid=0, target=6000
"Move hot pages to DRAM if less than 60% of hot data is in DRAM"
PUSH scheme: migrate_hot to node 1
goal: node_eligible_mem_bp, nid=1, target=4000
"Move hot pages to CXL if less than 40% of hot data is in CXL"
Each scheme independently measures its own eligible memory and adjusts
its quota to achieve its target ratio. The schemes work in concert
through DAMON's unified monitoring context, with the quota autotuner
balancing their relative aggressiveness.
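
To make the interplay concrete, here is a toy userspace simulation of the
feedback idea (made-up page counts and a fixed per-step migration quota;
this sketches the push/pull convergence only, not DAMON's actual tuner
math) showing the two schemes settling at the 6:4 split:

```python
def simulate_push_pull(dram_hot, cxl_hot, quota, steps):
    """Toy model: each step, the PULL scheme moves up to `quota` hot
    pages to DRAM while DRAM holds under 6000 bp of the hot data, and
    the PUSH scheme moves hot pages to CXL while CXL holds under
    4000 bp. Both stop acting once their goals are met."""
    for _ in range(steps):
        total = dram_hot + cxl_hot
        if dram_hot * 10000 // total < 6000:   # PULL under its goal
            moved = min(quota, cxl_hot)
            cxl_hot -= moved
            dram_hot += moved
        if cxl_hot * 10000 // total < 4000:    # PUSH under its goal
            moved = min(quota, dram_hot)
            dram_hot -= moved
            cxl_hot += moved
    return dram_hot, cxl_hot

# Hypothetical: 100 hot pages, initially skewed toward CXL.
print(simulate_push_pull(20, 80, quota=5, steps=20))  # -> (60, 40)
```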
Implementation Details
======================
Key implementation considerations:
- Uses phys_addr_t for byte calculations to prevent overflow on 32-bit
architectures with large memory configurations
- Includes cond_resched() per region to prevent CPU soft lockups when
processing large numbers of regions
- Validates that ops type is DAMON_OPS_PADDR in both damon_commit_ctx()
(for clear error feedback) and the metric function (defensive check)
- Properly handles folio alignment when region addresses fall within
large folios by calculating exact overlap
The implementation requires CONFIG_DAMON_PADDR since damon_get_folio()
is only available for physical address space monitoring.
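
The large-folio overlap handling from the last bullet above amounts to
clamping two ranges against each other; a userspace sketch of that
arithmetic (the kernel does this with phys_addr_t; addresses below are
hypothetical):

```python
def folio_overlap_bytes(addr, end_addr, folio_start, folio_size):
    """Bytes of the folio [folio_start, folio_start + folio_size) that
    fall inside the region [addr, end_addr). Mirrors the clamping in
    damos_calc_eligible_bytes(): the folio may begin before addr when
    addr lands in the middle of a large folio."""
    folio_end = folio_start + folio_size
    overlap_start = max(addr, folio_start)
    overlap_end = min(end_addr, folio_end)
    return max(0, overlap_end - overlap_start)

# Hypothetical: region starts 16 KiB into a 2 MiB folio.
print(folio_overlap_bytes(0x204000, 0x400000, 0x200000, 2 << 20))
# 2080768 (0x1fc000) bytes counted, not the full 2 MiB
```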
Testing Results
===============
Functionally tested on a two-node heterogeneous memory system with DRAM
(node 0) and CXL memory (node 1). A PUSH+PULL scheme configuration using
migrate_hot actions was used to reach a target hot memory ratio between
the two tiers.
With the TEMPORAL tuner, the system converges quickly to the target
distribution. The tuner drives esz to maximum when under goal and to
zero once the goal is met, forming a simple on/off feedback loop that
stabilizes at the desired ratio.
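
That on/off behavior reduces to a single comparison; a Python model of
the step described above (10000 is the bp scale of the goal score, and
max_esz_bp stands in for the tuner's maximum quota value):

```python
def temporal_tune_esz_bp(score_bp, max_esz_bp):
    """Model of the TEMPORAL tuner step described above: once the
    current/target score reaches 10000 bp the goal is met and the
    effective size quota collapses to zero; while under the goal the
    quota is driven to its maximum."""
    if score_bp >= 10000:
        return 0
    return max_esz_bp

print(temporal_tune_esz_bp(9500, 1 << 20))   # under goal -> full quota
print(temporal_tune_esz_bp(10000, 1 << 20))  # goal met -> 0
```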
With the CONSIST tuner, the scheme still converges but more slowly, as
it migrates and then throttles itself based on quota feedback. The time
to reach the goal varies depending on workload intensity.
Note: This metric works with both TEMPORAL and CONSIST goal tuners.
Suggested-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
include/linux/damon.h | 3 +
mm/damon/core.c | 171 +++++++++++++++++++++++++++++++++++----
mm/damon/sysfs-schemes.c | 7 ++
3 files changed, 165 insertions(+), 16 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index f2cdb7c3f5e6..986b8c902585 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -159,6 +159,8 @@ enum damos_action {
* @DAMOS_QUOTA_NODE_MEMCG_FREE_BP: MemFree ratio of a node for a cgroup.
* @DAMOS_QUOTA_ACTIVE_MEM_BP: Active to total LRU memory ratio.
* @DAMOS_QUOTA_INACTIVE_MEM_BP: Inactive to total LRU memory ratio.
+ * @DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP: Scheme-eligible memory ratio of a
+ * node in basis points (0-10000).
* @NR_DAMOS_QUOTA_GOAL_METRICS: Number of DAMOS quota goal metrics.
*
* Metrics equal to larger than @NR_DAMOS_QUOTA_GOAL_METRICS are unsupported.
@@ -172,6 +174,7 @@ enum damos_quota_goal_metric {
DAMOS_QUOTA_NODE_MEMCG_FREE_BP,
DAMOS_QUOTA_ACTIVE_MEM_BP,
DAMOS_QUOTA_INACTIVE_MEM_BP,
+ DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP,
NR_DAMOS_QUOTA_GOAL_METRICS,
};
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 3dbbbfdeff71..4ac805205261 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -13,10 +13,14 @@
#include <linux/memcontrol.h>
#include <linux/mm.h>
#include <linux/psi.h>
+#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/string_choices.h>
+/* for damon_get_folio() used by node eligible memory metrics */
+#include "ops-common.h"
+
#define CREATE_TRACE_POINTS
#include <trace/events/damon.h>
@@ -1326,11 +1330,25 @@ static int damon_commit_targets(
int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src)
{
int err;
+ struct damos *scheme;
+ struct damos_quota_goal *goal;
dst->maybe_corrupted = true;
if (!is_power_of_2(src->min_region_sz))
return -EINVAL;
+ /* node_eligible_mem_bp metric requires PADDR ops */
+ if (src->ops.id != DAMON_OPS_PADDR) {
+ damon_for_each_scheme(scheme, src) {
+ struct damos_quota *quota = &scheme->quota;
+
+ damos_for_each_quota_goal(goal, quota) {
+ if (goal->metric == DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP)
+ return -EINVAL;
+ }
+ }
+ }
+
err = damon_commit_schemes(dst, src);
if (err)
return err;
@@ -2287,7 +2305,112 @@ static unsigned long damos_get_node_memcg_used_bp(
numerator = i.totalram - used_pages;
return mult_frac(numerator, 10000, i.totalram);
}
-#else
+
+#ifdef CONFIG_DAMON_PADDR
+/*
+ * damos_calc_eligible_bytes() - Calculate raw eligible bytes per node.
+ * @c: The DAMON context.
+ * @s: The scheme.
+ * @nid: The target NUMA node id.
+ * @total: Output for total eligible bytes across all nodes.
+ *
+ * Iterates through each folio in eligible regions to accurately determine
+ * which node the memory resides on. Returns eligible bytes on the specified
+ * node and sets *total to the sum across all nodes.
+ *
+ * Note: This function requires damon_get_folio() from ops-common.c, which is
+ * only available when CONFIG_DAMON_PADDR is enabled. It also requires the
+ * context to be using PADDR operations for meaningful results.
+ */
+static phys_addr_t damos_calc_eligible_bytes(struct damon_ctx *c,
+ struct damos *s, int nid, phys_addr_t *total)
+{
+ struct damon_target *t;
+ struct damon_region *r;
+ phys_addr_t total_eligible = 0;
+ phys_addr_t node_eligible = 0;
+
+ damon_for_each_target(t, c) {
+ damon_for_each_region(r, t) {
+ phys_addr_t addr, end_addr;
+
+ if (!__damos_valid_target(r, s))
+ continue;
+
+ /* Convert from core address units to physical bytes */
+ addr = (phys_addr_t)r->ar.start * c->addr_unit;
+ end_addr = (phys_addr_t)r->ar.end * c->addr_unit;
+ while (addr < end_addr) {
+ struct folio *folio;
+ phys_addr_t folio_start, folio_end;
+ phys_addr_t overlap_start, overlap_end;
+ phys_addr_t counted;
+
+ folio = damon_get_folio(PHYS_PFN(addr));
+ if (!folio) {
+ addr += PAGE_SIZE;
+ continue;
+ }
+
+ /*
+ * Calculate exact overlap between the region
+ * [addr, end_addr) and the folio range.
+ * The folio may start before addr if addr is
+ * in the middle of a large folio.
+ */
+ folio_start = PFN_PHYS(folio_pfn(folio));
+ folio_end = folio_start + folio_size(folio);
+
+ overlap_start = max(addr, folio_start);
+ overlap_end = min(end_addr, folio_end);
+
+ if (overlap_end > overlap_start) {
+ counted = overlap_end - overlap_start;
+ total_eligible += counted;
+ if (folio_nid(folio) == nid)
+ node_eligible += counted;
+ }
+
+ /* Advance past the entire folio */
+ addr = folio_end;
+ folio_put(folio);
+ }
+ cond_resched();
+ }
+ }
+
+ *total = total_eligible;
+ return node_eligible;
+}
+
+static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
+ struct damos *s, int nid)
+{
+ phys_addr_t total_eligible = 0;
+ phys_addr_t node_eligible;
+
+ if (c->ops.id != DAMON_OPS_PADDR)
+ return 0;
+
+ if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
+ return 0;
+
+ node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
+
+ if (!total_eligible)
+ return 0;
+
+ return mult_frac((unsigned long)node_eligible, 10000,
+ (unsigned long)total_eligible);
+}
+#else /* CONFIG_DAMON_PADDR */
+static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
+ struct damos *s, int nid)
+{
+ return 0;
+}
+#endif /* CONFIG_DAMON_PADDR */
+#else /* CONFIG_NUMA */
static __kernel_ulong_t damos_get_node_mem_bp(
struct damos_quota_goal *goal)
{
@@ -2299,7 +2422,13 @@ static unsigned long damos_get_node_memcg_used_bp(
{
return 0;
}
-#endif
+
+static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
+ struct damos *s, int nid)
+{
+ return 0;
+}
+#endif /* CONFIG_NUMA */
/*
* Returns LRU-active or inactive memory to total LRU memory size ratio.
@@ -2319,7 +2448,8 @@ static unsigned int damos_get_in_active_mem_bp(bool active_ratio)
return mult_frac(inactive, 10000, total);
}
-static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
+static void damos_set_quota_goal_current_value(struct damon_ctx *c,
+ struct damos *s, struct damos_quota_goal *goal)
{
u64 now_psi_total;
@@ -2345,19 +2475,24 @@ static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
goal->current_value = damos_get_in_active_mem_bp(
goal->metric == DAMOS_QUOTA_ACTIVE_MEM_BP);
break;
+ case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:
+ goal->current_value = damos_get_node_eligible_mem_bp(c, s,
+ goal->nid);
+ break;
default:
break;
}
}
/* Return the highest score since it makes schemes least aggressive */
-static unsigned long damos_quota_score(struct damos_quota *quota)
+static unsigned long damos_quota_score(struct damon_ctx *c, struct damos *s)
{
struct damos_quota_goal *goal;
+ struct damos_quota *quota = &s->quota;
unsigned long highest_score = 0;
damos_for_each_quota_goal(goal, quota) {
- damos_set_quota_goal_current_value(goal);
+ damos_set_quota_goal_current_value(c, s, goal);
highest_score = max(highest_score,
mult_frac(goal->current_value, 10000,
goal->target_value));
@@ -2366,17 +2501,19 @@ static unsigned long damos_quota_score(struct damos_quota *quota)
return highest_score;
}
-static void damos_goal_tune_esz_bp_consist(struct damos_quota *quota)
+static void damos_goal_tune_esz_bp_consist(struct damon_ctx *c, struct damos *s)
{
- unsigned long score = damos_quota_score(quota);
+ struct damos_quota *quota = &s->quota;
+ unsigned long score = damos_quota_score(c, s);
quota->esz_bp = damon_feed_loop_next_input(
max(quota->esz_bp, 10000UL), score);
}
-static void damos_goal_tune_esz_bp_temporal(struct damos_quota *quota)
+static void damos_goal_tune_esz_bp_temporal(struct damon_ctx *c, struct damos *s)
{
- unsigned long score = damos_quota_score(quota);
+ struct damos_quota *quota = &s->quota;
+ unsigned long score = damos_quota_score(c, s);
if (score >= 10000)
quota->esz_bp = 0;
@@ -2389,9 +2526,9 @@ static void damos_goal_tune_esz_bp_temporal(struct damos_quota *quota)
/*
* Called only if quota->ms, or quota->sz are set, or quota->goals is not empty
*/
-static void damos_set_effective_quota(struct damos_quota *quota,
- struct damon_ctx *ctx)
+static void damos_set_effective_quota(struct damon_ctx *c, struct damos *s)
{
+ struct damos_quota *quota = &s->quota;
unsigned long throughput;
unsigned long esz = ULONG_MAX;
@@ -2402,9 +2539,9 @@ static void damos_set_effective_quota(struct damos_quota *quota,
if (!list_empty(&quota->goals)) {
if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_CONSIST)
- damos_goal_tune_esz_bp_consist(quota);
+ damos_goal_tune_esz_bp_consist(c, s);
else if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_TEMPORAL)
- damos_goal_tune_esz_bp_temporal(quota);
+ damos_goal_tune_esz_bp_temporal(c, s);
esz = quota->esz_bp / 10000;
}
@@ -2415,7 +2552,7 @@ static void damos_set_effective_quota(struct damos_quota *quota,
else
throughput = PAGE_SIZE * 1024;
esz = min(throughput * quota->ms, esz);
- esz = max(ctx->min_region_sz, esz);
+ esz = max(c->min_region_sz, esz);
}
if (quota->sz && quota->sz < esz)
@@ -2452,7 +2589,9 @@ static void damos_adjust_quota(struct damon_ctx *c, struct damos *s)
/* First charge window */
if (!quota->total_charged_sz && !quota->charged_from) {
quota->charged_from = jiffies;
- damos_set_effective_quota(quota, c);
+ damos_set_effective_quota(c, s);
+ if (trace_damos_esz_enabled())
+ damos_trace_esz(c, s, quota);
}
/* New charge window starts */
@@ -2467,7 +2606,7 @@ static void damos_adjust_quota(struct damon_ctx *c, struct damos *s)
quota->charged_sz = 0;
if (trace_damos_esz_enabled())
cached_esz = quota->esz;
- damos_set_effective_quota(quota, c);
+ damos_set_effective_quota(c, s);
if (trace_damos_esz_enabled() && quota->esz != cached_esz)
damos_trace_esz(c, s, quota);
}
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 245d63808411..f21191d31f58 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1094,6 +1094,10 @@ struct damos_sysfs_qgoal_metric_name damos_sysfs_qgoal_metric_names[] = {
.metric = DAMOS_QUOTA_INACTIVE_MEM_BP,
.name = "inactive_mem_bp",
},
+ {
+ .metric = DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP,
+ .name = "node_eligible_mem_bp",
+ },
};
static ssize_t target_metric_show(struct kobject *kobj,
@@ -2685,6 +2689,9 @@ static int damos_sysfs_add_quota_score(
}
goal->nid = sysfs_goal->nid;
break;
+ case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:
+ goal->nid = sysfs_goal->nid;
+ break;
default:
break;
}
base-commit: 277a7fae172bb4a598e0d4144ffb2ff6207fedb3
--
2.43.0
^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v7] mm/damon: add node_eligible_mem_bp goal metric
  2026-04-24 20:34 [PATCH v7] mm/damon: add node_eligible_mem_bp goal metric Ravi Jonnalagadda
@ 2026-04-24 20:59 ` sashiko-bot
  2026-04-25  0:45   ` SeongJae Park
  0 siblings, 1 reply; 6+ messages in thread

From: sashiko-bot @ 2026-04-24 20:59 UTC (permalink / raw)
  To: Ravi Jonnalagadda; +Cc: damon

Sashiko AI review found 4 potential issue(s):
- [High] Integer Truncation and Arithmetic Overflow on 32-bit Architectures
- [High] CPU Soft Lockups due to Unbounded Physical Address Iteration
- [Medium] Missing Core Filters Evaluation Causes Eligible Memory Over-counting
- [Medium] Unaligned Address Advancement Truncates Exact Overlap Calculation

--

commit d92efd581a7b3318becfa32765dc87547acb91c6
Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
Subject: mm/damon: add node_eligible_mem_bp goal metric

This commit introduces a new metric node_eligible_mem_bp for DAMON schemes
to control memory distribution across NUMA nodes based on the ratio of
scheme-eligible memory. It calculates physical bytes per node using PADDR
ops and integrates with the DAMOS quota goal system.

> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index 3dbbbfdeff719f..4ac8052052613d 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c

[ ... ]

> @@ -2287,7 +2305,112 @@ static unsigned long damos_get_node_memcg_used_bp(
> 	numerator = i.totalram - used_pages;
> 	return mult_frac(numerator, 10000, i.totalram);
> }
> -#else
> +
> +#ifdef CONFIG_DAMON_PADDR
> +/*
> + * damos_calc_eligible_bytes() - Calculate raw eligible bytes per node.
> + * @c: The DAMON context.
> + * @s: The scheme.
> + * @nid: The target NUMA node id.
> + * @total: Output for total eligible bytes across all nodes.
> + *
> + * Iterates through each folio in eligible regions to accurately determine
> + * which node the memory resides on. Returns eligible bytes on the specified
> + * node and sets *total to the sum across all nodes.
> + *
> + * Note: This function requires damon_get_folio() from ops-common.c, which is
> + * only available when CONFIG_DAMON_PADDR is enabled. It also requires the
> + * context to be using PADDR operations for meaningful results.
> + */
> +static phys_addr_t damos_calc_eligible_bytes(struct damon_ctx *c,
> +		struct damos *s, int nid, phys_addr_t *total)
> +{
> +	struct damon_target *t;
> +	struct damon_region *r;
> +	phys_addr_t total_eligible = 0;
> +	phys_addr_t node_eligible = 0;
> +
> +	damon_for_each_target(t, c) {
> +		damon_for_each_region(r, t) {
> +			phys_addr_t addr, end_addr;
> +
> +			if (!__damos_valid_target(r, s))
> +				continue;

Does this overcount eligible memory by skipping core filters?

The quota adjustment logic uses damos_core_filter_out() to exclude
explicitly filtered regions, but it appears this calculation only checks
__damos_valid_target(). Should damos_core_filter_out() be evaluated here as
well?

> +
> +			/* Convert from core address units to physical bytes */
> +			addr = (phys_addr_t)r->ar.start * c->addr_unit;
> +			end_addr = (phys_addr_t)r->ar.end * c->addr_unit;
> +			while (addr < end_addr) {
> +				struct folio *folio;
> +				phys_addr_t folio_start, folio_end;
> +				phys_addr_t overlap_start, overlap_end;
> +				phys_addr_t counted;
> +
> +				folio = damon_get_folio(PHYS_PFN(addr));
> +				if (!folio) {
> +					addr += PAGE_SIZE;
> +					continue;
> +				}

If addr is unaligned, does adding PAGE_SIZE preserve the unaligned offset
and cause under-counting on the next valid folio?

If an unaligned fallback happens, the address might skip past a valid page
boundary instead of aligning to it. When advancing the address without a
valid folio, would it be better to align it to the next page boundary?

> +
> +				/*
> +				 * Calculate exact overlap between the region
> +				 * [addr, end_addr) and the folio range.
> +				 * The folio may start before addr if addr is
> +				 * in the middle of a large folio.
> +				 */

[ ... ]

> +				if (overlap_end > overlap_start) {
> +					counted = overlap_end - overlap_start;
> +					total_eligible += counted;
> +					if (folio_nid(folio) == nid)
> +						node_eligible += counted;
> +				}
> +
> +				/* Advance past the entire folio */
> +				addr = folio_end;
> +				folio_put(folio);
> +			}
> +			cond_resched();
> +		}
> +	}

Could this cause CPU soft lockups for very large regions?

Since the while loop can iterate millions of times for large memory
regions, should cond_resched() be placed inside the while loop instead of
outside it?

> +
> +	*total = total_eligible;
> +	return node_eligible;
> +}
> +
> +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
> +		struct damos *s, int nid)
> +{
> +	phys_addr_t total_eligible = 0;
> +	phys_addr_t node_eligible;
> +
> +	if (c->ops.id != DAMON_OPS_PADDR)
> +		return 0;
> +
> +	if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> +		return 0;
> +
> +	node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> +
> +	if (!total_eligible)
> +		return 0;
> +
> +	return mult_frac((unsigned long)node_eligible, 10000,
> +			(unsigned long)total_eligible);
> +}

Can this cast cause truncation and overflow on 32-bit architectures?

Both node_eligible and total_eligible are phys_addr_t, which are 64-bit on
PAE systems, but unsigned long is 32-bit.

If total_eligible is an exact multiple of 4GB, it would cast to 0 and
trigger a divide-by-zero panic in mult_frac(). Additionally, mult_frac()
multiplies by 10000 using 32-bit math, which might overflow for values
larger than roughly 429 KB.

Should this use 64-bit division or shift the byte sizes down to page counts
before calculating the fraction?

[ ... ]

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260424203448.5040-1-ravis.opensrc@gmail.com?part=1

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [PATCH v7] mm/damon: add node_eligible_mem_bp goal metric
  2026-04-24 20:59 ` sashiko-bot
@ 2026-04-25  0:45   ` SeongJae Park
  0 siblings, 0 replies; 6+ messages in thread

From: SeongJae Park @ 2026-04-25 0:45 UTC (permalink / raw)
  To: sashiko-bot; +Cc: SeongJae Park, Ravi Jonnalagadda, damon

On Fri, 24 Apr 2026 20:59:07 +0000 sashiko-bot@kernel.org wrote:

> Sashiko AI review found 4 potential issue(s):
> - [High] Integer Truncation and Arithmetic Overflow on 32-bit Architectures

No real problem.

> - [High] CPU Soft Lockups due to Unbounded Physical Address Iteration

Unlikely.

> - [Medium] Missing Core Filters Evaluation Causes Eligible Memory Over-counting

This is intentional.

> - [Medium] Unaligned Address Advancement Truncates Exact Overlap Calculation

I agree with Sashiko.

> --
>
> commit d92efd581a7b3318becfa32765dc87547acb91c6
> Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
> Subject: mm/damon: add node_eligible_mem_bp goal metric
>
> This commit introduces a new metric node_eligible_mem_bp for DAMON schemes
> to control memory distribution across NUMA nodes based on the ratio of
> scheme-eligible memory. It calculates physical bytes per node using PADDR
> ops and integrates with the DAMOS quota goal system.
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index 3dbbbfdeff719f..4ac8052052613d 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
>
> [ ... ]
>
> > @@ -2287,7 +2305,112 @@ static unsigned long damos_get_node_memcg_used_bp(
> > 	numerator = i.totalram - used_pages;
> > 	return mult_frac(numerator, 10000, i.totalram);
> > }
> > -#else
> > +
> > +#ifdef CONFIG_DAMON_PADDR
> > +/*
> > + * damos_calc_eligible_bytes() - Calculate raw eligible bytes per node.
> > + * @c: The DAMON context.
> > + * @s: The scheme.
> > + * @nid: The target NUMA node id.
> > + * @total: Output for total eligible bytes across all nodes.
> > + *
> > + * Iterates through each folio in eligible regions to accurately determine
> > + * which node the memory resides on. Returns eligible bytes on the specified
> > + * node and sets *total to the sum across all nodes.
> > + *
> > + * Note: This function requires damon_get_folio() from ops-common.c, which is
> > + * only available when CONFIG_DAMON_PADDR is enabled. It also requires the
> > + * context to be using PADDR operations for meaningful results.
> > + */
> > +static phys_addr_t damos_calc_eligible_bytes(struct damon_ctx *c,
> > +		struct damos *s, int nid, phys_addr_t *total)
> > +{
> > +	struct damon_target *t;
> > +	struct damon_region *r;
> > +	phys_addr_t total_eligible = 0;
> > +	phys_addr_t node_eligible = 0;
> > +
> > +	damon_for_each_target(t, c) {
> > +		damon_for_each_region(r, t) {
> > +			phys_addr_t addr, end_addr;
> > +
> > +			if (!__damos_valid_target(r, s))
> > +				continue;
>
> Does this overcount eligible memory by skipping core filters?
>
> The quota adjustment logic uses damos_core_filter_out() to exclude
> explicitly filtered regions, but it appears this calculation only checks
> __damos_valid_target(). Should damos_core_filter_out() be evaluated here
> as well?

This is an intentional behavior as we discussed [1] on the previous
version.

> > +
> > +			/* Convert from core address units to physical bytes */
> > +			addr = (phys_addr_t)r->ar.start * c->addr_unit;
> > +			end_addr = (phys_addr_t)r->ar.end * c->addr_unit;
> > +			while (addr < end_addr) {
> > +				struct folio *folio;
> > +				phys_addr_t folio_start, folio_end;
> > +				phys_addr_t overlap_start, overlap_end;
> > +				phys_addr_t counted;
> > +
> > +				folio = damon_get_folio(PHYS_PFN(addr));
> > +				if (!folio) {
> > +					addr += PAGE_SIZE;
> > +					continue;
> > +				}
>
> If addr is unaligned, does adding PAGE_SIZE preserve the unaligned offset
> and cause under-counting on the next valid folio?
>
> If an unaligned fallback happens, the address might skip past a valid page
> boundary instead of aligning to it. When advancing the address without a
> valid folio, would it be better to align it to the next page boundary?

Makes sense. I suggested using PAGE_ALIGN_DOWN() in the reply to the
patch.

> > +
> > +				/*
> > +				 * Calculate exact overlap between the region
> > +				 * [addr, end_addr) and the folio range.
> > +				 * The folio may start before addr if addr is
> > +				 * in the middle of a large folio.
> > +				 */
>
> [ ... ]
>
> > +				if (overlap_end > overlap_start) {
> > +					counted = overlap_end - overlap_start;
> > +					total_eligible += counted;
> > +					if (folio_nid(folio) == nid)
> > +						node_eligible += counted;
> > +				}
> > +
> > +				/* Advance past the entire folio */
> > +				addr = folio_end;
> > +				folio_put(folio);
> > +			}
> > +			cond_resched();
> > +		}
> > +	}
>
> Could this cause CPU soft lockups for very large regions?
>
> Since the while loop can iterate millions of times for large memory
> regions, should cond_resched() be placed inside the while loop instead of
> outside it?

Not likely. Users could set min_nr_regions to control the maximum size of
each region. We might be able to make this more robust, but seems not a
blocker of this patch in my opinion.

> > +
> > +	*total = total_eligible;
> > +	return node_eligible;
> > +}
> > +
> > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
> > +		struct damos *s, int nid)
> > +{
> > +	phys_addr_t total_eligible = 0;
> > +	phys_addr_t node_eligible;
> > +
> > +	if (c->ops.id != DAMON_OPS_PADDR)
> > +		return 0;
> > +
> > +	if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> > +		return 0;
> > +
> > +	node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> > +
> > +	if (!total_eligible)
> > +		return 0;
> > +
> > +	return mult_frac((unsigned long)node_eligible, 10000,
> > +			(unsigned long)total_eligible);
> > +}
>
> Can this cast cause truncation and overflow on 32-bit architectures?
>
> Both node_eligible and total_eligible are phys_addr_t, which are 64-bit on
> PAE systems, but unsigned long is 32-bit.
>
> If total_eligible is an exact multiple of 4GB, it would cast to 0 and
> trigger a divide-by-zero panic in mult_frac(). Additionally, mult_frac()
> multiplies by 10000 using 32-bit math, which might overflow for values
> larger than roughly 429 KB.
>
> Should this use 64-bit division or shift the byte sizes down to page
> counts before calculating the fraction?

Too much overflow concern in my opinion. The overflow causes no real
critical problem. Let's keep this as is for now.

> [ ... ]
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260424203448.5040-1-ravis.opensrc@gmail.com?part=1

[1] https://lore.kernel.org/20260405224550.76218-1-sj@kernel.org

Thanks,
SJ

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [PATCH v7] mm/damon: add node_eligible_mem_bp goal metric 2026-04-24 20:34 [PATCH v7] mm/damon: add node_eligible_mem_bp goal metric Ravi Jonnalagadda 2026-04-24 20:59 ` sashiko-bot @ 2026-04-25 0:39 ` SeongJae Park 2026-04-25 4:18 ` Ravi Jonnalagadda 1 sibling, 1 reply; 6+ messages in thread From: SeongJae Park @ 2026-04-25 0:39 UTC (permalink / raw) To: Ravi Jonnalagadda Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun Hello Ravi, This version looks good, except very trivial nits. On Fri, 24 Apr 2026 13:34:48 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote: > Background and Motivation > ========================= > > In heterogeneous memory systems, controlling memory distribution across > NUMA nodes is essential for performance optimization. This patch enables > system-wide page distribution with target-state goals such as "maintain > 60% of scheme-eligible memory on DRAM" using PA-mode DAMON schemes. > > Rather than using absolute thresholds, this metric tracks the ratio of > memory that matches each scheme's access pattern filters on a target > node, enabling the quota system to automatically adjust migration > aggressiveness to maintain the desired distribution. > > What This Metric Measures > ========================= > > node_eligible_mem_bp: > scheme_eligible_bytes_on_node / total_scheme_eligible_bytes * 10000 > > The metric iterates through each scheme's eligible regions and uses > damon_get_folio() to determine NUMA node placement of each folio. 
> > Two-Scheme Setup for Hot Page Distribution > ========================================== > > For maintaining hot memory on DRAM (node 0) and CXL (node 1) in a 6:4 > ratio (per SeongJae Park's suggestion): > > PULL scheme: migrate_hot to node 0 > goal: node_eligible_mem_bp, nid=0, target=6000 > "Move hot pages to DRAM if less than 60% of hot data is in DRAM" You also set a damos filter of node type with node id 1 for this scheme, right? > > PUSH scheme: migrate_hot to node 1 > goal: node_eligible_mem_bp, nid=1, target=4000 > "Move hot pages to CXL if less than 40% of hot data is in CXL" And you set damos filter of node type with node id 0 here, right? I think it would good to mention that here, too. > > Each scheme independently measures its own eligible memory and adjusts > its quota to achieve its target ratio. The schemes work in concert > through DAMON's unified monitoring context, with the quota autotuner > balancing their relative aggressiveness. > > Implementation Details > ====================== > > Key implementation considerations: > - Uses phys_addr_t for byte calculations to prevent overflow on 32-bit > architectures with large memory configurations > - Includes cond_resched() per region to prevent CPU soft lockups when > processing large numbers of regions > - Validates that ops type is DAMON_OPS_PADDR in both damon_commit_ctx() > (for clear error feedback) and the metric function (defensive check) > - Properly handles folio alignment when region addresses fall within > large folios by calculating exact overlap This is changelog from the previous version [1], right? Let's put all changelog from all previous version on the commentary area [2]. > > The implementation requires CONFIG_DAMON_PADDR since damon_get_folio() > is only available for physical address space monitoring. > > Testing Results > =============== > > Functionally tested on a two-node heterogeneous memory system with DRAM > (node 0) and CXL memory (node 1). 
A PUSH+PULL scheme configuration using > migrate_hot actions was used to reach a target hot memory ratio between > the two tiers. > > With the TEMPORAL tuner, the system converges quickly to the target > distribution. The tuner drives esz to maximum when under goal and to > zero once the goal is met, forming a simple on/off feedback loop that > stabilizes at the desired ratio. > > With the CONSIST tuner, the scheme still converges but more slowly, as > it migrates and then throttles itself based on quota feedback. The time > to reach the goal varies depending on workload intensity. > > Note: This metric works with both TEMPORAL and CONSIST goal tuners. > > Suggested-by: SeongJae Park <sj@kernel.org> > Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com> > --- > include/linux/damon.h | 3 + > mm/damon/core.c | 171 +++++++++++++++++++++++++++++++++++---- > mm/damon/sysfs-schemes.c | 7 ++ > 3 files changed, 165 insertions(+), 16 deletions(-) > > diff --git a/include/linux/damon.h b/include/linux/damon.h > index f2cdb7c3f5e6..986b8c902585 100644 > --- a/include/linux/damon.h > +++ b/include/linux/damon.h > @@ -159,6 +159,8 @@ enum damos_action { > * @DAMOS_QUOTA_NODE_MEMCG_FREE_BP: MemFree ratio of a node for a cgroup. > * @DAMOS_QUOTA_ACTIVE_MEM_BP: Active to total LRU memory ratio. > * @DAMOS_QUOTA_INACTIVE_MEM_BP: Inactive to total LRU memory ratio. > + * @DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP: Scheme-eligible memory ratio of a > + * node in basis points (0-10000). > * @NR_DAMOS_QUOTA_GOAL_METRICS: Number of DAMOS quota goal metrics. > * > * Metrics equal to larger than @NR_DAMOS_QUOTA_GOAL_METRICS are unsupported. 
> @@ -172,6 +174,7 @@ enum damos_quota_goal_metric { > DAMOS_QUOTA_NODE_MEMCG_FREE_BP, > DAMOS_QUOTA_ACTIVE_MEM_BP, > DAMOS_QUOTA_INACTIVE_MEM_BP, > + DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP, > NR_DAMOS_QUOTA_GOAL_METRICS, > }; > > diff --git a/mm/damon/core.c b/mm/damon/core.c > index 3dbbbfdeff71..4ac805205261 100644 > --- a/mm/damon/core.c > +++ b/mm/damon/core.c > @@ -13,10 +13,14 @@ > #include <linux/memcontrol.h> > #include <linux/mm.h> > #include <linux/psi.h> > +#include <linux/sched.h> > #include <linux/slab.h> > #include <linux/string.h> > #include <linux/string_choices.h> > > +/* for damon_get_folio() used by node eligible memory metrics */ > +#include "ops-common.h" > + > #define CREATE_TRACE_POINTS > #include <trace/events/damon.h> > > @@ -1326,11 +1330,25 @@ static int damon_commit_targets( > int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src) > { > int err; > + struct damos *scheme; > + struct damos_quota_goal *goal; > > dst->maybe_corrupted = true; > if (!is_power_of_2(src->min_region_sz)) > return -EINVAL; > > + /* node_eligible_mem_bp metric requires PADDR ops */ > + if (src->ops.id != DAMON_OPS_PADDR) { > + damon_for_each_scheme(scheme, src) { > + struct damos_quota *quota = &scheme->quota; > + > + damos_for_each_quota_goal(goal, quota) { > + if (goal->metric == DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP) Let's wrap this line to keep the 80 columns limit. > + return -EINVAL; > + } > + } > + } > + > err = damon_commit_schemes(dst, src); > if (err) > return err; > @@ -2287,7 +2305,112 @@ static unsigned long damos_get_node_memcg_used_bp( > numerator = i.totalram - used_pages; > return mult_frac(numerator, 10000, i.totalram); > } > -#else > + > +#ifdef CONFIG_DAMON_PADDR > +/* > + * damos_calc_eligible_bytes() - Calculate raw eligible bytes per node. > + * @c: The DAMON context. > + * @s: The scheme. > + * @nid: The target NUMA node id. > + * @total: Output for total eligible bytes across all nodes. 
> + * > + * Iterates through each folio in eligible regions to accurately determine > + * which node the memory resides on. Returns eligible bytes on the specified > + * node and sets *total to the sum across all nodes. > + * > + * Note: This function requires damon_get_folio() from ops-common.c, which is > + * only available when CONFIG_DAMON_PADDR is enabled. It also requires the > + * context to be using PADDR operations for meaningful results. > + */ > +static phys_addr_t damos_calc_eligible_bytes(struct damon_ctx *c, > + struct damos *s, int nid, phys_addr_t *total) > +{ > + struct damon_target *t; > + struct damon_region *r; > + phys_addr_t total_eligible = 0; > + phys_addr_t node_eligible = 0; > + > + damon_for_each_target(t, c) { > + damon_for_each_region(r, t) { > + phys_addr_t addr, end_addr; > + > + if (!__damos_valid_target(r, s)) > + continue; > + > + /* Convert from core address units to physical bytes */ > + addr = (phys_addr_t)r->ar.start * c->addr_unit; > + end_addr = (phys_addr_t)r->ar.end * c->addr_unit; > + while (addr < end_addr) { > + struct folio *folio; > + phys_addr_t folio_start, folio_end; > + phys_addr_t overlap_start, overlap_end; > + phys_addr_t counted; > + > + folio = damon_get_folio(PHYS_PFN(addr)); > + if (!folio) { > + addr += PAGE_SIZE; Sashiko found [3] that this may be better aligned. It is trivial, but I agree. How about using PAGE_ALIGN_DOWN(addr + PAGE_SIZE)? > + continue; > + } > + > + /* > + * Calculate exact overlap between the region > + * [addr, end_addr) and the folio range. > + * The folio may start before addr if addr is > + * in the middle of a large folio.
> + */ > + folio_start = PFN_PHYS(folio_pfn(folio)); > + folio_end = folio_start + folio_size(folio); > + > + overlap_start = max(addr, folio_start); > + overlap_end = min(end_addr, folio_end); > + > + if (overlap_end > overlap_start) { > + counted = overlap_end - overlap_start; > + total_eligible += counted; > + if (folio_nid(folio) == nid) > + node_eligible += counted; > + } > + > + /* Advance past the entire folio */ > + addr = folio_end; > + folio_put(folio); > + } > + cond_resched(); > + } > + } > + > + *total = total_eligible; > + return node_eligible; > +} > + > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c, > + struct damos *s, int nid) > +{ > + phys_addr_t total_eligible = 0; > + phys_addr_t node_eligible; > + > + if (c->ops.id != DAMON_OPS_PADDR) > + return 0; > + > + if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid)) > + return 0; Trivial nit. I'd prefer using '!node_state(nid, N_MEMORY)' to be consistent with the similar one in damon_migrate_pages(). But this one seems fine to me. > + > + node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible); > + > + if (!total_eligible) > + return 0; > + > + return mult_frac((unsigned long)node_eligible, 10000, > + (unsigned long)total_eligible); > +} > +#else /* CONFIG_DAMON_PADDR */ > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c, > + struct damos *s, int nid) > +{ > + return 0; > +} > +#endif /* CONFIG_DAMON_PADDR */ > +#else /* CONFIG_NUMA */ > static __kernel_ulong_t damos_get_node_mem_bp( > struct damos_quota_goal *goal) > { > @@ -2299,7 +2422,13 @@ static unsigned long damos_get_node_memcg_used_bp( > { > return 0; > } > -#endif > + > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c, > + struct damos *s, int nid) > +{ > + return 0; > +} > +#endif /* CONFIG_NUMA */ > > /* > * Returns LRU-active or inactive memory to total LRU memory size ratio. 
> @@ -2319,7 +2448,8 @@ static unsigned int damos_get_in_active_mem_bp(bool active_ratio) > return mult_frac(inactive, 10000, total); > } > > -static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal) > +static void damos_set_quota_goal_current_value(struct damon_ctx *c, > + struct damos *s, struct damos_quota_goal *goal) > { > u64 now_psi_total; > > @@ -2345,19 +2475,24 @@ static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal) > goal->current_value = damos_get_in_active_mem_bp( > goal->metric == DAMOS_QUOTA_ACTIVE_MEM_BP); > break; > + case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP: > + goal->current_value = damos_get_node_eligible_mem_bp(c, s, > + goal->nid); > + break; > default: > break; > } > } > > /* Return the highest score since it makes schemes least aggressive */ > -static unsigned long damos_quota_score(struct damos_quota *quota) > +static unsigned long damos_quota_score(struct damon_ctx *c, struct damos *s) > { > struct damos_quota_goal *goal; > + struct damos_quota *quota = &s->quota; > unsigned long highest_score = 0; > > damos_for_each_quota_goal(goal, quota) { > - damos_set_quota_goal_current_value(goal); > + damos_set_quota_goal_current_value(c, s, goal); > highest_score = max(highest_score, > mult_frac(goal->current_value, 10000, > goal->target_value)); > @@ -2366,17 +2501,19 @@ static unsigned long damos_quota_score(struct damos_quota *quota) > return highest_score; > } > > -static void damos_goal_tune_esz_bp_consist(struct damos_quota *quota) > +static void damos_goal_tune_esz_bp_consist(struct damon_ctx *c, struct damos *s) 80 columns limit? 
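By the way, for readers less familiar with the mult_frac() used above: it computes x * n / d while avoiding overflow of the x * n intermediate, by splitting x into quotient and remainder first. A simplified userspace sketch (a re-implementation for illustration only, not the kernel macro itself):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified userspace take on the kernel's mult_frac(x, n, d):
 * (x / d) * n + ((x % d) * n) / d. The remainder r is smaller than d,
 * so r * n is far less likely to overflow than a naive x * n.
 */
static uint64_t mult_frac_u64(uint64_t x, uint64_t n, uint64_t d)
{
	uint64_t q = x / d;
	uint64_t r = x % d;

	return q * n + r * n / d;
}
```

For the ratio computations above, e.g. 3 GiB eligible on a node out of 5 GiB total comes out as 6000 basis points, without the naive bytes * 10000 product overflowing an unsigned long on 32-bit.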
> { > - unsigned long score = damos_quota_score(quota); > + struct damos_quota *quota = &s->quota; > + unsigned long score = damos_quota_score(c, s); > > quota->esz_bp = damon_feed_loop_next_input( > max(quota->esz_bp, 10000UL), score); > } > > -static void damos_goal_tune_esz_bp_temporal(struct damos_quota *quota) > +static void damos_goal_tune_esz_bp_temporal(struct damon_ctx *c, struct damos *s) > { > - unsigned long score = damos_quota_score(quota); > + struct damos_quota *quota = &s->quota; > + unsigned long score = damos_quota_score(c, s); > > if (score >= 10000) > quota->esz_bp = 0; > @@ -2389,9 +2526,9 @@ static void damos_goal_tune_esz_bp_temporal(struct damos_quota *quota) > /* > * Called only if quota->ms, or quota->sz are set, or quota->goals is not empty > */ > -static void damos_set_effective_quota(struct damos_quota *quota, > - struct damon_ctx *ctx) > +static void damos_set_effective_quota(struct damon_ctx *c, struct damos *s) > { > + struct damos_quota *quota = &s->quota; > unsigned long throughput; > unsigned long esz = ULONG_MAX; > > @@ -2402,9 +2539,9 @@ static void damos_set_effective_quota(struct damos_quota *quota, > > if (!list_empty("a->goals)) { > if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_CONSIST) > - damos_goal_tune_esz_bp_consist(quota); > + damos_goal_tune_esz_bp_consist(c, s); > else if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_TEMPORAL) > - damos_goal_tune_esz_bp_temporal(quota); > + damos_goal_tune_esz_bp_temporal(c, s); > esz = quota->esz_bp / 10000; > } > > @@ -2415,7 +2552,7 @@ static void damos_set_effective_quota(struct damos_quota *quota, > else > throughput = PAGE_SIZE * 1024; > esz = min(throughput * quota->ms, esz); > - esz = max(ctx->min_region_sz, esz); > + esz = max(c->min_region_sz, esz); > } > > if (quota->sz && quota->sz < esz) > @@ -2452,7 +2589,9 @@ static void damos_adjust_quota(struct damon_ctx *c, struct damos *s) > /* First charge window */ > if (!quota->total_charged_sz && !quota->charged_from) { > 
quota->charged_from = jiffies; > - damos_set_effective_quota(quota, c); > + damos_set_effective_quota(c, s); > + if (trace_damos_esz_enabled()) > + damos_trace_esz(c, s, quota); damos_trace_esz() seems an unintended change? > } > > /* New charge window starts */ > @@ -2467,7 +2606,7 @@ static void damos_adjust_quota(struct damon_ctx *c, struct damos *s) > quota->charged_sz = 0; > if (trace_damos_esz_enabled()) > cached_esz = quota->esz; > - damos_set_effective_quota(quota, c); > + damos_set_effective_quota(c, s); > if (trace_damos_esz_enabled() && quota->esz != cached_esz) > damos_trace_esz(c, s, quota); > } > diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c > index 245d63808411..f21191d31f58 100644 > --- a/mm/damon/sysfs-schemes.c > +++ b/mm/damon/sysfs-schemes.c > @@ -1094,6 +1094,10 @@ struct damos_sysfs_qgoal_metric_name damos_sysfs_qgoal_metric_names[] = { > .metric = DAMOS_QUOTA_INACTIVE_MEM_BP, > .name = "inactive_mem_bp", > }, > + { > + .metric = DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP, > + .name = "node_eligible_mem_bp", > + }, > }; > > static ssize_t target_metric_show(struct kobject *kobj, > @@ -2685,6 +2689,9 @@ static int damos_sysfs_add_quota_score( > } > goal->nid = sysfs_goal->nid; > break; > + case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP: > + goal->nid = sysfs_goal->nid; > + break; > default: > break; > } > > base-commit: 277a7fae172bb4a598e0d4144ffb2ff6207fedb3 > -- > 2.43.0 I also reviewed Sashiko's review [3] and found no real problem except the addr alignment. So this looks good to me, except for the trivial things I commented above. Could you please send a revision one last time? [1] https://lore.kernel.org/20260405184247.2690-1-ravis.opensrc@gmail.com [2] https://docs.kernel.org/process/submitting-patches.html#commentary [3] https://lore.kernel.org/20260424205908.CF949C2BCB8@smtp.kernel.org Thanks, SJ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v7] mm/damon: add node_eligible_mem_bp goal metric 2026-04-25 0:39 ` SeongJae Park @ 2026-04-25 4:18 ` Ravi Jonnalagadda 2026-04-25 15:37 ` SeongJae Park 0 siblings, 1 reply; 6+ messages in thread From: Ravi Jonnalagadda @ 2026-04-25 4:18 UTC (permalink / raw) To: SeongJae Park Cc: damon, linux-mm, linux-kernel, linux-doc, akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun On Fri, Apr 24, 2026 at 5:39 PM SeongJae Park <sj@kernel.org> wrote: > > Hello Ravi, > > > This version looks good, except very trivial nits. Hi SJ, Thanks for the super quick review! Very helpful. > > On Fri, 24 Apr 2026 13:34:48 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote: > > > Background and Motivation > > ========================= > > > > In heterogeneous memory systems, controlling memory distribution across > > NUMA nodes is essential for performance optimization. This patch enables > > system-wide page distribution with target-state goals such as "maintain > > 60% of scheme-eligible memory on DRAM" using PA-mode DAMON schemes. > > > > Rather than using absolute thresholds, this metric tracks the ratio of > > memory that matches each scheme's access pattern filters on a target > > node, enabling the quota system to automatically adjust migration > > aggressiveness to maintain the desired distribution. > > > > What This Metric Measures > > ========================= > > > > node_eligible_mem_bp: > > scheme_eligible_bytes_on_node / total_scheme_eligible_bytes * 10000 > > > > The metric iterates through each scheme's eligible regions and uses > > damon_get_folio() to determine NUMA node placement of each folio. 
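(A quick toy illustration of that ratio for anyone skimming the thread, with made-up per-node byte counts, not kernel code:)

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of node_eligible_mem_bp with hypothetical inputs: given
 * per-node byte counts of scheme-eligible memory, return the target
 * node's share in basis points (0-10000). Mirrors the patch's
 * "no eligible memory anywhere yields 0" behavior.
 */
static unsigned long toy_node_eligible_mem_bp(const uint64_t *bytes_per_node,
					      int nr_nodes, int nid)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < nr_nodes; i++)
		total += bytes_per_node[i];
	if (!total)
		return 0;
	return (unsigned long)(bytes_per_node[nid] * 10000 / total);
}
```

With a 6:4 split of eligible bytes across nodes 0 and 1, this yields 6000 bp for node 0 and 4000 bp for node 1, matching the two-scheme example in the patch description.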
> > > > Two-Scheme Setup for Hot Page Distribution > > ========================================== > > > > For maintaining hot memory on DRAM (node 0) and CXL (node 1) in a 6:4 > > ratio (per SeongJae Park's suggestion): > > > > PULL scheme: migrate_hot to node 0 > > goal: node_eligible_mem_bp, nid=0, target=6000 > > "Move hot pages to DRAM if less than 60% of hot data is in DRAM" > > You also set a damos filter of node type with node id 1 for this scheme, right? > > > > > PUSH scheme: migrate_hot to node 1 > > goal: node_eligible_mem_bp, nid=1, target=4000 > > "Move hot pages to CXL if less than 40% of hot data is in CXL" > > And you set a damos filter of node type with node id 0 here, right? > > I think it would be good to mention that here, too. > Good point. I'll add a note explaining that each scheme uses DAMOS addr filters to restrict the source node - PULL uses an addr filter for node 1's address range, and PUSH uses an addr filter for node 0's address range. > > > > Each scheme independently measures its own eligible memory and adjusts > > its quota to achieve its target ratio. The schemes work in concert > > through DAMON's unified monitoring context, with the quota autotuner > > balancing their relative aggressiveness. > > > > Implementation Details > > ====================== > > > > Key implementation considerations: > > - Uses phys_addr_t for byte calculations to prevent overflow on 32-bit > > architectures with large memory configurations > > - Includes cond_resched() per region to prevent CPU soft lockups when > > processing large numbers of regions > > - Validates that ops type is DAMON_OPS_PADDR in both damon_commit_ctx() > > (for clear error feedback) and the metric function (defensive check) > > - Properly handles folio alignment when region addresses fall within > > large folios by calculating exact overlap > > This is the changelog from the previous version [1], right? Let's put all > changelogs from all previous versions in the commentary area [2].
> Understood. Will move this to the commentary area and include the full changelog history from RFC v1. > > > > The implementation requires CONFIG_DAMON_PADDR since damon_get_folio() > > is only available for physical address space monitoring. > > > > Testing Results > > =============== > > > > Functionally tested on a two-node heterogeneous memory system with DRAM > > (node 0) and CXL memory (node 1). A PUSH+PULL scheme configuration using > > migrate_hot actions was used to reach a target hot memory ratio between > > the two tiers. > > > > With the TEMPORAL tuner, the system converges quickly to the target > > distribution. The tuner drives esz to maximum when under goal and to > > zero once the goal is met, forming a simple on/off feedback loop that > > stabilizes at the desired ratio. > > > > With the CONSIST tuner, the scheme still converges but more slowly, as > > it migrates and then throttles itself based on quota feedback. The time > > to reach the goal varies depending on workload intensity. > > > > Note: This metric works with both TEMPORAL and CONSIST goal tuners. > > > > Suggested-by: SeongJae Park <sj@kernel.org> > > Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com> > > --- > > include/linux/damon.h | 3 + > > mm/damon/core.c | 171 +++++++++++++++++++++++++++++++++++---- > > mm/damon/sysfs-schemes.c | 7 ++ > > 3 files changed, 165 insertions(+), 16 deletions(-) > > > > diff --git a/include/linux/damon.h b/include/linux/damon.h > > index f2cdb7c3f5e6..986b8c902585 100644 > > --- a/include/linux/damon.h > > +++ b/include/linux/damon.h > > @@ -159,6 +159,8 @@ enum damos_action { > > * @DAMOS_QUOTA_NODE_MEMCG_FREE_BP: MemFree ratio of a node for a cgroup. > > * @DAMOS_QUOTA_ACTIVE_MEM_BP: Active to total LRU memory ratio. > > * @DAMOS_QUOTA_INACTIVE_MEM_BP: Inactive to total LRU memory ratio. > > + * @DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP: Scheme-eligible memory ratio of a > > + * node in basis points (0-10000). 
> > * @NR_DAMOS_QUOTA_GOAL_METRICS: Number of DAMOS quota goal metrics. > > * > > * Metrics equal to larger than @NR_DAMOS_QUOTA_GOAL_METRICS are unsupported. > > @@ -172,6 +174,7 @@ enum damos_quota_goal_metric { > > DAMOS_QUOTA_NODE_MEMCG_FREE_BP, > > DAMOS_QUOTA_ACTIVE_MEM_BP, > > DAMOS_QUOTA_INACTIVE_MEM_BP, > > + DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP, > > NR_DAMOS_QUOTA_GOAL_METRICS, > > }; > > > > diff --git a/mm/damon/core.c b/mm/damon/core.c > > index 3dbbbfdeff71..4ac805205261 100644 > > --- a/mm/damon/core.c > > +++ b/mm/damon/core.c > > @@ -13,10 +13,14 @@ > > #include <linux/memcontrol.h> > > #include <linux/mm.h> > > #include <linux/psi.h> > > +#include <linux/sched.h> > > #include <linux/slab.h> > > #include <linux/string.h> > > #include <linux/string_choices.h> > > > > +/* for damon_get_folio() used by node eligible memory metrics */ > > +#include "ops-common.h" > > + > > #define CREATE_TRACE_POINTS > > #include <trace/events/damon.h> > > > > @@ -1326,11 +1330,25 @@ static int damon_commit_targets( > > int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src) > > { > > int err; > > + struct damos *scheme; > > + struct damos_quota_goal *goal; > > > > dst->maybe_corrupted = true; > > if (!is_power_of_2(src->min_region_sz)) > > return -EINVAL; > > > > + /* node_eligible_mem_bp metric requires PADDR ops */ > > + if (src->ops.id != DAMON_OPS_PADDR) { > > + damon_for_each_scheme(scheme, src) { > > + struct damos_quota *quota = &scheme->quota; > > + > > + damos_for_each_quota_goal(goal, quota) { > > + if (goal->metric == DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP) > > Let's wrap this line to keep the 80 columns limit. Will wrap this in v8. 
> > > + return -EINVAL; > > + } > > + } > > + } > > + > > err = damon_commit_schemes(dst, src); > > if (err) > > return err; > > @@ -2287,7 +2305,112 @@ static unsigned long damos_get_node_memcg_used_bp( > > numerator = i.totalram - used_pages; > > return mult_frac(numerator, 10000, i.totalram); > > } > > -#else > > + > > +#ifdef CONFIG_DAMON_PADDR > > +/* > > + * damos_calc_eligible_bytes() - Calculate raw eligible bytes per node. > > + * @c: The DAMON context. > > + * @s: The scheme. > > + * @nid: The target NUMA node id. > > + * @total: Output for total eligible bytes across all nodes. > > + * > > + * Iterates through each folio in eligible regions to accurately determine > > + * which node the memory resides on. Returns eligible bytes on the specified > > + * node and sets *total to the sum across all nodes. > > + * > > + * Note: This function requires damon_get_folio() from ops-common.c, which is > > + * only available when CONFIG_DAMON_PADDR is enabled. It also requires the > > + * context to be using PADDR operations for meaningful results. 
> > + */ > > +static phys_addr_t damos_calc_eligible_bytes(struct damon_ctx *c, > > + struct damos *s, int nid, phys_addr_t *total) > > +{ > > + struct damon_target *t; > > + struct damon_region *r; > > + phys_addr_t total_eligible = 0; > > + phys_addr_t node_eligible = 0; > > + > > + damon_for_each_target(t, c) { > > + damon_for_each_region(r, t) { > > + phys_addr_t addr, end_addr; > > + > > + if (!__damos_valid_target(r, s)) > > + continue; > > + > > + /* Convert from core address units to physical bytes */ > > + addr = (phys_addr_t)r->ar.start * c->addr_unit; > > + end_addr = (phys_addr_t)r->ar.end * c->addr_unit; > > + while (addr < end_addr) { > > + struct folio *folio; > > + phys_addr_t folio_start, folio_end; > > + phys_addr_t overlap_start, overlap_end; > > + phys_addr_t counted; > > + > > + folio = damon_get_folio(PHYS_PFN(addr)); > > + if (!folio) { > > + addr += PAGE_SIZE; > > Sashiko found [3] this may better to be aligned. It is trivial but I agree. > How about using PAGE_ALIGN_DOWN(addr + PAGE_SIZE)? Agreed, will change to PAGE_ALIGN_DOWN(addr + PAGE_SIZE) in v8. > > > + continue; > > + } > > + > > + /* > > + * Calculate exact overlap between the region > > + * [addr, end_addr) and the folio range. > > + * The folio may start before addr if addr is > > + * in the middle of a large folio. 
> > + */ > > + folio_start = PFN_PHYS(folio_pfn(folio)); > > + folio_end = folio_start + folio_size(folio); > > + > > + overlap_start = max(addr, folio_start); > > + overlap_end = min(end_addr, folio_end); > > + > > + if (overlap_end > overlap_start) { > > + counted = overlap_end - overlap_start; > > + total_eligible += counted; > > + if (folio_nid(folio) == nid) > > + node_eligible += counted; > > + } > > + > > + /* Advance past the entire folio */ > > + addr = folio_end; > > + folio_put(folio); > > + } > > + cond_resched(); > > + } > > + } > > + > > + *total = total_eligible; > > + return node_eligible; > > +} > > + > > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c, > > + struct damos *s, int nid) > > +{ > > + phys_addr_t total_eligible = 0; > > + phys_addr_t node_eligible; > > + > > + if (c->ops.id != DAMON_OPS_PADDR) > > + return 0; > > + > > + if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid)) > > + return 0; > > Trivial nit. I'd prefer using '!node_state(nid, N_MEMORY)' to be consistent > with the similar one in damon_migrate_pages(). But this one seems fine to me. Thanks for the suggestion. I'll keep node_online() for now as you indicated it's fine, but happy to change it later, if you prefer consistency. 
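Also, for anyone double-checking the PAGE_ALIGN_DOWN(addr + PAGE_SIZE) advance agreed above, the arithmetic can be mocked in userspace (4 KiB page size assumed; toy macros re-implemented locally for illustration, not the kernel ones):

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the kernel's PAGE_SIZE / PAGE_ALIGN_DOWN(). */
#define TOY_PAGE_SIZE 4096ULL
#define TOY_PAGE_ALIGN_DOWN(x) ((x) & ~(TOY_PAGE_SIZE - 1))

/*
 * With a plain addr += PAGE_SIZE, an address that starts mid-page stays
 * mid-page on every iteration. Rounding down after the add instead
 * snaps to the next page boundary, while an already-aligned address
 * still advances by exactly one page.
 */
static uint64_t advance(uint64_t addr)
{
	return TOY_PAGE_ALIGN_DOWN(addr + TOY_PAGE_SIZE);
}
```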
> > > + > > + node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible); > > + > > + if (!total_eligible) > > + return 0; > > + > > + return mult_frac((unsigned long)node_eligible, 10000, > > + (unsigned long)total_eligible); > > +} > > +#else /* CONFIG_DAMON_PADDR */ > > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c, > > + struct damos *s, int nid) > > +{ > > + return 0; > > +} > > +#endif /* CONFIG_DAMON_PADDR */ > > +#else /* CONFIG_NUMA */ > > static __kernel_ulong_t damos_get_node_mem_bp( > > struct damos_quota_goal *goal) > > { > > @@ -2299,7 +2422,13 @@ static unsigned long damos_get_node_memcg_used_bp( > > { > > return 0; > > } > > -#endif > > + > > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c, > > + struct damos *s, int nid) > > +{ > > + return 0; > > +} > > +#endif /* CONFIG_NUMA */ > > > > /* > > * Returns LRU-active or inactive memory to total LRU memory size ratio. > > @@ -2319,7 +2448,8 @@ static unsigned int damos_get_in_active_mem_bp(bool active_ratio) > > return mult_frac(inactive, 10000, total); > > } > > > > -static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal) > > +static void damos_set_quota_goal_current_value(struct damon_ctx *c, > > + struct damos *s, struct damos_quota_goal *goal) > > { > > u64 now_psi_total; > > > > @@ -2345,19 +2475,24 @@ static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal) > > goal->current_value = damos_get_in_active_mem_bp( > > goal->metric == DAMOS_QUOTA_ACTIVE_MEM_BP); > > break; > > + case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP: > > + goal->current_value = damos_get_node_eligible_mem_bp(c, s, > > + goal->nid); > > + break; > > default: > > break; > > } > > } > > > > /* Return the highest score since it makes schemes least aggressive */ > > -static unsigned long damos_quota_score(struct damos_quota *quota) > > +static unsigned long damos_quota_score(struct damon_ctx *c, struct damos *s) > > { > > struct 
damos_quota_goal *goal; > > + struct damos_quota *quota = &s->quota; > > unsigned long highest_score = 0; > > > > damos_for_each_quota_goal(goal, quota) { > > - damos_set_quota_goal_current_value(goal); > > + damos_set_quota_goal_current_value(c, s, goal); > > highest_score = max(highest_score, > > mult_frac(goal->current_value, 10000, > > goal->target_value)); > > @@ -2366,17 +2501,19 @@ static unsigned long damos_quota_score(struct damos_quota *quota) > > return highest_score; > > } > > > > -static void damos_goal_tune_esz_bp_consist(struct damos_quota *quota) > > +static void damos_goal_tune_esz_bp_consist(struct damon_ctx *c, struct damos *s) > > 80 columns limit? I checked - `damos_goal_tune_esz_bp_consist` is exactly 80 characters, so it's fine as-is. However, `damos_goal_tune_esz_bp_temporal` below is 81 characters, so I'll wrap that one in v8. > > > { > > - unsigned long score = damos_quota_score(quota); > > + struct damos_quota *quota = &s->quota; > > + unsigned long score = damos_quota_score(c, s); > > > > quota->esz_bp = damon_feed_loop_next_input( > > max(quota->esz_bp, 10000UL), score); > > } > > > > -static void damos_goal_tune_esz_bp_temporal(struct damos_quota *quota) > > +static void damos_goal_tune_esz_bp_temporal(struct damon_ctx *c, struct damos *s) > > { > > - unsigned long score = damos_quota_score(quota); > > + struct damos_quota *quota = &s->quota; > > + unsigned long score = damos_quota_score(c, s); > > > > if (score >= 10000) > > quota->esz_bp = 0; > > @@ -2389,9 +2526,9 @@ static void damos_goal_tune_esz_bp_temporal(struct damos_quota *quota) > > /* > > * Called only if quota->ms, or quota->sz are set, or quota->goals is not empty > > */ > > -static void damos_set_effective_quota(struct damos_quota *quota, > > - struct damon_ctx *ctx) > > +static void damos_set_effective_quota(struct damon_ctx *c, struct damos *s) > > { > > + struct damos_quota *quota = &s->quota; > > unsigned long throughput; > > unsigned long esz = ULONG_MAX; > > > > @@ 
-2402,9 +2539,9 @@ static void damos_set_effective_quota(struct damos_quota *quota, > > > > if (!list_empty("a->goals)) { > > if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_CONSIST) > > - damos_goal_tune_esz_bp_consist(quota); > > + damos_goal_tune_esz_bp_consist(c, s); > > else if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_TEMPORAL) > > - damos_goal_tune_esz_bp_temporal(quota); > > + damos_goal_tune_esz_bp_temporal(c, s); > > esz = quota->esz_bp / 10000; > > } > > > > @@ -2415,7 +2552,7 @@ static void damos_set_effective_quota(struct damos_quota *quota, > > else > > throughput = PAGE_SIZE * 1024; > > esz = min(throughput * quota->ms, esz); > > - esz = max(ctx->min_region_sz, esz); > > + esz = max(c->min_region_sz, esz); > > } > > > > if (quota->sz && quota->sz < esz) > > @@ -2452,7 +2589,9 @@ static void damos_adjust_quota(struct damon_ctx *c, struct damos *s) > > /* First charge window */ > > if (!quota->total_charged_sz && !quota->charged_from) { > > quota->charged_from = jiffies; > > - damos_set_effective_quota(quota, c); > > + damos_set_effective_quota(c, s); > > + if (trace_damos_esz_enabled()) > > + damos_trace_esz(c, s, quota); > > damos_trace_esz() seems an unintended change? Yes, this was unintended. Will remove it in v8. 
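One more aside on the overlap calculation in damos_calc_eligible_bytes() earlier in the patch, for anyone cross-checking the byte accounting: it can be mimicked in userspace like this (illustrative toy with hypothetical names, not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy sketch of the region/folio overlap accounting: count how many
 * bytes of the region [addr, end) fall inside a folio spanning
 * [folio_start, folio_start + folio_size). The folio may start before
 * addr when addr lands in the middle of a large folio.
 */
static uint64_t overlap_bytes(uint64_t addr, uint64_t end,
			      uint64_t folio_start, uint64_t folio_size)
{
	uint64_t folio_end = folio_start + folio_size;
	uint64_t start = addr > folio_start ? addr : folio_start;
	uint64_t stop = end < folio_end ? end : folio_end;

	/* Disjoint ranges contribute nothing. */
	return stop > start ? stop - start : 0;
}
```

A region starting 0x800 bytes into an 8 KiB folio at 0x0 contributes 0x1800 bytes, which corresponds to the overlap_end - overlap_start term in the patch.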
> > > } > > > > /* New charge window starts */ > > @@ -2467,7 +2606,7 @@ static void damos_adjust_quota(struct damon_ctx *c, struct damos *s) > > quota->charged_sz = 0; > > if (trace_damos_esz_enabled()) > > cached_esz = quota->esz; > > - damos_set_effective_quota(quota, c); > > + damos_set_effective_quota(c, s); > > if (trace_damos_esz_enabled() && quota->esz != cached_esz) > > damos_trace_esz(c, s, quota); > > } > > diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c > > index 245d63808411..f21191d31f58 100644 > > --- a/mm/damon/sysfs-schemes.c > > +++ b/mm/damon/sysfs-schemes.c > > @@ -1094,6 +1094,10 @@ struct damos_sysfs_qgoal_metric_name damos_sysfs_qgoal_metric_names[] = { > > .metric = DAMOS_QUOTA_INACTIVE_MEM_BP, > > .name = "inactive_mem_bp", > > }, > > + { > > + .metric = DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP, > > + .name = "node_eligible_mem_bp", > > + }, > > }; > > > > static ssize_t target_metric_show(struct kobject *kobj, > > @@ -2685,6 +2689,9 @@ static int damos_sysfs_add_quota_score( > > } > > goal->nid = sysfs_goal->nid; > > break; > > + case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP: > > + goal->nid = sysfs_goal->nid; > > + break; > > default: > > break; > > } > > > > base-commit: 277a7fae172bb4a598e0d4144ffb2ff6207fedb3 > > -- > > 2.43.0 > > I also reviewed Sashiko review [3] and I find no real problem except the addr > alignment. > > So this looks good to me, except the trivial things I commented above. Could > you please revision for the last time? Will do. Summary of changes for v8: 1. Mention addr filters for source-node filtering in Two-Scheme Setup 2. Move implementation details to commentary area with full changelog 3. Wrap the 80-column violation in damon_commit_ctx() 4. Use PAGE_ALIGN_DOWN(addr + PAGE_SIZE) for alignment 5. Wrap damos_goal_tune_esz_bp_temporal() (81 chars) 6. 
Remove unintended damos_trace_esz() from first charge window Thanks, Ravi > > [1] https://lore.kernel.org/20260405184247.2690-1-ravis.opensrc@gmail.com > [2] https://docs.kernel.org/process/submitting-patches.html#commentary > [3] https://lore.kernel.org/20260424205908.CF949C2BCB8@smtp.kernel.org > > > Thanks, > SJ
* Re: [PATCH v7] mm/damon: add node_eligible_mem_bp goal metric 2026-04-25 4:18 ` Ravi Jonnalagadda @ 2026-04-25 15:37 ` SeongJae Park 0 siblings, 0 replies; 6+ messages in thread From: SeongJae Park @ 2026-04-25 15:37 UTC (permalink / raw) To: Ravi Jonnalagadda Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun On Fri, 24 Apr 2026 21:18:57 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote: > On Fri, Apr 24, 2026 at 5:39 PM SeongJae Park <sj@kernel.org> wrote: > > > > Hello Ravi, > > > > > > This version looks good, except very trivial nits. > > Hi SJ, > > Thanks for the super quick review! Very helpful. My pleasure! [...] > > So this looks good to me, except the trivial things I commented above. Could > > you please revision for the last time? > > Will do. Summary of changes for v8: > 1. Mention addr filters for source-node filtering in Two-Scheme > Setup > 2. Move implementation details to commentary area with full > changelog > 3. Wrap the 80-column violation in damon_commit_ctx() > 4. Use PAGE_ALIGN_DOWN(addr + PAGE_SIZE) for alignment > 5. Wrap damos_goal_tune_esz_bp_temporal() (81 chars) > 6. Remove unintended damos_trace_esz() from first charge window Sounds good, looking forward to the v8! Thanks, SJ [...]