* [PATCH v8] mm/damon: add node_eligible_mem_bp goal metric
@ 2026-04-26 0:32 Ravi Jonnalagadda
2026-04-26 0:53 ` sashiko-bot
2026-04-26 17:19 ` SeongJae Park
0 siblings, 2 replies; 5+ messages in thread
From: Ravi Jonnalagadda @ 2026-04-26 0:32 UTC (permalink / raw)
To: sj, damon, linux-mm, linux-kernel, linux-doc
Cc: akpm, corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun,
ravis.opensrc
Background and Motivation
=========================
In heterogeneous memory systems, controlling memory distribution across
NUMA nodes is essential for performance optimization. This patch enables
system-wide page distribution with target-state goals such as "maintain
60% of scheme-eligible memory on DRAM" using PA-mode DAMON schemes.
Rather than using absolute thresholds, this metric tracks the ratio of
memory that matches each scheme's access pattern filters on a target
node, enabling the quota system to automatically adjust migration
aggressiveness to maintain the desired distribution.
What This Metric Measures
=========================
node_eligible_mem_bp:
scheme_eligible_bytes_on_node / total_scheme_eligible_bytes * 10000
Two-Scheme Setup for Hot Page Distribution
==========================================
For maintaining 60% of hot memory on DRAM (node 0) and 40% on CXL
(node 1):
PULL scheme: migrate_hot to node 0
goal: node_eligible_mem_bp, nid=0, target=6000
addr filter: node 1 address range (only migrate FROM CXL)
"Move hot pages to DRAM if less than 60% of hot data is in DRAM"
PUSH scheme: migrate_hot to node 1
goal: node_eligible_mem_bp, nid=1, target=4000
addr filter: node 0 address range (only migrate FROM DRAM)
"Move hot pages to CXL if less than 40% of hot data is in CXL"
Each scheme independently measures its own eligible memory and adjusts
its quota to achieve its target ratio. The schemes work in concert
through DAMON's unified monitoring context, with the quota autotuner
balancing their relative aggressiveness.
Implementation Details
======================
The implementation adds a new quota goal metric type
DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP to the existing DAMOS quota goal
framework. When this metric is configured for a scheme:
1. During each quota adjustment cycle, damos_get_node_eligible_mem_bp()
is called to calculate the current memory distribution.
2. The function iterates through all regions that match the scheme's
access pattern (via __damos_valid_target()) and calculates:
- Total eligible bytes across all nodes
- Eligible bytes specifically on the target node (goal->nid)
3. For each eligible region, damos_calc_eligible_bytes() walks through
the physical address range, using damon_get_folio() to look up
each folio and determine its NUMA node via folio_nid().
4. Large folios are handled by calculating the exact overlap between
the region boundaries and folio boundaries, ensuring accurate
byte counts even when regions partially span folios.
5. The ratio (node_eligible / total_eligible * 10000) is returned
as basis points, which the quota autotuner uses to adjust the
scheme's effective quota size (esz).
The implementation requires CONFIG_DAMON_PADDR since damon_get_folio()
is only available for physical address space monitoring.
Testing Results
===============
Functionally tested on a two-node heterogeneous memory system with DRAM
(node 0) and CXL memory (node 1). A PUSH+PULL scheme configuration using
migrate_hot actions was used to reach a target hot memory ratio between
the two tiers.
With the TEMPORAL tuner, the system converges quickly to the target
distribution. The tuner drives esz to maximum when under goal and to
zero once the goal is met, forming a simple on/off feedback loop that
stabilizes at the desired ratio.
With the CONSIST tuner, the scheme still converges but more slowly, as
it migrates and then throttles itself based on quota feedback. The time
to reach the goal varies depending on workload intensity.
Note: This metric works with both TEMPORAL and CONSIST goal tuners.
Suggested-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
---
Changes since v7:
=================
https://lore.kernel.org/linux-mm/20260424203448.5040-1-ravis.opensrc@gmail.com/
- Wrapped if condition for goal->metric check to fix 80-column violation
- Used PAGE_ALIGN_DOWN(addr + PAGE_SIZE) instead of addr += PAGE_SIZE
for proper page-aligned advancement
- Wrapped damos_goal_tune_esz_bp_temporal() function signature for
80-column compliance
- Removed unintended damos_trace_esz() call from first charge window
- Added addr filter usage note in two-scheme setup documentation
Changes since v6:
=================
https://lore.kernel.org/linux-mm/20260405184247.2690-1-ravis.opensrc@gmail.com/
- Dropped node_ineligible_mem_bp metric per maintainer feedback
- Updated two-scheme setup to use 6:4 ratio with both schemes using
node_eligible_mem_bp (per SeongJae Park's suggestion)
Changes since v5:
=================
https://lore.kernel.org/linux-mm/20260404012215.1539-1-ravis.opensrc@gmail.com/
- Rebased onto mm-new instead of damon/next for sashiko review
- Removed Reported-by/Closes tags per maintainer feedback (not needed
for bugs found before merge)
Changes since v4:
=================
https://lore.kernel.org/linux-mm/20260320190453.1430-1-ravis.opensrc@gmail.com/
- Fixed commit message description for DAMOS_QUOTA_NODE_INELIGIBLE_MEM_BP
per review feedback
- Added clarifying comment for ops-common.h include (for damon_get_folio())
- Fixed build error when CONFIG_DAMON_PADDR is disabled by adding
#ifdef CONFIG_DAMON_PADDR guards around functions using damon_get_folio()
- Dropped RFC tag per maintainer feedback
Changes since RFC v3:
=====================
https://lore.kernel.org/linux-mm/20260314223432.2292-1-ravis.opensrc@gmail.com/
- Fixed phys_addr_t overflow on 32-bit PAE systems by using phys_addr_t
for intermediate byte calculations
- Added cond_resched() per region to prevent CPU soft lockups
- Improved folio overlap calculation for accurate byte counting
Changes since RFC v2:
=====================
https://lore.kernel.org/linux-mm/20260228191045.1892-1-ravis.opensrc@gmail.com/
- Added node_ineligible_mem_bp complementary metric
- Fixed DAMON_OPS_PADDR validation in damon_commit_ctx()
- Improved commit message with two-scheme setup documentation
Changes since RFC v1:
=====================
https://lore.kernel.org/linux-mm/20260215184523.1432-1-ravis.opensrc@gmail.com/
- Initial implementation based on SeongJae Park's suggestion
- Added basic node_eligible_mem_bp metric
---
include/linux/damon.h | 3 +
mm/damon/core.c | 171 +++++++++++++++++++++++++++++++++++----
mm/damon/sysfs-schemes.c | 7 ++
3 files changed, 165 insertions(+), 16 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index f2cdb7c3f5e6..986b8c902585 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -159,6 +159,8 @@ enum damos_action {
* @DAMOS_QUOTA_NODE_MEMCG_FREE_BP: MemFree ratio of a node for a cgroup.
* @DAMOS_QUOTA_ACTIVE_MEM_BP: Active to total LRU memory ratio.
* @DAMOS_QUOTA_INACTIVE_MEM_BP: Inactive to total LRU memory ratio.
+ * @DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP: Scheme-eligible memory ratio of a
+ * node in basis points (0-10000).
* @NR_DAMOS_QUOTA_GOAL_METRICS: Number of DAMOS quota goal metrics.
*
* Metrics equal to larger than @NR_DAMOS_QUOTA_GOAL_METRICS are unsupported.
@@ -172,6 +174,7 @@ enum damos_quota_goal_metric {
DAMOS_QUOTA_NODE_MEMCG_FREE_BP,
DAMOS_QUOTA_ACTIVE_MEM_BP,
DAMOS_QUOTA_INACTIVE_MEM_BP,
+ DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP,
NR_DAMOS_QUOTA_GOAL_METRICS,
};
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 3dbbbfdeff71..a9303a8c4384 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -13,10 +13,14 @@
#include <linux/memcontrol.h>
#include <linux/mm.h>
#include <linux/psi.h>
+#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/string_choices.h>
+/* for damon_get_folio() used by node eligible memory metrics */
+#include "ops-common.h"
+
#define CREATE_TRACE_POINTS
#include <trace/events/damon.h>
@@ -1326,11 +1330,26 @@ static int damon_commit_targets(
int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src)
{
int err;
+ struct damos *scheme;
+ struct damos_quota_goal *goal;
dst->maybe_corrupted = true;
if (!is_power_of_2(src->min_region_sz))
return -EINVAL;
+ /* node_eligible_mem_bp metric requires PADDR ops */
+ if (src->ops.id != DAMON_OPS_PADDR) {
+ damon_for_each_scheme(scheme, src) {
+ struct damos_quota *quota = &scheme->quota;
+
+ damos_for_each_quota_goal(goal, quota) {
+ if (goal->metric ==
+ DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP)
+ return -EINVAL;
+ }
+ }
+ }
+
err = damon_commit_schemes(dst, src);
if (err)
return err;
@@ -2287,7 +2306,112 @@ static unsigned long damos_get_node_memcg_used_bp(
numerator = i.totalram - used_pages;
return mult_frac(numerator, 10000, i.totalram);
}
-#else
+
+#ifdef CONFIG_DAMON_PADDR
+/*
+ * damos_calc_eligible_bytes() - Calculate raw eligible bytes per node.
+ * @c: The DAMON context.
+ * @s: The scheme.
+ * @nid: The target NUMA node id.
+ * @total: Output for total eligible bytes across all nodes.
+ *
+ * Iterates through each folio in eligible regions to accurately determine
+ * which node the memory resides on. Returns eligible bytes on the specified
+ * node and sets *total to the sum across all nodes.
+ *
+ * Note: This function requires damon_get_folio() from ops-common.c, which is
+ * only available when CONFIG_DAMON_PADDR is enabled. It also requires the
+ * context to be using PADDR operations for meaningful results.
+ */
+static phys_addr_t damos_calc_eligible_bytes(struct damon_ctx *c,
+ struct damos *s, int nid, phys_addr_t *total)
+{
+ struct damon_target *t;
+ struct damon_region *r;
+ phys_addr_t total_eligible = 0;
+ phys_addr_t node_eligible = 0;
+
+ damon_for_each_target(t, c) {
+ damon_for_each_region(r, t) {
+ phys_addr_t addr, end_addr;
+
+ if (!__damos_valid_target(r, s))
+ continue;
+
+ /* Convert from core address units to physical bytes */
+ addr = (phys_addr_t)r->ar.start * c->addr_unit;
+ end_addr = (phys_addr_t)r->ar.end * c->addr_unit;
+ while (addr < end_addr) {
+ struct folio *folio;
+ phys_addr_t folio_start, folio_end;
+ phys_addr_t overlap_start, overlap_end;
+ phys_addr_t counted;
+
+ folio = damon_get_folio(PHYS_PFN(addr));
+ if (!folio) {
+ addr = PAGE_ALIGN_DOWN(addr + PAGE_SIZE);
+ continue;
+ }
+
+ /*
+ * Calculate exact overlap between the region
+ * [addr, end_addr) and the folio range.
+ * The folio may start before addr if addr is
+ * in the middle of a large folio.
+ */
+ folio_start = PFN_PHYS(folio_pfn(folio));
+ folio_end = folio_start + folio_size(folio);
+
+ overlap_start = max(addr, folio_start);
+ overlap_end = min(end_addr, folio_end);
+
+ if (overlap_end > overlap_start) {
+ counted = overlap_end - overlap_start;
+ total_eligible += counted;
+ if (folio_nid(folio) == nid)
+ node_eligible += counted;
+ }
+
+ /* Advance past the entire folio */
+ addr = folio_end;
+ folio_put(folio);
+ }
+ cond_resched();
+ }
+ }
+
+ *total = total_eligible;
+ return node_eligible;
+}
+
+static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
+ struct damos *s, int nid)
+{
+ phys_addr_t total_eligible = 0;
+ phys_addr_t node_eligible;
+
+ if (c->ops.id != DAMON_OPS_PADDR)
+ return 0;
+
+ if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
+ return 0;
+
+ node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
+
+ if (!total_eligible)
+ return 0;
+
+ return mult_frac((unsigned long)node_eligible, 10000,
+ (unsigned long)total_eligible);
+}
+#else /* CONFIG_DAMON_PADDR */
+static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
+ struct damos *s, int nid)
+{
+ return 0;
+}
+#endif /* CONFIG_DAMON_PADDR */
+#else /* CONFIG_NUMA */
static __kernel_ulong_t damos_get_node_mem_bp(
struct damos_quota_goal *goal)
{
@@ -2299,7 +2423,13 @@ static unsigned long damos_get_node_memcg_used_bp(
{
return 0;
}
-#endif
+
+static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
+ struct damos *s, int nid)
+{
+ return 0;
+}
+#endif /* CONFIG_NUMA */
/*
* Returns LRU-active or inactive memory to total LRU memory size ratio.
@@ -2319,7 +2449,8 @@ static unsigned int damos_get_in_active_mem_bp(bool active_ratio)
return mult_frac(inactive, 10000, total);
}
-static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
+static void damos_set_quota_goal_current_value(struct damon_ctx *c,
+ struct damos *s, struct damos_quota_goal *goal)
{
u64 now_psi_total;
@@ -2345,19 +2476,24 @@ static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
goal->current_value = damos_get_in_active_mem_bp(
goal->metric == DAMOS_QUOTA_ACTIVE_MEM_BP);
break;
+ case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:
+ goal->current_value = damos_get_node_eligible_mem_bp(c, s,
+ goal->nid);
+ break;
default:
break;
}
}
/* Return the highest score since it makes schemes least aggressive */
-static unsigned long damos_quota_score(struct damos_quota *quota)
+static unsigned long damos_quota_score(struct damon_ctx *c, struct damos *s)
{
struct damos_quota_goal *goal;
+ struct damos_quota *quota = &s->quota;
unsigned long highest_score = 0;
damos_for_each_quota_goal(goal, quota) {
- damos_set_quota_goal_current_value(goal);
+ damos_set_quota_goal_current_value(c, s, goal);
highest_score = max(highest_score,
mult_frac(goal->current_value, 10000,
goal->target_value));
@@ -2366,17 +2502,20 @@ static unsigned long damos_quota_score(struct damos_quota *quota)
return highest_score;
}
-static void damos_goal_tune_esz_bp_consist(struct damos_quota *quota)
+static void damos_goal_tune_esz_bp_consist(struct damon_ctx *c, struct damos *s)
{
- unsigned long score = damos_quota_score(quota);
+ struct damos_quota *quota = &s->quota;
+ unsigned long score = damos_quota_score(c, s);
quota->esz_bp = damon_feed_loop_next_input(
max(quota->esz_bp, 10000UL), score);
}
-static void damos_goal_tune_esz_bp_temporal(struct damos_quota *quota)
+static void damos_goal_tune_esz_bp_temporal(struct damon_ctx *c,
+ struct damos *s)
{
- unsigned long score = damos_quota_score(quota);
+ struct damos_quota *quota = &s->quota;
+ unsigned long score = damos_quota_score(c, s);
if (score >= 10000)
quota->esz_bp = 0;
@@ -2389,9 +2528,9 @@ static void damos_goal_tune_esz_bp_temporal(struct damos_quota *quota)
/*
* Called only if quota->ms, or quota->sz are set, or quota->goals is not empty
*/
-static void damos_set_effective_quota(struct damos_quota *quota,
- struct damon_ctx *ctx)
+static void damos_set_effective_quota(struct damon_ctx *c, struct damos *s)
{
+ struct damos_quota *quota = &s->quota;
unsigned long throughput;
unsigned long esz = ULONG_MAX;
@@ -2402,9 +2541,9 @@ static void damos_set_effective_quota(struct damos_quota *quota,
if (!list_empty("a->goals)) {
if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_CONSIST)
- damos_goal_tune_esz_bp_consist(quota);
+ damos_goal_tune_esz_bp_consist(c, s);
else if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_TEMPORAL)
- damos_goal_tune_esz_bp_temporal(quota);
+ damos_goal_tune_esz_bp_temporal(c, s);
esz = quota->esz_bp / 10000;
}
@@ -2415,7 +2554,7 @@ static void damos_set_effective_quota(struct damos_quota *quota,
else
throughput = PAGE_SIZE * 1024;
esz = min(throughput * quota->ms, esz);
- esz = max(ctx->min_region_sz, esz);
+ esz = max(c->min_region_sz, esz);
}
if (quota->sz && quota->sz < esz)
@@ -2452,7 +2591,7 @@ static void damos_adjust_quota(struct damon_ctx *c, struct damos *s)
/* First charge window */
if (!quota->total_charged_sz && !quota->charged_from) {
quota->charged_from = jiffies;
- damos_set_effective_quota(quota, c);
+ damos_set_effective_quota(c, s);
}
/* New charge window starts */
@@ -2467,7 +2606,7 @@ static void damos_adjust_quota(struct damon_ctx *c, struct damos *s)
quota->charged_sz = 0;
if (trace_damos_esz_enabled())
cached_esz = quota->esz;
- damos_set_effective_quota(quota, c);
+ damos_set_effective_quota(c, s);
if (trace_damos_esz_enabled() && quota->esz != cached_esz)
damos_trace_esz(c, s, quota);
}
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 245d63808411..f21191d31f58 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1094,6 +1094,10 @@ struct damos_sysfs_qgoal_metric_name damos_sysfs_qgoal_metric_names[] = {
.metric = DAMOS_QUOTA_INACTIVE_MEM_BP,
.name = "inactive_mem_bp",
},
+ {
+ .metric = DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP,
+ .name = "node_eligible_mem_bp",
+ },
};
static ssize_t target_metric_show(struct kobject *kobj,
@@ -2685,6 +2689,9 @@ static int damos_sysfs_add_quota_score(
}
goal->nid = sysfs_goal->nid;
break;
+ case DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP:
+ goal->nid = sysfs_goal->nid;
+ break;
default:
break;
}
base-commit: 80a5456b3cf475eea155b66d7a06ff266dc05647
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v8] mm/damon: add node_eligible_mem_bp goal metric
2026-04-26 0:32 [PATCH v8] mm/damon: add node_eligible_mem_bp goal metric Ravi Jonnalagadda
@ 2026-04-26 0:53 ` sashiko-bot
2026-04-26 17:08 ` SeongJae Park
2026-04-26 17:19 ` SeongJae Park
1 sibling, 1 reply; 5+ messages in thread
From: sashiko-bot @ 2026-04-26 0:53 UTC (permalink / raw)
To: Ravi Jonnalagadda; +Cc: damon
Sashiko AI review found 4 potential issue(s):
- [High] Casting phys_addr_t to unsigned long causes truncation and potential divide-by-zero on 32-bit systems
- [High] Missing cond_resched() inside the PFN-scanning loop leads to CPU soft lockups
- [High] Synchronous O(N) full-memory scanning defeats DAMON's bounded-overhead design
- [Medium] damos_calc_eligible_bytes() skips core filters, corrupting the metric calculation
--
commit efcd6d55e5292f8b5cd0c6202c9d95000e99dc4c
Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
mm/damon: add node_eligible_mem_bp goal metric
This commit introduces a new quota goal metric, DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP,
to track the ratio of scheme-eligible memory on a specific NUMA node. It
implements calculation functions that iterate over eligible regions and examine
the NUMA node of each folio.
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index 3dbbbfdeff71..a9303a8c4384 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
[ ... ]
> @@ -2287,7 +2306,112 @@ static unsigned long damos_get_node_memcg_used_bp(
[ ... ]
> +static phys_addr_t damos_calc_eligible_bytes(struct damon_ctx *c,
> + struct damos *s, int nid, phys_addr_t *total)
> +{
> + struct damon_target *t;
> + struct damon_region *r;
> + phys_addr_t total_eligible = 0;
> + phys_addr_t node_eligible = 0;
> +
> + damon_for_each_target(t, c) {
> + damon_for_each_region(r, t) {
> + phys_addr_t addr, end_addr;
> +
> + if (!__damos_valid_target(r, s))
> + continue;
Does this also need to check damos_core_filter_out(c, t, r, s)? By skipping
the core filters, the eligible bytes calculation might erroneously include
memory that the scheme is explicitly configured to ignore.
> +
> + /* Convert from core address units to physical bytes */
> + addr = (phys_addr_t)r->ar.start * c->addr_unit;
> + end_addr = (phys_addr_t)r->ar.end * c->addr_unit;
> + while (addr < end_addr) {
Since DAMON is designed to provide low, constant-overhead memory monitoring via
region-based tracking, does performing a synchronous PFN-to-folio lookup for
every physical page across all eligible regions break this bounded-overhead
contract?
Because this function is called on every quota reset interval, scanning
potentially gigabytes of memory might introduce severe latency spikes and CPU
overhead.
Additionally, since DAMON regions can cover extremely large areas of memory,
should cond_resched() be called inside this inner while loop rather than
outside it? Iterating through millions of pages without yielding might trigger
RCU stalls and soft lockups.
> + struct folio *folio;
> + phys_addr_t folio_start, folio_end;
> + phys_addr_t overlap_start, overlap_end;
> + phys_addr_t counted;
> +
> + folio = damon_get_folio(PHYS_PFN(addr));
[ ... ]
> +
> + /* Advance past the entire folio */
> + addr = folio_end;
> + folio_put(folio);
> + }
> + cond_resched();
> + }
> + }
> +
> + *total = total_eligible;
> + return node_eligible;
> +}
> +
> +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
> + struct damos *s, int nid)
> +{
> + phys_addr_t total_eligible = 0;
> + phys_addr_t node_eligible;
> +
> + if (c->ops.id != DAMON_OPS_PADDR)
> + return 0;
> +
> + if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> + return 0;
> +
> + node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> +
> + if (!total_eligible)
> + return 0;
> +
> + return mult_frac((unsigned long)node_eligible, 10000,
> + (unsigned long)total_eligible);
Can casting total_eligible from phys_addr_t to unsigned long cause a
divide-by-zero on 32-bit systems?
On 32-bit architectures with PAE, phys_addr_t is 64-bit while unsigned long
is only 32-bit. If the total eligible memory is exactly a multiple of 4GB,
the 64-bit value passes the if (!total_eligible) check, but the truncated
32-bit value passed as the divisor to mult_frac() evaluates to 0, which
would trigger a kernel panic.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260426003245.2687-1-ravis.opensrc@gmail.com?part=1
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v8] mm/damon: add node_eligible_mem_bp goal metric
2026-04-26 0:53 ` sashiko-bot
@ 2026-04-26 17:08 ` SeongJae Park
0 siblings, 0 replies; 5+ messages in thread
From: SeongJae Park @ 2026-04-26 17:08 UTC (permalink / raw)
To: sashiko-bot; +Cc: SeongJae Park, Ravi Jonnalagadda, damon
On Sun, 26 Apr 2026 00:53:41 +0000 sashiko-bot@kernel.org wrote:
> Sashiko AI review found 4 potential issue(s):
> - [High] Casting phys_addr_t to unsigned long causes truncation and potential divide-by-zero on 32-bit systems
This is a new and good finding. Ravi, could we fix this?
> - [High] Missing cond_resched() inside the PFN-scanning loop leads to CPU soft lockups
Unlikely.
> - [High] Synchronous O(N) full-memory scanning defeats DAMON's bounded-overhead design
Intentded design.
> - [Medium] damos_calc_eligible_bytes() skips core filters, corrupting the metric calculation
Intended design.
> --
>
> commit efcd6d55e5292f8b5cd0c6202c9d95000e99dc4c
> Author: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
>
> mm/damon: add node_eligible_mem_bp goal metric
>
> This commit introduces a new quota goal metric, DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP,
> to track the ratio of scheme-eligible memory on a specific NUMA node. It
> implements calculation functions that iterate over eligible regions and examine
> the NUMA node of each folio.
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index 3dbbbfdeff71..a9303a8c4384 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> [ ... ]
> > @@ -2287,7 +2306,112 @@ static unsigned long damos_get_node_memcg_used_bp(
> [ ... ]
> > +static phys_addr_t damos_calc_eligible_bytes(struct damon_ctx *c,
> > + struct damos *s, int nid, phys_addr_t *total)
> > +{
> > + struct damon_target *t;
> > + struct damon_region *r;
> > + phys_addr_t total_eligible = 0;
> > + phys_addr_t node_eligible = 0;
> > +
> > + damon_for_each_target(t, c) {
> > + damon_for_each_region(r, t) {
> > + phys_addr_t addr, end_addr;
> > +
> > + if (!__damos_valid_target(r, s))
> > + continue;
>
> Does this also need to check damos_core_filter_out(c, t, r, s)? By skipping
> the core filters, the eligible bytes calculation might erroneously include
> memory that the scheme is explicitly configured to ignore.
As previously answered, this is an intended design.
>
> > +
> > + /* Convert from core address units to physical bytes */
> > + addr = (phys_addr_t)r->ar.start * c->addr_unit;
> > + end_addr = (phys_addr_t)r->ar.end * c->addr_unit;
> > + while (addr < end_addr) {
>
> Since DAMON is designed to provide low, constant-overhead memory monitoring via
> region-based tracking, does performing a synchronous PFN-to-folio lookup for
> every physical page across all eligible regions break this bounded-overhead
> contract?
The bounded onverhead is only for monitoring part. It is not promised for
DAMOS.
>
> Because this function is called on every quota reset interval, scanning
> potentially gigabytes of memory might introduce severe latency spikes and CPU
> overhead.
So no problem.
>
> Additionally, since DAMON regions can cover extremely large areas of memory,
> should cond_resched() be called inside this inner while loop rather than
> outside it? Iterating through millions of pages without yielding might trigger
> RCU stalls and soft lockups.
Unlikely. We could improve this when such issue really happens and is
reported.
>
> > + struct folio *folio;
> > + phys_addr_t folio_start, folio_end;
> > + phys_addr_t overlap_start, overlap_end;
> > + phys_addr_t counted;
> > +
> > + folio = damon_get_folio(PHYS_PFN(addr));
> [ ... ]
> > +
> > + /* Advance past the entire folio */
> > + addr = folio_end;
> > + folio_put(folio);
> > + }
> > + cond_resched();
> > + }
> > + }
> > +
> > + *total = total_eligible;
> > + return node_eligible;
> > +}
> > +
> > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
> > + struct damos *s, int nid)
> > +{
> > + phys_addr_t total_eligible = 0;
> > + phys_addr_t node_eligible;
> > +
> > + if (c->ops.id != DAMON_OPS_PADDR)
> > + return 0;
> > +
> > + if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> > + return 0;
> > +
> > + node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> > +
> > + if (!total_eligible)
> > + return 0;
> > +
> > + return mult_frac((unsigned long)node_eligible, 10000,
> > + (unsigned long)total_eligible);
>
> Can casting total_eligible from phys_addr_t to unsigned long cause a
> divide-by-zero on 32-bit systems?
>
> On 32-bit architectures with PAE, phys_addr_t is 64-bit while unsigned long
> is only 32-bit. If the total eligible memory is exactly a multiple of 4GB,
> the 64-bit value passes the if (!total_eligible) check, but the truncated
> 32-bit value passed as the divisor to mult_frac() evaluates to 0, which
> would trigger a kernel panic.
Good finding. Ravi, could we fix this? Maybe we can simply remove the
casting?
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260426003245.2687-1-ravis.opensrc@gmail.com?part=1
Thanks,
SJ
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v8] mm/damon: add node_eligible_mem_bp goal metric
2026-04-26 0:32 [PATCH v8] mm/damon: add node_eligible_mem_bp goal metric Ravi Jonnalagadda
2026-04-26 0:53 ` sashiko-bot
@ 2026-04-26 17:19 ` SeongJae Park
2026-04-26 21:34 ` Ravi Jonnalagadda
1 sibling, 1 reply; 5+ messages in thread
From: SeongJae Park @ 2026-04-26 17:19 UTC (permalink / raw)
To: Ravi Jonnalagadda
Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun
On Sat, 25 Apr 2026 17:32:45 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
> Background and Motivation
> =========================
>
> In heterogeneous memory systems, controlling memory distribution across
> NUMA nodes is essential for performance optimization. This patch enables
> system-wide page distribution with target-state goals such as "maintain
> 60% of scheme-eligible memory on DRAM" using PA-mode DAMON schemes.
>
> Rather than using absolute thresholds, this metric tracks the ratio of
> memory that matches each scheme's access pattern filters on a target
> node, enabling the quota system to automatically adjust migration
> aggressiveness to maintain the desired distribution.
>
> What This Metric Measures
> =========================
>
> node_eligible_mem_bp:
> scheme_eligible_bytes_on_node / total_scheme_eligible_bytes * 10000
>
> Two-Scheme Setup for Hot Page Distribution
> ==========================================
>
> For maintaining 60% of hot memory on DRAM (node 0) and 40% on CXL
> (node 1):
>
> PULL scheme: migrate_hot to node 0
> goal: node_eligible_mem_bp, nid=0, target=6000
> addr filter: node 1 address range (only migrate FROM CXL)
> "Move hot pages to DRAM if less than 60% of hot data is in DRAM"
>
> PUSH scheme: migrate_hot to node 1
> goal: node_eligible_mem_bp, nid=1, target=4000
> addr filter: node 0 address range (only migrate FROM DRAM)
> "Move hot pages to CXL if less than 40% of hot data is in CXL"
>
> Each scheme independently measures its own eligible memory and adjusts
> its quota to achieve its target ratio. The schemes work in concert
> through DAMON's unified monitoring context, with the quota autotuner
> balancing their relative aggressiveness.
>
> Implementation Details
> ======================
>
> The implementation adds a new quota goal metric type
> DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP to the existing DAMOS quota goal
> framework. When this metric is configured for a scheme:
>
> 1. During each quota adjustment cycle, damos_get_node_eligible_mem_bp()
> is called to calculate the current memory distribution.
>
> 2. The function iterates through all regions that match the scheme's
> access pattern (via __damos_valid_target()) and calculates:
> - Total eligible bytes across all nodes
> - Eligible bytes specifically on the target node (goal->nid)
>
> 3. For each eligible region, damos_calc_eligible_bytes() walks through
> the physical address range, using damon_get_folio() to look up
> each folio and determine its NUMA node via folio_nid().
>
> 4. Large folios are handled by calculating the exact overlap between
> the region boundaries and folio boundaries, ensuring accurate
> byte counts even when regions partially span folios.
>
> 5. The ratio (node_eligible / total_eligible * 10000) is returned
> as basis points, which the quota autotuner uses to adjust the
> scheme's effective quota size (esz).
>
> The implementation requires CONFIG_DAMON_PADDR since damon_get_folio()
> is only available for physical address space monitoring.
>
> Testing Results
> ===============
>
> Functionally tested on a two-node heterogeneous memory system with DRAM
> (node 0) and CXL memory (node 1). A PUSH+PULL scheme configuration using
> migrate_hot actions was used to reach a target hot memory ratio between
> the two tiers.
>
> With the TEMPORAL tuner, the system converges quickly to the target
> distribution. The tuner drives esz to maximum when under goal and to
> zero once the goal is met, forming a simple on/off feedback loop that
> stabilizes at the desired ratio.
>
> With the CONSIST tuner, the scheme still converges but more slowly, as
> it migrates and then throttles itself based on quota feedback. The time
> to reach the goal varies depending on workload intensity.
>
> Note: This metric works with both TEMPORAL and CONSIST goal tuners.
>
> Suggested-by: SeongJae Park <sj@kernel.org>
> Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
Assuming below two minor things are addressed,
Reviewed-by: SeongJae Park <sj@kernel.org>
[...]
> +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
> + struct damos *s, int nid)
> +{
> + phys_addr_t total_eligible = 0;
> + phys_addr_t node_eligible;
> +
> + if (c->ops.id != DAMON_OPS_PADDR)
> + return 0;
> +
> + if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> + return 0;
> +
> + node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> +
> + if (!total_eligible)
> + return 0;
> +
> + return mult_frac((unsigned long)node_eligible, 10000,
> + (unsigned long)total_eligible);
Sashiko found [1] total_eligible after the casting could be zero on 32bit
system, resulting in divide-by-zero. As I also replied to Sashiko review,
could you please fix this? It seems we can simply remove the castings.
[...]
> @@ -2389,9 +2528,9 @@ static void damos_goal_tune_esz_bp_temporal(struct damos_quota *quota)
> /*
> * Called only if quota->ms, or quota->sz are set, or quota->goals is not empty
> */
> -static void damos_set_effective_quota(struct damos_quota *quota,
> - struct damon_ctx *ctx)
> +static void damos_set_effective_quota(struct damon_ctx *c, struct damos *s)
Sorry for finding this late. Could we keep the dmon_ctx parameter name?
Otherwise, we introduce unnecessary change below.
If the mult_frac() divide-by-zero is not a real issue, I wouldn't insist this
change. But, if we will make a new version, let's do this together.
> {
> + struct damos_quota *quota = &s->quota;
> unsigned long throughput;
> unsigned long esz = ULONG_MAX;
>
> @@ -2402,9 +2541,9 @@ static void damos_set_effective_quota(struct damos_quota *quota,
>
> if (!list_empty("a->goals)) {
> if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_CONSIST)
> - damos_goal_tune_esz_bp_consist(quota);
> + damos_goal_tune_esz_bp_consist(c, s);
> else if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_TEMPORAL)
> - damos_goal_tune_esz_bp_temporal(quota);
> + damos_goal_tune_esz_bp_temporal(c, s);
> esz = quota->esz_bp / 10000;
> }
>
> @@ -2415,7 +2554,7 @@ static void damos_set_effective_quota(struct damos_quota *quota,
> else
> throughput = PAGE_SIZE * 1024;
> esz = min(throughput * quota->ms, esz);
> - esz = max(ctx->min_region_sz, esz);
> + esz = max(c->min_region_sz, esz);
Above change is unnecessarily introduced. Could we keep the old damon_ctx
parameter name?
[1] https://lore.kernel.org/20260426005341.B393EC2BCB0@smtp.kernel.org
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v8] mm/damon: add node_eligible_mem_bp goal metric
2026-04-26 17:19 ` SeongJae Park
@ 2026-04-26 21:34 ` Ravi Jonnalagadda
0 siblings, 0 replies; 5+ messages in thread
From: Ravi Jonnalagadda @ 2026-04-26 21:34 UTC (permalink / raw)
To: SeongJae Park
Cc: damon, linux-mm, linux-kernel, linux-doc, akpm, corbet, bijan311,
ajayjoshi, honggyu.kim, yunjeong.mun
On Sun, Apr 26, 2026 at 10:19 AM SeongJae Park <sj@kernel.org> wrote:
>
> On Sat, 25 Apr 2026 17:32:45 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
>
> > Background and Motivation
> > =========================
> >
> > In heterogeneous memory systems, controlling memory distribution across
> > NUMA nodes is essential for performance optimization. This patch enables
> > system-wide page distribution with target-state goals such as "maintain
> > 60% of scheme-eligible memory on DRAM" using PA-mode DAMON schemes.
> >
> > Rather than using absolute thresholds, this metric tracks the ratio of
> > memory that matches each scheme's access pattern filters on a target
> > node, enabling the quota system to automatically adjust migration
> > aggressiveness to maintain the desired distribution.
> >
> > What This Metric Measures
> > =========================
> >
> > node_eligible_mem_bp:
> > scheme_eligible_bytes_on_node / total_scheme_eligible_bytes * 10000
> >
> > Two-Scheme Setup for Hot Page Distribution
> > ==========================================
> >
> > For maintaining 60% of hot memory on DRAM (node 0) and 40% on CXL
> > (node 1):
> >
> > PULL scheme: migrate_hot to node 0
> > goal: node_eligible_mem_bp, nid=0, target=6000
> > addr filter: node 1 address range (only migrate FROM CXL)
> > "Move hot pages to DRAM if less than 60% of hot data is in DRAM"
> >
> > PUSH scheme: migrate_hot to node 1
> > goal: node_eligible_mem_bp, nid=1, target=4000
> > addr filter: node 0 address range (only migrate FROM DRAM)
> > "Move hot pages to CXL if less than 40% of hot data is in CXL"
> >
> > Each scheme independently measures its own eligible memory and adjusts
> > its quota to achieve its target ratio. The schemes work in concert
> > through DAMON's unified monitoring context, with the quota autotuner
> > balancing their relative aggressiveness.
> >
> > Implementation Details
> > ======================
> >
> > The implementation adds a new quota goal metric type
> > DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP to the existing DAMOS quota goal
> > framework. When this metric is configured for a scheme:
> >
> > 1. During each quota adjustment cycle, damos_get_node_eligible_mem_bp()
> > is called to calculate the current memory distribution.
> >
> > 2. The function iterates through all regions that match the scheme's
> > access pattern (via __damos_valid_target()) and calculates:
> > - Total eligible bytes across all nodes
> > - Eligible bytes specifically on the target node (goal->nid)
> >
> > 3. For each eligible region, damos_calc_eligible_bytes() walks through
> > the physical address range, using damon_get_folio() to look up
> > each folio and determine its NUMA node via folio_nid().
> >
> > 4. Large folios are handled by calculating the exact overlap between
> > the region boundaries and folio boundaries, ensuring accurate
> > byte counts even when regions partially span folios.
> >
> > 5. The ratio (node_eligible / total_eligible * 10000) is returned
> > as basis points, which the quota autotuner uses to adjust the
> > scheme's effective quota size (esz).
> >
> > The implementation requires CONFIG_DAMON_PADDR since damon_get_folio()
> > is only available for physical address space monitoring.
> >
> > Testing Results
> > ===============
> >
> > Functionally tested on a two-node heterogeneous memory system with DRAM
> > (node 0) and CXL memory (node 1). A PUSH+PULL scheme configuration using
> > migrate_hot actions was used to reach a target hot memory ratio between
> > the two tiers.
> >
> > With the TEMPORAL tuner, the system converges quickly to the target
> > distribution. The tuner drives esz to maximum when under goal and to
> > zero once the goal is met, forming a simple on/off feedback loop that
> > stabilizes at the desired ratio.
> >
> > With the CONSIST tuner, the scheme still converges but more slowly, as
> > it migrates and then throttles itself based on quota feedback. The time
> > to reach the goal varies depending on workload intensity.
> >
> > Note: This metric works with both TEMPORAL and CONSIST goal tuners.
> >
> > Suggested-by: SeongJae Park <sj@kernel.org>
> > Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
>
> Assuming below two minor things are addressed,
>
> Reviewed-by: SeongJae Park <sj@kernel.org>
Thank you SJ. Will send v9 addressing these two changes.
>
> [...]
> > +static unsigned long damos_get_node_eligible_mem_bp(struct damon_ctx *c,
> > + struct damos *s, int nid)
> > +{
> > + phys_addr_t total_eligible = 0;
> > + phys_addr_t node_eligible;
> > +
> > + if (c->ops.id != DAMON_OPS_PADDR)
> > + return 0;
> > +
> > + if (nid < 0 || nid >= MAX_NUMNODES || !node_online(nid))
> > + return 0;
> > +
> > + node_eligible = damos_calc_eligible_bytes(c, s, nid, &total_eligible);
> > +
> > + if (!total_eligible)
> > + return 0;
> > +
> > + return mult_frac((unsigned long)node_eligible, 10000,
> > + (unsigned long)total_eligible);
>
> Sashiko found [1] total_eligible after the casting could be zero on 32bit
> system, resulting in divide-by-zero. As I also replied to Sashiko review,
> could you please fix this? It seems we can simply remove the castings.
>
> [...]
> > @@ -2389,9 +2528,9 @@ static void damos_goal_tune_esz_bp_temporal(struct damos_quota *quota)
> > /*
> > * Called only if quota->ms, or quota->sz are set, or quota->goals is not empty
> > */
> > -static void damos_set_effective_quota(struct damos_quota *quota,
> > - struct damon_ctx *ctx)
> > +static void damos_set_effective_quota(struct damon_ctx *c, struct damos *s)
>
> Sorry for finding this late. Could we keep the dmon_ctx parameter name?
> Otherwise, we introduce unnecessary change below.
>
> If the mult_frac() divide-by-zero is not a real issue, I wouldn't insist this
> change. But, if we will make a new version, let's do this together.
>
> > {
> > + struct damos_quota *quota = &s->quota;
> > unsigned long throughput;
> > unsigned long esz = ULONG_MAX;
> >
> > @@ -2402,9 +2541,9 @@ static void damos_set_effective_quota(struct damos_quota *quota,
> >
> > if (!list_empty("a->goals)) {
> > if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_CONSIST)
> > - damos_goal_tune_esz_bp_consist(quota);
> > + damos_goal_tune_esz_bp_consist(c, s);
> > else if (quota->goal_tuner == DAMOS_QUOTA_GOAL_TUNER_TEMPORAL)
> > - damos_goal_tune_esz_bp_temporal(quota);
> > + damos_goal_tune_esz_bp_temporal(c, s);
> > esz = quota->esz_bp / 10000;
> > }
> >
> > @@ -2415,7 +2554,7 @@ static void damos_set_effective_quota(struct damos_quota *quota,
> > else
> > throughput = PAGE_SIZE * 1024;
> > esz = min(throughput * quota->ms, esz);
> > - esz = max(ctx->min_region_sz, esz);
> > + esz = max(c->min_region_sz, esz);
>
> Above change is unnecessarily introduced. Could we keep the old damon_ctx
> parameter name?
>
> [1] https://lore.kernel.org/20260426005341.B393EC2BCB0@smtp.kernel.org
>
>
> Thanks,
> SJ
>
> [...]
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-04-26 21:34 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-26 0:32 [PATCH v8] mm/damon: add node_eligible_mem_bp goal metric Ravi Jonnalagadda
2026-04-26 0:53 ` sashiko-bot
2026-04-26 17:08 ` SeongJae Park
2026-04-26 17:19 ` SeongJae Park
2026-04-26 21:34 ` Ravi Jonnalagadda
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.