* [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
@ 2025-06-20 18:04 Bijan Tabatabai
  2025-06-20 18:04 ` [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM Bijan Tabatabai
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Bijan Tabatabai @ 2025-06-20 18:04 UTC (permalink / raw)
  To: damon, linux-mm, linux-kernel
  Cc: sj, akpm, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, gourry, ying.huang, apopple, bijantabatab,
	venkataravis, emirakhur, ajayjoshi, vtavarespetr

From: Bijan Tabatabai <bijantabatab@micron.com>

A recent patch set automatically sets the interleave weight for each node
according to the node's maximum bandwidth [1]. In another thread, the patch
set's author, Joshua Hahn, wondered if/how these weights should be changed
if the bandwidth utilization of the system changes [2].

This patch set adds the mechanism for dynamically changing how application
data is interleaved across nodes while leaving the policy of what the
interleave weights should be to userspace. It does this by modifying the
migrate_{hot,cold} DAMOS actions to allow passing in a list of migration
targets to their target_nid parameter. When this is done, the
migrate_{hot,cold} actions will migrate pages between the specified nodes
using the global interleave weights found at
/sys/kernel/mm/mempolicy/weighted_interleave/node<N>. This functionality
can be used to dynamically adjust how pages are interleaved by changing the
global weights. When only a single migration target is passed to
target_nid, the migrate_{hot,cold} actions will act the same as before.

There have been prior discussions about how changing the interleave weights
in response to the system's bandwidth utilization can be beneficial [2].
However, currently the interleave weights are only applied when data is
allocated. Migrating already-allocated pages according to the dynamically
changing weights helps better balance the bandwidth utilization across
nodes.

As a toy example, imagine some application that uses 75% of the local
bandwidth. Assuming sufficient capacity, when running alone, we want to
keep that application's data in local memory. However, if a second
instance of that application begins, using the same amount of bandwidth,
it would be best to interleave the data of both processes to alleviate the
bandwidth pressure from the local node. Likewise, when one of the processes
ends, the data should be moved back to local memory.

We imagine there would be a userspace application that would monitor system
performance characteristics, such as bandwidth utilization or memory access
latency, and use that information to tune the interleave weights. Others
seem to have come to a similar conclusion in previous discussions [3].
We are currently working on a userspace program that does this, but it is
not quite ready to be published yet.
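
As a rough sketch of the idea (purely illustrative; "bw_probe" below is a
hypothetical stand-in for whatever bandwidth measurement source such a tool
would use), the control loop could look like:

  # Hypothetical tuning loop; not the actual tool mentioned above.
  while sleep 10; do
          util=$(bw_probe node0)  # hypothetical probe: % of node 0 bw in use
          if [ "$util" -gt 90 ]; then
                  w0=3            # local node saturated: spread the load
          else
                  w0=255          # plenty of headroom: bias local
          fi
          echo "$w0" | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
          echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node1
  done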

We believe DAMON is the correct venue for the interleaving mechanism for a
few reasons. First, we noticed that we don't have to migrate all of the
application's pages to improve performance; we just need to migrate the
frequently accessed pages. DAMON's existing hotness tracking is very useful
for this. Second, DAMON's quota system can be used to ensure we are not
using too much bandwidth for migrations. Finally, as Ying pointed out [4],
a complete solution must also handle when a memory node is at capacity. The
existing migrate_cold action can be used in conjunction with the
functionality added in this patch set to provide that complete solution.
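
For example, migration bandwidth could be capped with the existing DAMOS
quota sysfs files; a sketch with illustrative values (paths as in the
current DAMON sysfs layout):

  $ # Illustrative: allow at most ~256MB of migrated pages per second
  $ cd /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0
  $ echo 268435456 | sudo tee ./quotas/bytes
  $ echo 1000 | sudo tee ./quotas/reset_interval_ms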

Functionality Test
==================
Below is an example of this new functionality in use to confirm that these
patches behave as intended.
In this example, the user initially sets the interleave weights to
interleave pages at a 1:1 ratio and starts an application, alloc_data, that
uses those weights to allocate 1GB of data and then sleeps. Afterwards, the
weights are changed to interleave pages at a 2:1 ratio. Using numastat, we
show that DAMON has migrated the application's data to match the new
interleave weights.
  $ # Show that the migrate_hot action is used with multiple targets
  $ cd /sys/kernel/mm/damon/admin/kdamonds/0
  $ sudo cat ./contexts/0/schemes/0/action
  migrate_hot
  $ sudo cat ./contexts/0/schemes/0/target_nid
  0-1
  $ # Initially interleave at a 1:1 ratio
  $ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
  $ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node1
  $ # Start alloc_data with the initial interleave ratio
  $ numactl -w 0,1 ~/alloc_data 1G &
  $ # Verify the initial allocation
  $ numastat -c -p alloc_data

  Per-node process memory usage (in MBs) for PID 12224 (alloc_data)
           Node 0 Node 1 Total
           ------ ------ -----
  Huge          0      0     0
  Heap          0      0     0
  Stack         0      0     0
  Private     514    514  1027
  -------  ------ ------ -----
  Total       514    514  1027
  $ # Start interleaving at a 2:1 ratio
  $ echo 2 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
  $ # Verify that DAMON has migrated data to match the new ratio
  $ numastat -c -p alloc_data

  Per-node process memory usage (in MBs) for PID 12224 (alloc_data)
           Node 0 Node 1 Total
           ------ ------ -----
  Huge          0      0     0
  Heap          0      0     0
  Stack         0      0     0
  Private     684    343  1027
  -------  ------ ------ -----
  Total       684    343  1027

Performance Test
================
Below is a simple example showing that interleaving application data using
these patches can improve application performance.
To do this, we run a bandwidth-intensive embedding reduction application
[5]. This workload is useful for this test because it reports the time each
iteration takes to run and reuses its buffers between iterations, allowing
us to clearly see the benefits of the migration.

We evaluate this on a 128 core/256 thread AMD system with 72 GB/s of local
DDR bandwidth and 26 GB/s of CXL memory bandwidth.

Before we start the workload, the system bandwidth utilization is low, so
we start with interleave weights biased as much as possible to the local
node. When the workload begins, it saturates the local bandwidth, making
the page placement suboptimal. To alleviate this, we modify the interleave
weights, triggering DAMON to migrate the workload's data.

  $ cd /sys/kernel/mm/damon/admin/kdamonds/0/
  $ sudo cat ./contexts/0/schemes/0/action
  migrate_hot
  $ sudo cat ./contexts/0/schemes/0/target_nid
  0-1
  $ echo 255 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
  $ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node1
  $ <path>/eval_baseline -d amazon_All -c 255 -r 100
  <clip startup output>
  Eval Phase 3: Running Baseline...

  REPEAT # 0 Baseline Total time : 9043.24 ms
  REPEAT # 1 Baseline Total time : 7307.71 ms
  REPEAT # 2 Baseline Total time : 7301.4 ms
  REPEAT # 3 Baseline Total time : 7312.44 ms
  REPEAT # 4 Baseline Total time : 7282.43 ms
  # Interleave weights changed to 3:1
  REPEAT # 5 Baseline Total time : 6754.78 ms
  REPEAT # 6 Baseline Total time : 5322.38 ms
  REPEAT # 7 Baseline Total time : 5359.89 ms
  REPEAT # 8 Baseline Total time : 5346.14 ms
  REPEAT # 9 Baseline Total time : 5321.98 ms
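
For clarity, the weight change annotated at iteration 5 corresponds to a
command along these lines, issued from another shell:

  $ echo 3 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0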

Updating the interleave weights and having DAMON migrate the workload
data according to them resulted in an approximately 25% speedup.

Questions for Reviewers
=======================
1. Are you happy with the changes to the DAMON sysfs interface?
2. Setting an interleave weight to 0 is currently not allowed. This makes
   sense when the weights are only used for allocation. Does it make sense
   to allow 0 weights now?

Patches Sequence
================
The first patch exposes get_il_weight() in mm/internal.h to let DAMON
access the interleave weights.
The second patch implements the interleaving mechanism in the
migrate_{hot,cold} actions.

Revision History
================
Changes from v1
(https://lore.kernel.org/linux-mm/20250612181330.31236-1-bijan311@gmail.com/)
- Reuse migrate_{hot,cold} actions instead of creating a new action
- Remove vaddr implementation
- Remove most of the use of mempolicy, instead duplicate the interleave
  logic and access interleave weights directly
- Write more about the use case in the cover letter
- Write about why DAMON was used for this in the cover letter
- Add correctness test to the cover letter
- Add performance test

[1] https://lore.kernel.org/linux-mm/20250520141236.2987309-1-joshua.hahnjy@gmail.com/
[2] https://lore.kernel.org/linux-mm/20250313155705.1943522-1-joshua.hahnjy@gmail.com/
[3] https://lore.kernel.org/linux-mm/20250314151137.892379-1-joshua.hahnjy@gmail.com/
[4] https://lore.kernel.org/linux-mm/87frjfx6u4.fsf@DESKTOP-5N7EMDA/
[5] https://github.com/SNU-ARC/MERCI

Bijan Tabatabai (2):
  mm/mempolicy: Expose get_il_weight() to MM
  mm/damon/paddr: Allow multiple migrate targets

 include/linux/damon.h    |   8 +--
 mm/damon/core.c          |   9 ++--
 mm/damon/lru_sort.c      |   2 +-
 mm/damon/paddr.c         | 108 +++++++++++++++++++++++++++++++++++++--
 mm/damon/reclaim.c       |   2 +-
 mm/damon/sysfs-schemes.c |  14 +++--
 mm/internal.h            |   6 +++
 mm/mempolicy.c           |   2 +-
 samples/damon/mtier.c    |   6 ++-
 samples/damon/prcl.c     |   2 +-
 10 files changed, 138 insertions(+), 21 deletions(-)

-- 
2.43.5




* [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM
  2025-06-20 18:04 [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions Bijan Tabatabai
@ 2025-06-20 18:04 ` Bijan Tabatabai
  2025-06-23 19:06   ` Gregory Price
                     ` (2 more replies)
  2025-06-20 18:04 ` [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets Bijan Tabatabai
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 30+ messages in thread
From: Bijan Tabatabai @ 2025-06-20 18:04 UTC (permalink / raw)
  To: damon, linux-mm, linux-kernel
  Cc: sj, akpm, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, gourry, ying.huang, apopple, bijantabatab,
	venkataravis, emirakhur, ajayjoshi, vtavarespetr

From: Bijan Tabatabai <bijantabatab@micron.com>

This patch exposes get_il_weight() in mm/internal.h, allowing other parts
of the MM subsystem, such as DAMON, to make page placement decisions based
on the global interleave weights.

Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com>
---
 mm/internal.h  | 6 ++++++
 mm/mempolicy.c | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/internal.h b/mm/internal.h
index 3823fb356d3b..b718ebe7cad5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1169,6 +1169,7 @@ extern int node_reclaim_mode;
 
 extern int node_reclaim(struct pglist_data *, gfp_t, unsigned int);
 extern int find_next_best_node(int node, nodemask_t *used_node_mask);
+extern u8 get_il_weight(int node);
 #else
 #define node_reclaim_mode 0
 
@@ -1181,6 +1182,11 @@ static inline int find_next_best_node(int node, nodemask_t *used_node_mask)
 {
 	return NUMA_NO_NODE;
 }
+
+static inline u8 get_il_weight(int node)
+{
+	return 1;
+}
 #endif
 
 static inline bool node_reclaim_enabled(void)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1ff7b2174eb7..b1713430ee9c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -165,7 +165,7 @@ static unsigned int *node_bw_table;
  */
 static DEFINE_MUTEX(wi_state_lock);
 
-static u8 get_il_weight(int node)
+u8 get_il_weight(int node)
 {
 	struct weighted_interleave_state *state;
 	u8 weight = 1;
-- 
2.43.5




* [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-20 18:04 [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions Bijan Tabatabai
  2025-06-20 18:04 ` [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM Bijan Tabatabai
@ 2025-06-20 18:04 ` Bijan Tabatabai
  2025-06-21 18:02   ` SeongJae Park
  2025-06-20 20:21 ` [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions SeongJae Park
  2025-06-23 13:45 ` Joshua Hahn
  3 siblings, 1 reply; 30+ messages in thread
From: Bijan Tabatabai @ 2025-06-20 18:04 UTC (permalink / raw)
  To: damon, linux-mm, linux-kernel
  Cc: sj, akpm, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim,
	byungchul, gourry, ying.huang, apopple, bijantabatab,
	venkataravis, emirakhur, ajayjoshi, vtavarespetr

From: Bijan Tabatabai <bijantabatab@micron.com>

The migrate_{hot,cold} DAMOS actions take a parameter, target_nid, to
indicate what node the actions should migrate pages to. In this patch,
we allow passing in a list of migration targets into target_nid. When
this is done, the migrate_{hot,cold} actions will migrate pages between
the specified nodes using the global interleave weights found at
/sys/kernel/mm/mempolicy/weighted_interleave/node<N>. This functionality
can be used to dynamically adjust how pages are interleaved in response
to changes in bandwidth utilization to improve performance, as discussed
in [1]. When only a single migration target is passed to target_nid, the
migrate_{hot,cold} actions will act the same as before.

Below is an example of this new functionality. The user initially sets
the interleave weights to interleave pages at a 1:1 ratio and starts an
application, alloc_data, that uses those weights to allocate 1GB of data
and then sleeps. Afterwards, the weights are changed to interleave pages
at a 2:1 ratio. Using numastat, we show that DAMON has migrated the
application's pages to match the new interleave weights.
  $ # Show that the migrate_hot action is used with multiple targets
  $ cd /sys/kernel/mm/damon/admin/kdamonds/0
  $ sudo cat ./contexts/0/schemes/0/action
  migrate_hot
  $ sudo cat ./contexts/0/schemes/0/target_nid
  0-1
  $ # Initially interleave at a 1:1 ratio
  $ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
  $ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node1
  $ # Start alloc_data with the initial interleave ratio
  $ numactl -w 0,1 ~/alloc_data 1G &
  $ # Verify the initial allocation
  $ numastat -c -p alloc_data

  Per-node process memory usage (in MBs) for PID 12224 (alloc_data)
           Node 0 Node 1 Total
           ------ ------ -----
  Huge          0      0     0
  Heap          0      0     0
  Stack         0      0     0
  Private     514    514  1027
  -------  ------ ------ -----
  Total       514    514  1027
  $ # Start interleaving at a 2:1 ratio
  $ echo 2 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
  $ # Verify that DAMON has migrated data to match the new ratio
  $ numastat -c -p alloc_data

  Per-node process memory usage (in MBs) for PID 12224 (alloc_data)
           Node 0 Node 1 Total
           ------ ------ -----
  Huge          0      0     0
  Heap          0      0     0
  Stack         0      0     0
  Private     684    343  1027
  -------  ------ ------ -----
  Total       684    343  1027

[1] https://lore.kernel.org/linux-mm/20250313155705.1943522-1-joshua.hahnjy@gmail.com/

Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com>
---
 include/linux/damon.h    |   8 +--
 mm/damon/core.c          |   9 ++--
 mm/damon/lru_sort.c      |   2 +-
 mm/damon/paddr.c         | 108 +++++++++++++++++++++++++++++++++++++--
 mm/damon/reclaim.c       |   2 +-
 mm/damon/sysfs-schemes.c |  14 +++--
 samples/damon/mtier.c    |   6 ++-
 samples/damon/prcl.c     |   2 +-
 8 files changed, 131 insertions(+), 20 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index a4011726cb3b..24e726ee146a 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -454,7 +454,7 @@ struct damos_access_pattern {
  * @apply_interval_us:	The time between applying the @action.
  * @quota:		Control the aggressiveness of this scheme.
  * @wmarks:		Watermarks for automated (in)activation of this scheme.
- * @target_nid:		Destination node if @action is "migrate_{hot,cold}".
+ * @target_nids:	Destination nodes if @action is "migrate_{hot,cold}".
  * @filters:		Additional set of &struct damos_filter for &action.
  * @ops_filters:	ops layer handling &struct damos_filter objects list.
  * @last_applied:	Last @action applied ops-managing entity.
@@ -472,7 +472,7 @@ struct damos_access_pattern {
  * monitoring context are inactive, DAMON stops monitoring either, and just
  * repeatedly checks the watermarks.
  *
- * @target_nid is used to set the migration target node for migrate_hot or
+ * @target_nids is used to set the migration target nodes for migrate_hot or
  * migrate_cold actions, which means it's only meaningful when @action is either
  * "migrate_hot" or "migrate_cold".
  *
@@ -517,7 +517,7 @@ struct damos {
 	struct damos_quota quota;
 	struct damos_watermarks wmarks;
 	union {
-		int target_nid;
+		nodemask_t target_nids;
 	};
 	struct list_head filters;
 	struct list_head ops_filters;
@@ -896,7 +896,7 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
 			unsigned long apply_interval_us,
 			struct damos_quota *quota,
 			struct damos_watermarks *wmarks,
-			int target_nid);
+			nodemask_t *target_nids);
 void damon_add_scheme(struct damon_ctx *ctx, struct damos *s);
 void damon_destroy_scheme(struct damos *s);
 int damos_commit_quota_goals(struct damos_quota *dst, struct damos_quota *src);
diff --git a/mm/damon/core.c b/mm/damon/core.c
index b217e0120e09..b57eae393df8 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -378,7 +378,7 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
 			unsigned long apply_interval_us,
 			struct damos_quota *quota,
 			struct damos_watermarks *wmarks,
-			int target_nid)
+			nodemask_t *target_nids)
 {
 	struct damos *scheme;
 
@@ -407,7 +407,10 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
 	scheme->wmarks = *wmarks;
 	scheme->wmarks.activated = true;
 
-	scheme->target_nid = target_nid;
+	if (target_nids)
+		nodes_copy(scheme->target_nids, *target_nids);
+	else
+		nodes_clear(scheme->target_nids);
 
 	return scheme;
 }
@@ -1006,7 +1009,7 @@ static int damon_commit_schemes(struct damon_ctx *dst, struct damon_ctx *src)
 				src_scheme->action,
 				src_scheme->apply_interval_us,
 				&src_scheme->quota, &src_scheme->wmarks,
-				NUMA_NO_NODE);
+				NULL);
 		if (!new_scheme)
 			return -ENOMEM;
 		err = damos_commit(new_scheme, src_scheme);
diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
index 4af8fd4a390b..ef584c49ecf1 100644
--- a/mm/damon/lru_sort.c
+++ b/mm/damon/lru_sort.c
@@ -164,7 +164,7 @@ static struct damos *damon_lru_sort_new_scheme(
 			&quota,
 			/* (De)activate this according to the watermarks. */
 			&damon_lru_sort_wmarks,
-			NUMA_NO_NODE);
+			NULL);
 }
 
 /* Create a DAMON-based operation scheme for hot memory regions */
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 4102a8c5f992..cbd262d21779 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -19,6 +19,12 @@
 #include "../internal.h"
 #include "ops-common.h"
 
+struct damon_pa_migrate_rmap_arg {
+	nodemask_t *nids;
+	u8 *weights;
+	int *target_nid;
+};
+
 static bool damon_folio_mkold_one(struct folio *folio,
 		struct vm_area_struct *vma, unsigned long addr, void *arg)
 {
@@ -502,12 +508,83 @@ static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
 	return nr_migrated;
 }
 
+static bool damon_pa_migrate_rmap(struct folio *folio,
+				  struct vm_area_struct *vma,
+				  unsigned long addr,
+				  void *arg)
+{
+	struct damon_pa_migrate_rmap_arg *rmap_arg;
+	pgoff_t ilx;
+	int order;
+	unsigned int target;
+	unsigned int weight_total = 0;
+	int nid;
+
+	rmap_arg = (struct damon_pa_migrate_rmap_arg *)arg;
+
+	order = folio_order(folio);
+	ilx = vma->vm_pgoff >> order;
+	ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
+
+	/* Same logic as weighted_interleave_nid() */
+	for_each_node_mask(nid, *rmap_arg->nids) {
+		weight_total += rmap_arg->weights[nid];
+	}
+
+	target = ilx % weight_total;
+	nid = first_node(*rmap_arg->nids);
+	while (target) {
+		if (target < rmap_arg->weights[nid])
+			break;
+		target -= rmap_arg->weights[nid];
+		nid = next_node_in(nid, *rmap_arg->nids);
+	}
+
+	if (nid == folio_nid(folio))
+		*rmap_arg->target_nid = NUMA_NO_NODE;
+	else
+		*rmap_arg->target_nid = nid;
+	return false;
+}
+
 static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
 		unsigned long *sz_filter_passed)
 {
 	unsigned long addr, applied;
-	LIST_HEAD(folio_list);
+	struct rmap_walk_control rwc;
+	struct damon_pa_migrate_rmap_arg rmap_arg;
+	struct list_head *folio_lists;
 	struct folio *folio;
+	u8 *weights;
+	int target_nid;
+	int nr_nodes;
+
+	nr_nodes = nodes_weight(s->target_nids);
+	if (!nr_nodes)
+		return 0;
+
+	folio_lists = kmalloc_array(nr_node_ids, sizeof(struct list_head),
+		GFP_KERNEL);
+	if (!folio_lists)
+		return 0;
+
+	weights = kmalloc_array(nr_node_ids, sizeof(u8), GFP_KERNEL);
+	if (!weights) {
+		kfree(folio_lists);
+		return 0;
+	}
+
+	for (int i = 0; i < nr_node_ids; i++) {
+		INIT_LIST_HEAD(&folio_lists[i]);
+		weights[i] = get_il_weight(i);
+	}
+
+	memset(&rwc, 0, sizeof(struct rmap_walk_control));
+	rwc.rmap_one = damon_pa_migrate_rmap;
+	rwc.arg = &rmap_arg;
+	rmap_arg.nids = &s->target_nids;
+	rmap_arg.weights = weights;
+	rmap_arg.target_nid = &target_nid;
 
 	addr = r->ar.start;
 	while (addr < r->ar.end) {
@@ -522,15 +599,38 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
 		else
 			*sz_filter_passed += folio_size(folio);
 
+		/*
+		 * If there is only one target node, migrate there. Otherwise,
+		 * interleave across the nodes according to the global
+		 * interleave weights
+		 */
+		if (nr_nodes == 1) {
+			target_nid = first_node(s->target_nids);
+		} else {
+			target_nid = NUMA_NO_NODE;
+			/* Updates target_nid */
+			rmap_walk(folio, &rwc);
+		}
+
+		if (target_nid == NUMA_NO_NODE)
+			goto put_folio;
+
 		if (!folio_isolate_lru(folio))
 			goto put_folio;
-		list_add(&folio->lru, &folio_list);
+		list_add(&folio->lru, &folio_lists[target_nid]);
 put_folio:
 		addr += folio_size(folio);
 		folio_put(folio);
 	}
-	applied = damon_pa_migrate_pages(&folio_list, s->target_nid);
-	cond_resched();
+
+	applied = 0;
+	for (int i = 0; i < nr_node_ids; i++) {
+		applied += damon_pa_migrate_pages(&folio_lists[i], i);
+		cond_resched();
+	}
+
+	kfree(weights);
+	kfree(folio_lists);
 	s->last_applied = folio;
 	return applied * PAGE_SIZE;
 }
diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index a675150965e0..9b9546606424 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -178,7 +178,7 @@ static struct damos *damon_reclaim_new_scheme(void)
 			&damon_reclaim_quota,
 			/* (De)activate this according to the watermarks. */
 			&damon_reclaim_wmarks,
-			NUMA_NO_NODE);
+			NULL);
 }
 
 static int damon_reclaim_apply_parameters(void)
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 0f6c9e1fec0b..eb4e2ded5c83 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1583,7 +1583,7 @@ struct damon_sysfs_scheme {
 	struct damon_sysfs_scheme_filters *filters;
 	struct damon_sysfs_stats *stats;
 	struct damon_sysfs_scheme_regions *tried_regions;
-	int target_nid;
+	nodemask_t target_nids;
 };
 
 /* This should match with enum damos_action */
@@ -1611,7 +1611,7 @@ static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc(
 	scheme->kobj = (struct kobject){};
 	scheme->action = action;
 	scheme->apply_interval_us = apply_interval_us;
-	scheme->target_nid = NUMA_NO_NODE;
+	nodes_clear(scheme->target_nids);
 	return scheme;
 }
 
@@ -1880,18 +1880,22 @@ static ssize_t target_nid_show(struct kobject *kobj,
 	struct damon_sysfs_scheme *scheme = container_of(kobj,
 			struct damon_sysfs_scheme, kobj);
 
-	return sysfs_emit(buf, "%d\n", scheme->target_nid);
+	return bitmap_print_to_pagebuf(true, buf, scheme->target_nids.bits, MAX_NUMNODES);
 }
 
 static ssize_t target_nid_store(struct kobject *kobj,
 		struct kobj_attribute *attr, const char *buf, size_t count)
 {
+	nodemask_t new;
 	struct damon_sysfs_scheme *scheme = container_of(kobj,
 			struct damon_sysfs_scheme, kobj);
 	int err = 0;
 
 	/* TODO: error handling for target_nid range. */
-	err = kstrtoint(buf, 0, &scheme->target_nid);
+	err = nodelist_parse(buf, new);
+
+	if (!err)
+		nodes_copy(scheme->target_nids, new);
 
 	return err ? err : count;
 }
@@ -2258,7 +2262,7 @@ static struct damos *damon_sysfs_mk_scheme(
 
 	scheme = damon_new_scheme(&pattern, sysfs_scheme->action,
 			sysfs_scheme->apply_interval_us, &quota, &wmarks,
-			sysfs_scheme->target_nid);
+			&sysfs_scheme->target_nids);
 	if (!scheme)
 		return NULL;
 
diff --git a/samples/damon/mtier.c b/samples/damon/mtier.c
index 36d2cd933f5a..b9ac075cbd25 100644
--- a/samples/damon/mtier.c
+++ b/samples/damon/mtier.c
@@ -47,6 +47,10 @@ static struct damon_ctx *damon_sample_mtier_build_ctx(bool promote)
 	struct damos *scheme;
 	struct damos_quota_goal *quota_goal;
 	struct damos_filter *filter;
+	nodemask_t target_node;
+
+	nodes_clear(target_node);
+	node_set(promote ? 0 : 1, target_node);
 
 	ctx = damon_new_ctx();
 	if (!ctx)
@@ -105,7 +109,7 @@ static struct damon_ctx *damon_sample_mtier_build_ctx(bool promote)
 				.weight_age = 100,
 			},
 			&(struct damos_watermarks){},
-			promote ? 0 : 1);	/* migrate target node id */
+			&target_node);
 	if (!scheme)
 		goto free_out;
 	damon_set_schemes(ctx, &scheme, 1);
diff --git a/samples/damon/prcl.c b/samples/damon/prcl.c
index 056b1b21a0fe..4d3e4e2e15cc 100644
--- a/samples/damon/prcl.c
+++ b/samples/damon/prcl.c
@@ -88,7 +88,7 @@ static int damon_sample_prcl_start(void)
 			0,
 			&(struct damos_quota){},
 			&(struct damos_watermarks){},
-			NUMA_NO_NODE);
+			NULL);
 	if (!scheme) {
 		damon_destroy_ctx(ctx);
 		return -ENOMEM;
-- 
2.43.5




* Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
  2025-06-20 18:04 [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions Bijan Tabatabai
  2025-06-20 18:04 ` [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM Bijan Tabatabai
  2025-06-20 18:04 ` [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets Bijan Tabatabai
@ 2025-06-20 20:21 ` SeongJae Park
  2025-06-20 21:47   ` Bijan Tabatabai
  2025-06-23 19:28   ` Gregory Price
  2025-06-23 13:45 ` Joshua Hahn
  3 siblings, 2 replies; 30+ messages in thread
From: SeongJae Park @ 2025-06-20 20:21 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

Hi Bijan,

On Fri, 20 Jun 2025 13:04:56 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:

[...]
> This patch set adds the mechanism for dynamically changing how application
> data is interleaved across nodes while leaving the policy of what the
> interleave weights should be to userspace. It does this by modifying the
> migrate_{hot,cold} DAMOS actions to allow passing in a list of migration
> targets to their target_nid parameter. When this is done, the
> migrate_{hot,cold} actions will migrate pages between the specified nodes
> using the global interleave weights found at
> /sys/kernel/mm/mempolicy/weighted_interleave/node<N>. This functionality
> can be used to dynamically adjust how pages are interleaved by changing the
> global weights. When only a single migration target is passed to
> target_nid, the migrate_{hot,cold} actions will act the same as before.

This means users are required to manipulate two interfaces: DAMON sysfs for
target nodes, and weighted_interleave sysfs for weights.  I don't think this
coupling is ideal.

Off the top of my head, I am concerned that users could mistakenly forget
updating one of those, since the requirement is not very clear.  I think the
interface should clearly explain that.  For example, by writing a special
keyword, say, "use_interleave_weights", to the target_nid parameter sysfs
file.  But even in that case, users who update weighted_interleave might
forget updating target nodes on the DAMON interface.

I think letting DAMOS_MIGRATE_{HOT,COLD} use all nodes as migration targets
when the special keyword is given is one of the better options.  This is what
I suggested for the previous version of this patch series.  But now I think it
would be better if we could just remove this coupling.

I understand a coupling of this sort is inevitable if the kernel is to make
the connection between DAMON and weighted interleaving itself, without
user-space help.  But now I think we could get user-space help, as described
below.  Please keep reading.

[...]
> As a toy example, imagine some application that uses 75% of the local
> bandwidth. Assuming sufficient capacity, when running alone, we want to
> keep that application's data in local memory. However, if a second
> instance of that application begins, using the same amount of bandwidth,
> it would be best to interleave the data of both processes to alleviate the
> bandwidth pressure from the local node. Likewise, when one of the processes
> ends, the data should be moved back to local memory.
> 
> We imagine there would be a userspace application that would monitor system
> performance characteristics, such as bandwidth utilization or memory access
> latency, and use that information to tune the interleave weights. Others
> seem to have come to a similar conclusion in previous discussions [3].
> We are currently working on a userspace program that does this, but it is
> not quite ready to be published yet.

So, at least in this toy example, we have user-space control.  Then, I think
we could decouple DAMON and weighted interleaving, and ask the user-space
tool to be the connection between those.  That is, extend
DAMOS_MIGRATE_{HOT,COLD} to let users specify migration target nodes and
their weights.  And ask the user-space tool to periodically read weighted
interleaving parameters that could be auto-tuned, and update
DAMOS_MIGRATE_{HOT,COLD} parameters accordingly.  Actually, the user-space
tool in this example is making the weights by itself, so this should be easy
work to do?

Also, even for the general use case, I think such user-space intervention is
not too much to request.  Please let me know if I'm wrong.

> 
> We believe DAMON is the correct venue for the interleaving mechanism for a
> few reasons. First, we noticed that we don't have to migrate all of the
> application's pages to improve performance; we just need to migrate the
> frequently accessed pages. DAMON's existing hotness tracking is very useful
> for this. Second, DAMON's quota system can be used to ensure we are not
> using too much bandwidth for migrations. Finally, as Ying pointed out [4],
> a complete solution must also handle when a memory node is at capacity. The
> existing migrate_cold action can be used in conjunction with the
> functionality added in this patch set to provide that complete solution.

These make perfect sense to me.  Thank you for adding this great summary.

> 
> Functionality Test
> ==================
[...]
> Performance Test
> ================
[...]
> Updating the interleave weights and having DAMON migrate the workload
> data according to them resulted in an approximately 25% speedup.

Awesome.  Thank you for conducting these great tests and sharing the results!

> 
> Questions for Reviewers
> =======================
> 1. Are you happy with the changes to the DAMON sysfs interface?

I'm happy with it for an RFC-level implementation.  And in my opinion, you
have now proved this is a good idea.  For next steps toward mainline landing,
I'd like to suggest the below interface change.

Let's allow users to specify DAMOS_MIGRATE_{HOT,COLD} target nodes and
weights using only the DAMON interface.  And let the user-space tool do the
synchronization with weighted interleaving or other required work.

This may require writing a not-small amount of code, especially for the
DAMON sysfs interface.  I think it is doable, though.  If you don't mind,
I'd like to quickly make a prototype and share it with you.

What do you think?

> 2. Setting an interleave weight to 0 is currently not allowed. This makes
>    sense when the weights are only used for allocation. Does it make sense
>    to allow 0 weights now?

I have no opinion, and would like to let the mempolicy folks voice theirs.
But if we go with the decoupling approach as I suggested above, we can have
this discussion in a separate thread :)

[...]
> Revision History
> ================
> Changes from v1
> (https://lore.kernel.org/linux-mm/20250612181330.31236-1-bijan311@gmail.com/)
> - Reuse migrate_{hot,cold} actions instead of creating a new action
> - Remove vaddr implementation
> - Remove most of the use of mempolicy, instead duplicate the interleave
>   logic and access interleave weights directly
> - Write more about the use case in the cover letter
> - Write about why DAMON was used for this in the cover letter
> - Add correctness test to the cover letter
> - Add performance test

Again, thank you for the revision.  Please bear with me through the next
steps.  I believe this work is very promising.


Thanks,
SJ

[...]



* Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
  2025-06-20 20:21 ` [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions SeongJae Park
@ 2025-06-20 21:47   ` Bijan Tabatabai
  2025-06-20 23:13     ` SeongJae Park
  2025-06-23 19:28   ` Gregory Price
  1 sibling, 1 reply; 30+ messages in thread
From: Bijan Tabatabai @ 2025-06-20 21:47 UTC (permalink / raw)
  To: SeongJae Park
  Cc: damon, linux-mm, linux-kernel, akpm, david, ziy, matthew.brost,
	joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple,
	bijantabatab, venkataravis, emirakhur, ajayjoshi, vtavarespetr

Hi SeongJae,

On Fri, Jun 20, 2025 at 3:21 PM SeongJae Park <sj@kernel.org> wrote:
>
> Hi Bijan,
>
> On Fri, 20 Jun 2025 13:04:56 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
>
> [...]
> > As a toy example, imagine some application that uses 75% of the local
> > bandwidth. Assuming sufficient capacity, when running alone, we want to
> > keep that application's data in local memory. However, if a second
> > instance of that application begins, using the same amount of bandwidth,
> > it would be best to interleave the data of both processes to alleviate the
> > bandwidth pressure from the local node. Likewise, when one of the processes
> > ends, the data should be moved back to local memory.
> >
> > We imagine there would be a userspace application that would monitor system
> > performance characteristics, such as bandwidth utilization or memory access
> > latency, and use that information to tune the interleave weights. Others
> > seem to have come to a similar conclusion in previous discussions [3].
> > We are currently working on a userspace program that does this, but it is
> > not quite ready to be published yet.
>
> So, at least in this toy example, we have user-space control.  Then, I think we
> could decouple DAMON and weighted interleaving, and ask the usr-space tool to
> be the connection between those.  That is, extend DAMOS_MIGRATE_{HOT,COLD} to
> let users specify migration target nodes and their weights.  And ask the
> user-space tool to periodically read weighted interleaving parameters that
> could be auto-tuned, and update DAMOS_MIGRATE_{HOT,COLD} parameters
> accordingly.  Actually the user-space tool on this example is making the
> weights by itself, so this should be easy work to do?
>
> Also, even for general use case, I think such user-space intervention is not
> too much request.  Please let me know if I'm wrong.

You are correct. The userspace tool would be coming up with the
weights, so it would not be hard for it to write those weights to two
places. I coupled the weights used in DAMON and weighted interleaving
for this revision and the previous because I could not think of a use
case where you would want to use different weights for allocation time
and migration, so it felt silly to have two different places with the
same data. However, I don't feel too strongly about this, so I'm
willing to defer to your judgement.

Also, our userspace tool updates these weights somewhat frequently,
several times per minute, when it detects a change in the bandwidth
utilization of the system to calibrate the interleave ratio. I am
concerned about how frequent changes to the scheme via the sysfs
interface will affect the effectiveness of DAMON's page sampling. From
what I understand, updates to the sysfs aren't saved until the user
writes to some sysfs file to commit them, then the damon context is
recreated from scratch. Would this throw away all the previous
sampling work done and work splitting and merging regions? I am not
super familiar with how the sysfs interface interacts with the rest of
the system, so this concern might be entirely unfounded, but I would
appreciate some clarification here.

[...]
> >
> > Questions for Reviewers
> > =======================
> > 1. Are you happy with the changes to the DAMON sysfs interface?
>
> I'm happy with it for RFC level implementation.  And in my opinion, you now
> proved this is a good idea.  For next steps toward mainline landing, I'd like
> to suggest below interface change.
>
> Let's allow users specify DAMOS_MIGRATE_{HOT,COLD} target nodes and weights
> using only DAMON interface.  And let the user-space tool do the synchronization
> with weighted interleaving or other required works.
>
> This may require writing not small amount of code, especially for DAMON sysfs
> interface.  I think it is doable, though.  If you don't mind, I'd like to
> quickly make a prototype and share with you.
>
> What do you think?

That sounds good to me! Having a prototype from you for the sysfs
interface would certainly be helpful, but if you're busy, I can take a
pass at it as well.

> > 2. Setting an interleave weight to 0 is currently not allowed. This makes
> >    sense when the weights are only used for allocation. Does it make sense
> >    to allow 0 weights now?
>
> I have no opinion, and would like to let mempolicy folks make voices.  But if
> we go on the decoupling approach as I suggested above, we can do this
> discussion in a separate thread :)
>
> [...]
> > Revision History
> > ================
> > Changes from v1
> > (https://lore.kernel.org/linux-mm/20250612181330.31236-1-bijan311@gmail.com/)
> > - Reuse migrate_{hot,cold} actions instead of creating a new action
> > - Remove vaddr implementation
> > - Remove most of the use of mempolicy, instead duplicate the interleave
> >   logic and access interleave weights directly
> > - Write more about the use case in the cover letter
> > - Write about why DAMON was used for this in the cover letter
> > - Add correctness test to the cover letter
> > - Add performance test
>
> Again, thank you for revisioning.  Please bear in mind with me at next steps.
> I believe this work is very promising.

Thank you for your help and feedback!
Bijan

[...]



* Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
  2025-06-20 21:47   ` Bijan Tabatabai
@ 2025-06-20 23:13     ` SeongJae Park
  2025-06-21 17:36       ` SeongJae Park
  0 siblings, 1 reply; 30+ messages in thread
From: SeongJae Park @ 2025-06-20 23:13 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

On Fri, 20 Jun 2025 16:47:26 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:

> Hi SeongJae,
> 
> On Fri, Jun 20, 2025 at 3:21 PM SeongJae Park <sj@kernel.org> wrote:
> >
[...]
> > Also, even for general use case, I think such user-space intervention is not
> > too much request.  Please let me know if I'm wrong.
> 
> You are correct. The userspace tool would be coming up with the
> weights, so it would not be hard for it to write those weights to two
> places. I coupled the weights used in DAMON and weighted interleaving
> for this revision and the previous because I could not think of a use
> case where you would want to use different weights for allocation time
> and migration, so it felt silly to have two different places with the
> same data. However, I don't feel too strongly about this, so I'm
> willing to defer to your judgement.

Thank you for being kind and flexible about the decision.  One such use case
I can think of off the top of my head is when users want to do memory
tiering and there are multiple nodes in the same tier.  For example, if
users want to migrate hot pages in a node to the upper tier, and there are
multiple nodes in that tier, users may want to do the migration with the
same weight, or in proportion to their free space.

So let's push this way for now.  Nothing is set in stone, so please feel
free to let me know if you feel differently later.

> 
> Also, our userspace tool updates these weights somewhat frequently,
> several times per minute, when it detects a change in the bandwidth
> utilization of the system to calibrate the interleave ratio. I am
> concerned about how frequent changes to the scheme via the sysfs
> interface will affect the effectiveness of DAMON's page sampling. From
> what I understand, updates to the sysfs aren't saved until the user
> writes to some sysfs file to commit them,

This is correct.

> then the damon context is
> recreated from scratch.  Would this throw away all the previous sampling
> work and the work done splitting and merging regions?

This is how an early version of the DAMON sysfs interface worked, and your
concern was true for that version.  Hence, we implemented the online tuning
feature[1].  If you use the DAMON user-space tool, 'damo tune'[2] is the
command for using this feature.  So, as long as you use the feature,
updating weights several times per minute shouldn't cause such issues.

[1] https://lore.kernel.org/20220429160606.127307-1-sj@kernel.org
[2] https://github.com/damonitor/damo/blob/next/USAGE.md#damo-tune
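
For reference, through the sysfs interface itself, the online update is
applied by writing "commit" to the kdamond state file, for example:

  $ # apply the updated parameters to the running kdamond, no restart needed
  $ echo commit | sudo tee /sys/kernel/mm/damon/admin/kdamonds/0/state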

> I am not
> super familiar with how the sysfs interface interacts with the rest of
> the system, so this concern might be entirely unfounded, but I would
> appreciate some clarification here.

Thank you for asking.  Please feel free to ask any more questions as you need!

[...]
> > This may require writing not small amount of code, especially for DAMON sysfs
> > interface.  I think it is doable, though.  If you don't mind, I'd like to
> > quickly make a prototype and share with you.
> >
> > What do you think?
> 
> That sounds good to me! Having a prototype from you for the sysfs
> interface would certainly be helpful, but if you're busy, I can take a
> pass at it as well.

Great.  I will try to do this by this weekend.

[...]
> Thank you for your help and feedback!

The pleasure is mine!


Thanks,
SJ

[...]



* Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
  2025-06-20 23:13     ` SeongJae Park
@ 2025-06-21 17:36       ` SeongJae Park
  2025-06-23 14:39         ` Bijan Tabatabai
  0 siblings, 1 reply; 30+ messages in thread
From: SeongJae Park @ 2025-06-21 17:36 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Bijan Tabatabai, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

On Fri, 20 Jun 2025 16:13:24 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Fri, 20 Jun 2025 16:47:26 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
[...]
> > That sounds good to me! Having a prototype from you for the sysfs
> > interface would certainly be helpful, but if you're busy, I can take a
> > pass at it as well.
> 
> Great.  I will try to do this by this weekend.

I just posted an RFC for this:
https://lore.kernel.org/20250621173131.23917-1-sj@kernel.org

Note that I didn't run exhaustive tests with it, so it may have silly bugs.
I believe it could let you know what I'm thinking better than my cheap talk,
though ;)


Thanks,
SJ

[...]



* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-20 18:04 ` [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets Bijan Tabatabai
@ 2025-06-21 18:02   ` SeongJae Park
  2025-06-21 18:11     ` SeongJae Park
  2025-06-23 14:16     ` Bijan Tabatabai
  0 siblings, 2 replies; 30+ messages in thread
From: SeongJae Park @ 2025-06-21 18:02 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

Hi Bijan,

On Fri, 20 Jun 2025 13:04:58 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:

> From: Bijan Tabatabai <bijantabatab@micron.com>
> 
> The migrate_{hot,cold} DAMOS actions take a parameter, target_nid, to
> indicate what node the actions should migrate pages to. In this patch,
> we allow passing in a list of migration targets into target_nid. When
> this is done, the migrate_{hot,cold} actions will migrate pages between
> the specified nodes using the global interleave weights found at
> /sys/kernel/mm/mempolicy/weighted_interleave/node<N>. This functionality
> can be used to dynamically adjust how pages are interleaved in response
> to changes in bandwidth utilization to improve performance, as discussed
> in [1]. When only a single migration target is passed to target_nid, the
> migrate_{hot,cold} actions will act the same as before.
[...]
>  include/linux/damon.h    |   8 +--
>  mm/damon/core.c          |   9 ++--
>  mm/damon/lru_sort.c      |   2 +-
>  mm/damon/paddr.c         | 108 +++++++++++++++++++++++++++++++++++++--
>  mm/damon/reclaim.c       |   2 +-
>  mm/damon/sysfs-schemes.c |  14 +++--
>  samples/damon/mtier.c    |   6 ++-
>  samples/damon/prcl.c     |   2 +-
>  8 files changed, 131 insertions(+), 20 deletions(-)

If we keep pursuing letting DAMON users specify multiple migration
destination nodes and their weights[1], I think we may need only the paddr.c
part of this patch in the final version of this great work.

[...]
>  static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
>  		unsigned long *sz_filter_passed)
>  {
>  	unsigned long addr, applied;
> -	LIST_HEAD(folio_list);
> +	struct rmap_walk_control rwc;
[...]
>  
>  	addr = r->ar.start;
>  	while (addr < r->ar.end) {
> @@ -522,15 +599,38 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
>  		else
>  			*sz_filter_passed += folio_size(folio);
>  
> +		/*
> +		 * If there is only one target node, migrate there. Otherwise,
> +		 * interleave across the nodes according to the global
> +		 * interleave weights
> +		 */
> +		if (nr_nodes == 1) {
> +			target_nid = first_node(s->target_nids);
> +		} else {
> +			target_nid = NUMA_NO_NODE;
> +			/* Updates target_nid */
> +			rmap_walk(folio, &rwc);
> +		}

So we are doing rmap_walk(), which is known to be not very fast, to get the
target node id of this page, in a way very similar to that of weighted
interleaving, right?  I don't think we really need to behave the same as
weighted interleaving at that cost.

I'd hence suggest implementing and using a simple weights handling mechanism
here.  It could be a round-robin way, like weighted interleaving, or a
probabilistic way, using damon_rand().

The round-robin way may be simpler in my opinion.  For example,

unsigned int damos_pa_nid_to_migrate(struct damos_migrate_dest *dest)
{
	static unsigned int nr_migrated = 0;
	unsigned int total_weight = 0;
	unsigned int weights_to_ignore;
	size_t i;

	for (i = 0; i < dest->nr_dests; i++)
		total_weight += dest->weight_arr[i];
	weights_to_ignore = nr_migrated++ % total_weight;
	total_weight = 0;
	for (i = 0; i < dest->nr_dests; i++) {
		total_weight += dest->weight_arr[i];
		if (total_weight > weights_to_ignore)
			return dest->node_id_arr[i];
	}
	WARN_ONCE(1, "I don't know what I did wrong");
	return 0;
}

Then, we could replace the above rmap_walk() call with this one.  What do you
think?

Nothing stands out to me from the other parts.  I will do a more thorough
read of the next revision, though.

[1] https://lore.kernel.org/20250621173639.35906-1-sj@kernel.org


Thanks,
SJ

[...]



* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-21 18:02   ` SeongJae Park
@ 2025-06-21 18:11     ` SeongJae Park
  2025-06-23 14:08       ` Joshua Hahn
  2025-06-23 14:27       ` Bijan Tabatabai
  2025-06-23 14:16     ` Bijan Tabatabai
  1 sibling, 2 replies; 30+ messages in thread
From: SeongJae Park @ 2025-06-21 18:11 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Bijan Tabatabai, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

On Sat, 21 Jun 2025 11:02:15 -0700 SeongJae Park <sj@kernel.org> wrote:

[...]
> I'd hence suggest implementing and using a simple weights handling mechanism
> here.  It could be a round-robin way, like weighted interleaving, or a
> probabilistic way, using damon_rand().
> 
> The round-robin way may be simpler in my opinion.  For example,
> 
> unsigned int damos_pa_nid_to_migrate(struct damos_migrate_dest *dest)
> {
> 	static unsigned int nr_migrated = 0;
> 	unsigned int total_weight = 0;
> 	unsigned int weights_to_ignore;
> 	size_t i;
> 
> 	for (i = 0; i < dest->nr_dests; i++)
> 		total_weight += dest->weight_arr[i];
> 	weights_to_ignore = nr_migrated++ % total_weight;

Actually, the probabilistic way may not be that complicated.  Maybe we could
do the below here, in place of the round-robin counter:

	weights_to_ignore = damon_rand(0, total_weight);

> 	total_weight = 0;
> 	for (i = 0; i < dest->nr_dests; i++) {
> 		total_weight += dest->weight_arr[i];
> 		if (total_weight > weights_to_ignore)
> 			return dest->node_id_arr[i];
> 	}
> 	WARN_ONCE(1, "I don't know what I did wrong");
> 	return 0;
> }

But damon_rand() might be more expensive than the round-robin way, and
arguably the round-robin way is what users who are familiar with weighted
interleaving may easily expect and even prefer?  I have no preference here.


Thanks,
SJ

[...]



* Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
  2025-06-20 18:04 [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions Bijan Tabatabai
                   ` (2 preceding siblings ...)
  2025-06-20 20:21 ` [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions SeongJae Park
@ 2025-06-23 13:45 ` Joshua Hahn
  2025-06-23 14:57   ` Bijan Tabatabai
  3 siblings, 1 reply; 30+ messages in thread
From: Joshua Hahn @ 2025-06-23 13:45 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: damon, linux-mm, linux-kernel, sj, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

On Fri, 20 Jun 2025 13:04:56 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:

Hi Bijan,

I hope you are doing well! Sorry for the late response. It seems like SJ
has already given some great feedback, though, so I will just chime in
with my 2c.

[...snip...]

> However, currently the interleave weights are only applied when data is
> allocated. Migrating already-allocated pages according to the dynamically
> changing weights helps better balance the bandwidth utilization across
> nodes.
> 
> As a toy example, imagine some application that uses 75% of the local
> bandwidth. Assuming sufficient capacity, when running alone, we want to
> keep that application's data in local memory. However, if a second
> instance of that application begins, using the same amount of bandwidth,
> it would be best to interleave the data of both processes to alleviate the
> bandwidth pressure from the local node. Likewise, when one of the processes
> ends, the data should be moved back to local memory.

I think the addition of this example helps illustrate the necessity for
interleaving, thank you for adding it in!

> We imagine there would be a userspace application that would monitor system
> performance characteristics, such as bandwidth utilization or memory access
> latency, and use that information to tune the interleave weights. Others
> seem to have come to a similar conclusion in previous discussions [3].
>
> Functionality Test
> ==================

[...snip...]

> Performance Test
> ================
> Below is a simple example showing that interleaving application data using
> these patches can improve application performance.
> To do this, we run a bandwidth-intensive embedding reduction application
> [5]. This workload is useful for this test because it reports the time each
> iteration takes to run and reuses its buffers between iterations, allowing
> us to clearly see the benefits of the migration.
> 
> We evaluate this on a 128 core/256 thread AMD system with 72 GB/s of local
> DDR bandwidth and 26 GB/s of CXL memory bandwidth.
> 
> Before we start the workload, the system bandwidth utilization is low, so
> we start with interleave weights biased as much as possible to the local
> node. When the workload begins, it saturates the local bandwidth, making
> the page placement suboptimal. To alleviate this, we modify the interleave
> weights, triggering DAMON to migrate the workload's data.
> 
>   $ cd /sys/kernel/mm/damon/admin/kdamonds/0/
>   $ sudo cat ./contexts/0/schemes/0/action
>   migrate_hot
>   $ sudo cat ./contexts/0/schemes/0/target_nid
>   0-1
>   $ echo 255 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
>   $ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node1
>   $ <path>/eval_baseline -d amazon_All -c 255 -r 100
>   <clip startup output>
>   Eval Phase 3: Running Baseline...
> 
>   REPEAT # 0 Baseline Total time : 9043.24 ms
>   REPEAT # 1 Baseline Total time : 7307.71 ms
>   REPEAT # 2 Baseline Total time : 7301.4 ms
>   REPEAT # 3 Baseline Total time : 7312.44 ms
>   REPEAT # 4 Baseline Total time : 7282.43 ms
>   # Interleave weights changed to 3:1
>   REPEAT # 5 Baseline Total time : 6754.78 ms
>   REPEAT # 6 Baseline Total time : 5322.38 ms
>   REPEAT # 7 Baseline Total time : 5359.89 ms
>   REPEAT # 8 Baseline Total time : 5346.14 ms
>   REPEAT # 9 Baseline Total time : 5321.98 ms
> 
> Updating the interleave weights and having DAMON migrate the workload
> data according to them resulted in an approximately 25% speedup.

Thank you for sharing these very impressive results! So if I understand
correctly, this workload allocates once (mostly), and each iteration just
re-uses the same allocation, meaning the effects of the weighted interleave
change are isolated mostly to the migration portion.

Based on that understanding, I'm wondering if a longer benchmark would help
demonstrate the effects of this patch a bit better. That is, IIRC short-lived
workloads should see most of their benefits come from correct allocation,
while longer-lived workloads should see most of their benefits come from
correct migration policies. I don't have a good idea of what the threshold
is for characterizing short vs. long workloads, but I think this could be
another prospective test you can use to demonstrate the gains of your patch.

One last thing that I wanted to note is that it seems like iteration 5,
where I imagine there is some additional work needed to rebalance the page
placement from 255:1 to 3:1, *still* outperforms the normal case in the
original benchmark. Really awesome!!!
 
> Questions for Reviewers
> =======================
> 1. Are you happy with the changes to the DAMON sysfs interface?
> 2. Setting an interleave weight to 0 is currently not allowed. This makes
>    sense when the weights are only used for allocation. Does it make sense
>    to allow 0 weights now?

If the goal of 0 weights is to prevent migration to that node, I think that
we should try to re-use existing mechanisms. There was actually quite a bit
of discussion on whether 0 weights should be allowed (the entire conversation
was split across multiple versions, but I think this is the first instance [1]).

How about using nodemasks instead? I think that they serve a more explicit
purpose of preventing certain nodes from being used. Please let me know if
I'm missing something as to why we cannot use nodemasks here : -) 
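
To make this concrete, below is a rough userspace toy of what I mean (a
sketch only, not kernel code): eligibility comes from a mask, and weights
would only ever be consulted for nodes that pass the mask, so a 0 weight
never needs to exist.

  #include <stdio.h>

  /* Toy model: a mask decides node eligibility, not a 0 weight. */
  static int first_eligible_node(unsigned long nodemask, int nr_nodes, int start)
  {
          int i, nid;

          for (i = 0; i < nr_nodes; i++) {
                  nid = (start + i) % nr_nodes;
                  if (nodemask & (1UL << nid))
                          return nid;
          }
          return -1; /* no eligible node at all */
  }

  int main(void)
  {
          unsigned long mask = 0x5; /* nodes 0 and 2 allowed, node 1 excluded */

          /* starting the search at node 1 skips it and lands on node 2 */
          printf("%d\n", first_eligible_node(mask, 3, 1));
          return 0;
  }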

[...snip...]

One last thing that I wanted to note -- given that these weights now serve
a dual purpose of setting allocation & migration weights, does it make sense
to update the weighted interleave documentation with this information as well?
Or, since it really only affects DAMON users, should we be ok with leaving it
out?

My preference is that we include it in weighted interleave documentation
(Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave)
so that anyone who edits weighted interleave code in the future will at least
be aware that the changes they make will have effects in other subsystems.

Thank you for sharing these results! I hope you have a great day :):)
Joshua

[1] https://lore.kernel.org/all/87msfkh1ls.fsf@DESKTOP-5N7EMDA/

Sent using hkml (https://github.com/sjp38/hackermail)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-21 18:11     ` SeongJae Park
@ 2025-06-23 14:08       ` Joshua Hahn
  2025-06-23 16:50         ` SeongJae Park
  2025-06-23 14:27       ` Bijan Tabatabai
  1 sibling, 1 reply; 30+ messages in thread
From: Joshua Hahn @ 2025-06-23 14:08 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Bijan Tabatabai, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

On Sat, 21 Jun 2025 11:11:27 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Sat, 21 Jun 2025 11:02:15 -0700 SeongJae Park <sj@kernel.org> wrote:
> 
> [...]
> > I'd hence suggest implementing and using a simple weights handling mechanism
> > here.  It could be a round-robin way, like weighted interleaving, or a probabilistic
> > way, using damon_rand().
> > 
> > The round-robin way may be simpler in my opinion.  For example,

[...snip...]
 
> Actually, the probabilistic way may not be that complicated.  Maybe we could
> do the below here.

[...snip...]

> But damon_rand() might be more expensive than the round-robin way, and arguably
> the round-robin way is what users who are familiar with weighted interleaving
> may easily expect and even prefer?  I have no preference here.

Hi SJ,

If you have no preference here, I would like to add some thoughts : -)

I think that code complexity aside, round-robin may be the better choice for
a few reasons. Like you mentioned, I think it is what users might be used to,
if they are coming from weighted interleave code. Also, I think a round-robin
way will prevent worst-case scenarios where we get a long stretch of allocations
on the "wrong" node (but maybe this isn't a big deal, since it is so unlikely).

Finally -- if we run workloads with mempolicy set to weighted interleave
*and* with the weights already set, then pages will be allocated in a
round-robin fashion. I think it may be best to try and minimize migration costs
by trying to keep these weights in-sync. That is, if we have a 2:1 ratio,
we will have the following allocation:

node0 | oo oo oo oo oo oo oo ...
node1 |   o  o  o  o  o  o   ...

Using a probabilistic migration, it might change the pattern:

node0 |   oooo oo o ooo oo ...
node1 | oo    o  o o   o   ...

That is, the ratio might be preserved, but we may be doing unnecessary
migrations, since a probabilistic allocation isn't aware of any underlying
patterns. With a round-robin allocation, we have a 1/total_weight chance that
there will be no additional migrations, depending on where the round-robin
begins. I also want to note that weighted interleave auto-tuning is written
to minimize total_weight.
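
To put a rough number behind that, here is a quick userspace toy (not DAMON
code; it assumes pages were placed by weighted interleave with the same 2:1
weights, and that the round-robin cursor happens to start in sync, i.e. the
best case above):

  #include <stdio.h>
  #include <stdlib.h>

  /* Node owning logical slot 'pos' under weighted interleave. */
  static int wi_node(int pos, const int *w, int n, int total)
  {
          int i, cum = 0, r = pos % total;

          for (i = 0; i < n; i++) {
                  cum += w[i];
                  if (r < cum)
                          return i;
          }
          return 0;
  }

  int main(void)
  {
          const int w[] = { 2, 1 }, total = 3, pages = 3000;
          int pos, moved_rr = 0, moved_rand = 0;

          srand(1);
          for (pos = 0; pos < pages; pos++) {
                  int cur = wi_node(pos, w, 2, total);   /* placed at allocation */
                  int rr = wi_node(pos, w, 2, total);    /* in-sync round robin */
                  int pr = wi_node(rand(), w, 2, total); /* probabilistic pick */

                  moved_rr += (rr != cur);
                  moved_rand += (pr != cur);
          }
          printf("in-sync round-robin moves: %d/%d\n", moved_rr, pages);
          printf("probabilistic moves:       %d/%d\n", moved_rand, pages);
          return 0;
  }

The in-sync round-robin line is zero by construction; out of sync it degrades
toward the probabilistic one, which is the 1/total_weight point above.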

I'm wondering what you think about this. Perhaps there is a way to know where
the "beginning" of round-robin should begin, so that we try to keep the
allocation & migration pattern as in-sync as possible? I have a suspicion
that I am way over-thinking this, and none of this really has a tangible
impact on performance as well ;)

Thank you as always SJ, have a great day!!
Joshua

> Thanks,
> SJ
> 
> [...]
> 

Sent using hkml (https://github.com/sjp38/hackermail)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-21 18:02   ` SeongJae Park
  2025-06-21 18:11     ` SeongJae Park
@ 2025-06-23 14:16     ` Bijan Tabatabai
  2025-06-23 17:52       ` SeongJae Park
  1 sibling, 1 reply; 30+ messages in thread
From: Bijan Tabatabai @ 2025-06-23 14:16 UTC (permalink / raw)
  To: SeongJae Park
  Cc: damon, linux-mm, linux-kernel, akpm, david, ziy, matthew.brost,
	joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple,
	bijantabatab, venkataravis, emirakhur, ajayjoshi, vtavarespetr

On Sat, Jun 21, 2025 at 1:02 PM SeongJae Park <sj@kernel.org> wrote:
>
> Hi Bijan,
>
> On Fri, 20 Jun 2025 13:04:58 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
>
> > From: Bijan Tabatabai <bijantabatab@micron.com>
> >
> > The migrate_{hot,cold} DAMOS actions take a parameter, target_nid, to
> > indicate what node the actions should migrate pages to. In this patch,
> > we allow passing in a list of migration targets into target_nid. When
> > this is done, the migrate_{hot,cold} actions will migrate pages between
> > the specified nodes using the global interleave weights found at
> > /sys/kernel/mm/mempolicy/weighted_interleave/node<N>. This functionality
> > can be used to dynamically adjust how pages are interleaved in response
> > to changes in bandwidth utilization to improve performance, as discussed
> > in [1]. When only a single migration target is passed to target_nid, the
> > migrate_{hot,cold} actions will act the same as before.
> [...]
> >  include/linux/damon.h    |   8 +--
> >  mm/damon/core.c          |   9 ++--
> >  mm/damon/lru_sort.c      |   2 +-
> >  mm/damon/paddr.c         | 108 +++++++++++++++++++++++++++++++++++++--
> >  mm/damon/reclaim.c       |   2 +-
> >  mm/damon/sysfs-schemes.c |  14 +++--
> >  samples/damon/mtier.c    |   6 ++-
> >  samples/damon/prcl.c     |   2 +-
> >  8 files changed, 131 insertions(+), 20 deletions(-)
>
> If we keep pursuing making DAMON users be able to specify multiple migration
> destination nodes and their weights [1], I think we may need only the paddr.c
> part of this patch in the final version of this great work.

Sounds good to me.

> [...]
> >  static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> >               unsigned long *sz_filter_passed)
> >  {
> >       unsigned long addr, applied;
> > -     LIST_HEAD(folio_list);
> > +     struct rmap_walk_control rwc;
> [...]
> >
> >       addr = r->ar.start;
> >       while (addr < r->ar.end) {
> > @@ -522,15 +599,38 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> >               else
> >                       *sz_filter_passed += folio_size(folio);
> >
> > +             /*
> > +              * If there is only one target node, migrate there. Otherwise,
> > +              * interleave across the nodes according to the global
> > +              * interleave weights
> > +              */
> > +             if (nr_nodes == 1) {
> > +                     target_nid = first_node(s->target_nids);
> > +             } else {
> > +                     target_nid = NUMA_NO_NODE;
> > +                     /* Updates target_nid */
> > +                     rmap_walk(folio, &rwc);
> > +             }
>
> So we are doing rmap_walk(), which is known to be not very fast, for getting
> the target node id of this page, in a way very similar to that of weighted
> interleaving, right?  I don't think we really need to behave the same as
> weighted interleaving at that cost.
>
> I'd hence suggest implementing and using a simple weights handling mechanism
> here.  It could be a round-robin way, like weighted interleaving, or a probabilistic
> way, using damon_rand().
>
> The round-robin way may be simpler in my opinion.  For example,
>
> unsigned int damos_pa_nid_to_migrate(struct damos_migrate_dest *dest)
> {
>         static unsigned int nr_migrated = 0;
>         unsigned int total_weight = 0;
>         unsigned int weights_to_ignore;
>         size_t i;
>
>         for (i = 0; i < dest->nr_dests; i++)
>                 total_weight += dest->weight_arr[i];
>         weights_to_ignore = nr_migrated++ % total_weight;
>         total_weight = 0;
>         for (i = 0; i < dest->nr_dests; i++) {
>                 total_weight += dest->weight_arr[i];
>                 if (total_weight > weights_to_ignore)
>                         return dest->node_id_arr[i];
>         }
>         WARN_ONCE(1, "I don't know what I did wrong");
>         return 0;
> }
>
> Then, we could replace the above rmap_walk() call with this one.  What do you
> think?

I do actually think doing the interleaving based on the VMA offset is
important for a couple of reasons.

1. If also using the weighted interleaving mempolicy, and the DAMON
weights are the same as the mempolicy weights, DAMON won't have to
migrate newly allocated pages. This is relatively minor, but helps
avoid unnecessary work.

2. More importantly, I believe this approach will cause a lot of
needless ping-ponging, where the same folios are being moved around
when they don't need to be. For example, let's say folios A-F are hot,
and just for simplification, if they are on the same node, they will
be in the same DAMON region, and only those folios are in those DAMON
regions. If all the folios start in Node 0 and both nodes have a
weight of 1, we have:

nr_migrated = 0
Node 0           Node 1
----------           ----------
A-F                  <empty>

After the scheme is first applied

nr_migrated = 6
Node 0           Node 1
----------           ----------
A,C,E              B,D,F

This is fine, but these folios are still hot, so the scheme will be
applied to them again

nr_migrated = 12
Node 0           Node 1
----------           ----------
A,E,D             B,C,F

If I am understanding your code sample correctly, this will continue
to happen each time the scheme is applied, causing folios to be
migrated for no reason. Using the VMA offset to determine where a page
should be placed avoids this problem because it gives a folio a single
node it can be in for a given set of interleave weights. This means
that in steady state, no folios will be migrated.
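
To be concrete, by offset-based placement I mean roughly the below (a
userspace model only; the real code would take the offset from the rmap
walk, and the bucketing mirrors how weighted interleave maps an index to a
node by cumulative weight):

  #include <stdio.h>

  /* A folio's target node depends only on its stable offset, so
   * re-applying the scheme with unchanged weights is a no-op.
   */
  static int node_for_offset(unsigned long pgoff, const int *w, int n)
  {
          int i, total = 0, cum = 0;
          unsigned long r;

          for (i = 0; i < n; i++)
                  total += w[i];
          r = pgoff % total;
          for (i = 0; i < n; i++) {
                  cum += w[i];
                  if (r < cum)
                          return i;
          }
          return 0;
  }

  int main(void)
  {
          const int w[] = { 1, 1 };
          unsigned long off;

          /* folios A-F at offsets 0-5: A,C,E -> node 0, B,D,F -> node 1,
           * and the answer never changes while the weights stay 1:1.
           */
          for (off = 0; off < 6; off++)
                  printf("offset %lu -> node %d\n", off, node_for_offset(off, w, 2));
          return 0;
  }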

I see what you're saying about rmap_walks being expensive, but since
DAMON operates off the critical path for the workload, I don't think
the cost is that problematic.

[...]

Let me know what you think,
Bijan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-21 18:11     ` SeongJae Park
  2025-06-23 14:08       ` Joshua Hahn
@ 2025-06-23 14:27       ` Bijan Tabatabai
  2025-06-23 16:52         ` SeongJae Park
  1 sibling, 1 reply; 30+ messages in thread
From: Bijan Tabatabai @ 2025-06-23 14:27 UTC (permalink / raw)
  To: SeongJae Park
  Cc: damon, linux-mm, linux-kernel, akpm, david, ziy, matthew.brost,
	joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple,
	bijantabatab, venkataravis, emirakhur, ajayjoshi, vtavarespetr

On Sat, Jun 21, 2025 at 1:11 PM SeongJae Park <sj@kernel.org> wrote:
>
> On Sat, 21 Jun 2025 11:02:15 -0700 SeongJae Park <sj@kernel.org> wrote:
>
> [...]
> > I'd hence suggest implementing and using a simple weights handling mechanism
> > here.  It could be a round-robin way, like weighted interleaving, or a probabilistic
> > way, using damon_rand().
> >
> > The round-robin way may be simpler in my opinion.  For example,
> >
> > unsigned int damos_pa_nid_to_migrate(struct damos_migrate_dest *dest)
> > {
> >       static unsigned int nr_migrated = 0;
> >       unsigned int total_weight = 0;
> >       unsigned int weights_to_ignore;
> >       size_t i;
> >
> >       for (i = 0; i < dest->nr_dests; i++)
> >               total_weight += dest->weight_arr[i];
> >       weights_to_ignore = nr_migrated++ % total_weight;
>
> Actually, the probabilistic way may not be that complicated.  Maybe we could
> do the below here.
>
>         weights_to_ignore = damon_rand(0, total_weight);
>
> >       total_weight = 0;
> >       for (i = 0; i < dest->nr_dests; i++) {
> >               total_weight += dest->weight_arr[i];
> >               if (total_weight > weights_to_ignore)
> >                       return dest->node_id_arr[i];
> >       }
> >       WARN_ONCE(1, "I don't know what I did wrong");
> >       return 0;
> > }
>
> But damon_rand() might be more expensive than the round-robin way, and arguably
> the round-robin way is what users who are familiar with weighted interleaving
> may easily expect and even prefer?  I have no preference here.
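
(For reference, I read the complete probabilistic version as something like
the below userspace sketch, where rand() stands in for damon_rand() and
struct dests merely stands in for the node/weight arrays above.)

  #include <stdio.h>
  #include <stdlib.h>

  struct dests {
          unsigned int nr;
          unsigned int weight[4];
          int nid[4];
  };

  static int pick_node(const struct dests *d)
  {
          unsigned int i, total = 0, cum = 0, r;

          for (i = 0; i < d->nr; i++)
                  total += d->weight[i];
          r = rand() % total; /* damon_rand(0, total) in the kernel */
          for (i = 0; i < d->nr; i++) {
                  cum += d->weight[i];
                  if (r < cum)
                          return d->nid[i];
          }
          return d->nid[0];
  }

  int main(void)
  {
          struct dests d = { .nr = 2, .weight = { 3, 1 }, .nid = { 0, 1 } };
          int counts[2] = { 0, 0 }, i;

          srand(1);
          for (i = 0; i < 100000; i++)
                  counts[pick_node(&d)]++;
          printf("node0=%d node1=%d\n", counts[0], counts[1]); /* ~3:1 */
          return 0;
  }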

I think my comments about the round-robin approach also apply here,
but all things equal, I generally prefer being deterministic over
being probabilistic when possible.

Bijan

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
  2025-06-21 17:36       ` SeongJae Park
@ 2025-06-23 14:39         ` Bijan Tabatabai
  2025-06-23 16:32           ` SeongJae Park
  0 siblings, 1 reply; 30+ messages in thread
From: Bijan Tabatabai @ 2025-06-23 14:39 UTC (permalink / raw)
  To: SeongJae Park
  Cc: damon, linux-mm, linux-kernel, akpm, david, ziy, matthew.brost,
	joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple,
	bijantabatab, venkataravis, emirakhur, ajayjoshi, vtavarespetr

On Sat, Jun 21, 2025 at 12:36 PM SeongJae Park <sj@kernel.org> wrote:
>
> On Fri, 20 Jun 2025 16:13:24 -0700 SeongJae Park <sj@kernel.org> wrote:
>
> > On Fri, 20 Jun 2025 16:47:26 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
> [...]
> > > That sounds good to me! Having a prototype from you for the sysfs
> > > interface would certainly be helpful, but if you're busy, I can take a
> > > pass at it as well.
> >
> > Great.  I will try to do this by this weekend.
>
> I just posted an RFC for this:
> https://lore.kernel.org/20250621173131.23917-1-sj@kernel.org
>
> Note that I didn't run exhaustive tests with it, so it may have silly bugs.  I
> believe it could let you know what I'm thinking better than my cheap talks,
> though ;)

I only skimmed through it, but this looks good to me, thanks for
throwing this together! I'll probe more when I use these patches for
the next revision.

I'll try to reply in that thread as well, but it was only sent to my
work email. I'll try to set up my work Outlook client to work with the
mailing list format, but in the future, could you also CC my personal
email (bijan311@gmail.com)? Gmail just makes formatting for the
mailing list easier for me.

Thanks,
Bijan

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
  2025-06-23 13:45 ` Joshua Hahn
@ 2025-06-23 14:57   ` Bijan Tabatabai
  0 siblings, 0 replies; 30+ messages in thread
From: Bijan Tabatabai @ 2025-06-23 14:57 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: damon, linux-mm, linux-kernel, sj, akpm, david, ziy,
	matthew.brost, rakie.kim, byungchul, gourry, ying.huang, apopple,
	bijantabatab, venkataravis, emirakhur, ajayjoshi, vtavarespetr

Hi Joshua,

On Mon, Jun 23, 2025 at 8:45 AM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> On Fri, 20 Jun 2025 13:04:56 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
>
> Hi Bijan,
>
> I hope you are doing well! Sorry for the late response.

No need to be sorry. I have no expectation that patches sent on a Friday
afternoon will be looked at over the weekend.

[...]
> > Performance Test
> > ================
> > Below is a simple example showing that interleaving application data using
> > these patches can improve application performance.
> > To do this, we run a bandwidth intensive embedding reduction application
> > [5]. This workload is useful for this test because it reports the time it
> > takes each iteration to run and reuses its buffers between iterations,
> > allowing us to clearly see the benefits of the migration.
> >
> > We evaluate this on a 128 core/256 thread AMD CPU, with 72 GB/s of local DDR
> > bandwidth and 26 GB/s of CXL memory bandwidth.
> >
> > Before we start the workload, the system bandwidth utilization is low, so
> > we start with interleave weights biased as much as possible to the local
> > node. When the workload begins, it saturates the local bandwidth, making
> > the page placement suboptimal. To alleviate this, we modify the interleave
> > weights, triggering DAMON to migrate the workload's data.
> >
> >   $ cd /sys/kernel/mm/damon/admin/kdamonds/0/
> >   $ sudo cat ./contexts/0/schemes/0/action
> >   migrate_hot
> >   $ sudo cat ./contexts/0/schemes/0/target_nid
> >   0-1
> >   $ echo 255 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
> >   $ echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node1
> >   $ <path>/eval_baseline -d amazon_All -c 255 -r 100
> >   <clip startup output>
> >   Eval Phase 3: Running Baseline...
> >
> >   REPEAT # 0 Baseline Total time : 9043.24 ms
> >   REPEAT # 1 Baseline Total time : 7307.71 ms
> >   REPEAT # 2 Baseline Total time : 7301.4 ms
> >   REPEAT # 3 Baseline Total time : 7312.44 ms
> >   REPEAT # 4 Baseline Total time : 7282.43 ms
> >   # Interleave weights changed to 3:1
> >   REPEAT # 5 Baseline Total time : 6754.78 ms
> >   REPEAT # 6 Baseline Total time : 5322.38 ms
> >   REPEAT # 7 Baseline Total time : 5359.89 ms
> >   REPEAT # 8 Baseline Total time : 5346.14 ms
> >   REPEAT # 9 Baseline Total time : 5321.98 ms
> >
> > Updating the interleave weights, and having DAMON migrate the workload
> > data according to the weights resulted in an approximately 25% speedup.
>
> Thank you for sharing these very impressive results! So if I understand
> correctly, this workload allocates once (mostly), and each iteration just
> re-uses the same allocation, meaning the effects of the weighted interleave
> change are isolated mostly to the migration portion.

That's correct.

> Based on that understanding, I'm wondering if a longer benchmark would help
> demonstrate the effects of this patch a bit better. That is, IIRC short-lived
> workloads should see most of their benefits come from correct allocation,
> while longer-lived workloads should see most of their benefits come from
> correct migration policies. I don't have a good idea of what the threshold
> is for characterizing short vs. long workloads, but I think this could be
> another prospective test you can use to demonstrate the gains of your patch.

You might be right. I'll try to think of something for the next
revision, but no promises.

[...]
> > Questions for Reviewers
> > =======================
> > 1. Are you happy with the changes to the DAMON sysfs interface?
> > 2. Setting an interleave weight to 0 is currently not allowed. This makes
> >    sense when the weights are only used for allocation. Does it make sense
> >    to allow 0 weights now?
>
> If the goal of 0 weights is to prevent migration to that node, I think that
> we should try to re-use existing mechanisms. There was actually quite a bit
> of discussion on whether 0 weights should be allowed (the entire conversation
> was split across multiple versions, but I think this is the first instance [1]).

Thanks, I'll look over this.

> How about using nodemasks instead? I think that they serve a more explicit
> purpose of preventing certain nodes from being used. Please let me know if
> I'm missing something as to why we cannot use nodemasks here : -)

I think since we're moving towards DAMON having its own weights, this
would only apply to mempolicy. Changing an application's mempolicy
nodemask would be nice, but based on Gregory's previous comments,
having something outside the application change that application's
nodemask might be a bit difficult [1]. Also, I think it would be
easier to change one weight rather than every affected application's
mempolicy.

> [...snip...]
>
> One last thing that I wanted to note -- given that these weights now serve
> a dual purpose of setting allocation & migration weights, does it make sense
> to update the weighted interleave documentation with this information as well?
> Or, since it really only affects DAMON users, should we be ok with leaving it
> out?
>
> My preference is that we include it in weighted interleave documentation
> (Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave)
> so that anyone who edits weighted interleave code in the future will at least
> be aware that the changes they make will have effects in other subsystems.

I think if we continued to use the mempolicy weights, it would make
sense to document that. However, it seems like we are moving towards
using DAMON specific weights.

Thanks for your feedback,
Bijan

[1] https://lore.kernel.org/damon/aFBXuTtwhAV7BHeY@gourry-fedora-PF4VCD3F/


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
  2025-06-23 14:39         ` Bijan Tabatabai
@ 2025-06-23 16:32           ` SeongJae Park
  0 siblings, 0 replies; 30+ messages in thread
From: SeongJae Park @ 2025-06-23 16:32 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

On Mon, 23 Jun 2025 09:39:48 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:

> On Sat, Jun 21, 2025 at 12:36 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > On Fri, 20 Jun 2025 16:13:24 -0700 SeongJae Park <sj@kernel.org> wrote:
> >
> > > On Fri, 20 Jun 2025 16:47:26 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
[...]
> > I just posted an RFC for this:
> > https://lore.kernel.org/20250621173131.23917-1-sj@kernel.org
> >
> > Note that I didn't run exhaustive tests with it, so it may have silly bugs.  I
> > believe it could let you know what I'm thinking better than my cheap talks,
> > though ;)
> 
> I only skimmed through it, but this looks good to me, thanks for
> throwing this together! I'll probe more when I use these patches for
> the next revision.

Great, please feel free to add comments there.

> 
> I'll try to reply in that thread as well, but it was only sent to my
> work email. I'll try to set up my work Outlook client to work with the
> mailing list format,

I'm sorry to hear that.  I personally gave up working with my Outlook client
and am using a tool named hkml [1].  I started developing it for my personal
use, but now it aims to help contributors to DAMON and the Linux kernel in
general.
So, if you are interested, please feel free to try it and submit bug reports :)

> but in the future, could you also CC my personal
> email (bijan311@gmail.com)?

Sure, I will Cc your personal email together from next time.

> Gmail just makes formatting for the
> mailing list easier for me.

Right, Gmail is better than Outlook.  To me, hkml is even better than Gmail,
though ;)

[1] https://github.com/sjp38/hackermail


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-23 14:08       ` Joshua Hahn
@ 2025-06-23 16:50         ` SeongJae Park
  0 siblings, 0 replies; 30+ messages in thread
From: SeongJae Park @ 2025-06-23 16:50 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: SeongJae Park, Bijan Tabatabai, damon, linux-mm, linux-kernel,
	akpm, david, ziy, matthew.brost, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

On Mon, 23 Jun 2025 07:08:07 -0700 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:

> On Sat, 21 Jun 2025 11:11:27 -0700 SeongJae Park <sj@kernel.org> wrote:
> 
> > On Sat, 21 Jun 2025 11:02:15 -0700 SeongJae Park <sj@kernel.org> wrote:
> > 
> > [...]
> > > I'd hence suggest implementing and using a simple weights handling mechanism
> > > here.  It could be a round-robin way, like weighted interleaving, or a probabilistic
> > > way, using damon_rand().
> > > 
> > > The round-robin way may be simpler in my opinion.  For example,
> 
> [...snip...]
>  
> > Actually, the probabilistic way may not be that complicated.  Maybe we could
> > do the below here.
> 
> [...snip...]
> 
> > But damon_rand() might be more expensive than the round-robin way, and arguably
> > the round-robin way is what users who are familiar with weighted interleaving
> > may easily expect and even prefer?  I have no preference here.
> 
> Hi SJ,
> 
> If you have no preference here, I would like to add some thoughts : -)
> 
[...]
> I think that code complexity aside, round-robin may be the better choice for
> a few reasons. Like you mentioned, I think it is what users might be used to,
> if they are coming from weighted interleave code. Also, I think a round-robin
> way will prevent worst-case scenarios where we get a long stretch of allocations
> on the "wrong" node (but maybe this isn't a big deal, since it is so unlikely).
> 
> Finally -- if we run workloads with mempolicy set to weighted interleave
> *and* with the weights already set, then pages will be allocated in a
> round-robin fashion. I think it may be best to try and minimize migration costs
> by trying to keep these weights in-sync. That is, if we have a 2:1 ratio,
> we will have the following allocation:
> 
> node0 | oo oo oo oo oo oo oo ...
> node1 |   o  o  o  o  o  o   ...
> 
> Using a probabilistic migration, it might change the pattern:
> 
> node0 |   oooo oo o ooo oo ...
> node1 | oo    o  o o   o   ...
> 
> That is, the ratio might be preserved, but we may be doing unnecessary
> migrations, since a probabilistic allocation isn't aware of any underlying
> patterns. With a round-robin allocation, we have a 1/total_weight chance that
> there will be no additional migrations, depending on where the round-robin
> begins. I also want to note that weighted interleave auto-tuning is written
> to minimize total_weight.
> 
> I'm wondering what you think about this. Perhaps there is a way to know where
> the "beginning" of round-robin should begin, so that we try to keep the
> allocation & migration pattern as in-sync as possible? I have a suspicion
> that I am way over-thinking this, and none of this really has a tangible
> impact on performance as well ;)

The theory makes sense to me.  I am also not very sure how much visible
difference it will make on large-scale real workloads, though.  Since at least
the theory makes sense and we see no risk, I think taking the round-robin
approach would be the saner action, unless we find other opinions or test
results.

> 
> Thank you as always SJ, have a great day!!

Thank you, you too!


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-23 14:27       ` Bijan Tabatabai
@ 2025-06-23 16:52         ` SeongJae Park
  0 siblings, 0 replies; 30+ messages in thread
From: SeongJae Park @ 2025-06-23 16:52 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

On Mon, 23 Jun 2025 09:27:16 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:

[...]
> I think my comments about the round-robin approach also apply here,
> but all things equal, I generally prefer being deterministic over
> being probabilistic when possible.

Makes sense, let's go with the round-robin approach :)


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-23 14:16     ` Bijan Tabatabai
@ 2025-06-23 17:52       ` SeongJae Park
  2025-06-23 23:15         ` Bijan Tabatabai
  0 siblings, 1 reply; 30+ messages in thread
From: SeongJae Park @ 2025-06-23 17:52 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

On Mon, 23 Jun 2025 09:16:53 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:

> On Sat, Jun 21, 2025 at 1:02 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > Hi Bijan,
> >
> > On Fri, 20 Jun 2025 13:04:58 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
> >
> > > From: Bijan Tabatabai <bijantabatab@micron.com>
> > >
> > > The migrate_{hot,cold} DAMOS actions take a parameter, target_nid, to
> > > indicate what node the actions should migrate pages to. In this patch,
> > > we allow passing in a list of migration targets into target_nid. When
> > > this is done, the migrate_{hot,cold} actions will migrate pages between
> > > the specified nodes using the global interleave weights found at
> > > /sys/kernel/mm/mempolicy/weighted_interleave/node<N>. This functionality
> > > can be used to dynamically adjust how pages are interleaved in response
> > > to changes in bandwidth utilization to improve performance, as discussed
> > > in [1]. When only a single migration target is passed to target_nid, the
> > > migrate_{hot,cold} actions will act the same as before.
> > [...]
> > >  include/linux/damon.h    |   8 +--
> > >  mm/damon/core.c          |   9 ++--
> > >  mm/damon/lru_sort.c      |   2 +-
> > >  mm/damon/paddr.c         | 108 +++++++++++++++++++++++++++++++++++++--
> > >  mm/damon/reclaim.c       |   2 +-
> > >  mm/damon/sysfs-schemes.c |  14 +++--
> > >  samples/damon/mtier.c    |   6 ++-
> > >  samples/damon/prcl.c     |   2 +-
> > >  8 files changed, 131 insertions(+), 20 deletions(-)
> >
> > If we keep pursuing making DAMON users be able to specify multiple migration
> > destination nodes and their weights [1], I think we may need only the paddr.c
> > part of this patch in the final version of this great work.
> 
> Sounds good to me.
> 
> > [...]
> > >  static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> > >               unsigned long *sz_filter_passed)
> > >  {
> > >       unsigned long addr, applied;
> > > -     LIST_HEAD(folio_list);
> > > +     struct rmap_walk_control rwc;
> > [...]
> > >
> > >       addr = r->ar.start;
> > >       while (addr < r->ar.end) {
> > > @@ -522,15 +599,38 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> > >               else
> > >                       *sz_filter_passed += folio_size(folio);
> > >
> > > +             /*
> > > +              * If there is only one target node, migrate there. Otherwise,
> > > +              * interleave across the nodes according to the global
> > > +              * interleave weights
> > > +              */
> > > +             if (nr_nodes == 1) {
> > > +                     target_nid = first_node(s->target_nids);
> > > +             } else {
> > > +                     target_nid = NUMA_NO_NODE;
> > > +                     /* Updates target_nid */
> > > +                     rmap_walk(folio, &rwc);
> > > +             }
> >
> > So we are doing rmap_walk(), which is known to be not very fast, for getting
> > the target node id of this page, in a way very similar to that of weighted
> > interleaving, right?  I don't think we really need to behave the same as
> > weighted interleaving at that cost.
> >
> > I'd hence suggest implementing and using a simple weights handling mechanism
> > here.  It could be a round-robin way, like weighted interleaving, or a probabilistic
> > way, using damon_rand().
> >
> > The round-robin way may be simpler in my opinion.  For example,
> >
> > unsigned int damos_pa_nid_to_migrate(struct damos_migrate_dest *dest)
> > {
> >         static unsigned int nr_migrated = 0;
> >         unsigned int total_weight = 0;
> >         unsigned int weights_to_ignore;
> >         size_t i;
> >
> >         for (i = 0; i < dest->nr_dests; i++)
> >                 total_weight += dest->weight_arr[i];
> >         weights_to_ignore = nr_migrated++ % total_weight;
> >         total_weight = 0;
> >         for (i = 0; i < dest->nr_dests; i++) {
> >                 total_weight += dest->weight_arr[i];
> >                 if (total_weight > weights_to_ignore)
> >                         return dest->node_id_arr[i];
> >         }
> >         WARN_ONCE(1, "I don't know what I did wrong");
> >         return 0;
> > }
> >
> > Then, we could replace the above rmap_walk() call with this one.  What do you
> > think?
> 
> I do actually think doing the interleaving based on the VMA offset is
> important for a couple of reasons.
> 
> 1. If also using the weighted interleaving mempolicy, and the DAMON
> weights are the same as the mempolicy weights, DAMON won't have to
> migrate newly allocated pages. This is relatively minor, but helps
> avoid unnecessary work.
> 
> 2. More importantly, I believe this approach will cause a lot of
> needless ping-ponging, where the same folios are being moved around
> when they don't need to be. For example, let's say folios A-F are hot,
> and, just for simplification, assume folios on the same node are in
> the same DAMON region, and that only those folios are in those DAMON
> regions. If all the folios start in Node 0 and both nodes have a
> weight of 1, we have:
> 
> nr_migrated = 0
> Node 0           Node 1
> ----------           ----------
> A-F                  <empty>
> 
> After the scheme is first applied
> 
> nr_migrated = 6
> Node 0           Node 1
> ----------           ----------
> A,C,E              B,D,F
> 
> This is fine, but these folios are still hot, so the scheme will be
> applied to them again
> 
> nr_migrated = 12
> Node 0           Node 1
> ----------           ----------
> A,E,D             B,C,F
> 
> If I am understanding your code sample correctly, this will continue
> to happen each time the scheme is applied, causing folios to be
> migrated for no reason.

Thank you for walking with me, Bijan.  I understand and agree with your concerns.
Actually, this kind of unnecessary ping-pong is a general problem for DAMOS.
We hence made a few DAMOS features to avoid this issue.

The first feature is 'age' reset.  DAMOS sets 'age' of regions to zero when it
applies an action.  Hence if your DAMOS scheme has a minimum 'age' for the
target access pattern, the region will not be selected as an action target
again very soon.

The second feature is the quota.  You can set a speed limit for a DAMOS
action, to avoid DAMOS being too aggressive.  When DAMOS finds memory regions
that are eligible for a given action and larger than the given quota, it
calculates the access temperature of the regions, and applies the action to
only the hottest or coldest regions, up to the quota amount.  Whether to
prioritize hotter or colder depends on the action.  DAMOS_MIGRATE_HOT prefers
the hotter ones.  Together with the age reset, this can reduce unnecessary
pingpong.
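
As a toy model of the age-reset part (plain userspace C, not the real DAMON
structs), a region that keeps showing the target pattern is acted on only
once every min_age aggregation intervals:

  #include <stdio.h>

  struct region {
          int nr_accesses;
          int age;
  };

  static int apply_if_eligible(struct region *r, int min_nr_accesses, int min_age)
  {
          if (r->nr_accesses < min_nr_accesses || r->age < min_age)
                  return 0;
          r->age = 0; /* DAMOS resets the age when it applies an action */
          return 1;
  }

  int main(void)
  {
          struct region r = { .nr_accesses = 10, .age = 3 };
          int i;

          for (i = 0; i < 5; i++) {
                  printf("interval %d: applied=%d\n", i,
                         apply_if_eligible(&r, 5, 3));
                  r.age++; /* region keeps showing the same pattern */
          }
          return 0;
  }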

The third feature is quota auto-tuning.  You can ask DAMON to adjust the quotas
on its own, based on some metrics.  Let me describe an example with memory
tiering use case.  Consider there are two NUMA nodes of different speed.  Node
0 is faster than node 1, the same for every CPU.  Then you can ask DAMON to
migrate hot pages on node 1 to node 0 aiming for 99% of node 0 memory to be
allocated, while migrating cold pages on node 0 to node 1 aiming for 1% of
node 0 memory to be
free.  Then, DAMON will adjust the quotas for two different schemes based on
current node 0 memory used/free amount.  If node 0 memory is used less than
99%, the hot pages migration scheme will work.  The aggressiveness will be
determined by the difference between the current memory usage and the target
usage.  For example, DAMON will try to migrate hot pages faster when node 0
memory usage is 50%, compared to when node 0 memory usage is 98%.  The cold
pages migration scheme will do nothing when node 0 memory is used less than
99%, since its goal (1% node 0 free memory ratio) is already over-achieved.
When the node 0 memory usage becomes 99% and no more allocation is made, DAMON
will be quiet.  Even if a few more allocations happen, DAMON will work at a
slow speed, and hence make only a reasonable and healthy amount of noise.

Back to your use case, you could set per-node ideal memory usage of
interleaving as the quota goal.  For example, on the 1:1 ratio interleaving on
2 NUMA nodes, you could use two DAMOS schemes, one aiming for 50% node 0
memused, and the other aiming for 50% node 0 memfree.  Once pages are well
interleaved, both schemes will stop working, avoiding unnecessary pingponging.

Note that one of the quota auto-tuning metrics that DAMON supports is
arbitrary user input.  When this is being used, users can simply feed any
value as the current value of the goal metric.  For example, you can use the
application's
performance metric, memory bandwidth, or whatever.  You could see the
node0-node1 balance from your user-space tool and feed it to DAMON quota
auto-tuning.  Then, DAMON will do more migration when it is imbalanced, and no
more migration when it is well balanced.
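
The user-space side of that feedback can be as small as the below sketch.
Note the goal file path assumes kdamond 0 / context 0 / scheme 0 / goal 0,
and that the goal's target_metric has been set to user_input; adjust for
your setup.

  #include <stdio.h>

  int main(void)
  {
          const char *goal =
              "/sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/"
              "schemes/0/quotas/goals/0/current_value";
          unsigned long imbalance = 42; /* stand-in: compute from node meminfo/BW */
          FILE *f = fopen(goal, "w");

          if (!f) {
                  perror("fopen");
                  return 1;
          }
          fprintf(f, "%lu\n", imbalance);
          fclose(f);
          return 0;
  }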

Finally, you can change DAMON parameters including schemes while DAMON is
running.  You can add and remove schemes whenever you want, while DAMON keeps
monitoring the access pattern.  Your user-space tool can determine how
aggressive migration is required based on current memory balance and adjust
DAMOS quotas online, or even turn DAMOS schemes off/on on demand.

So I think you could avoid the problem using these features.  Does this make
sense to you?

In the future, we could add more DAMOS self-feedback metrics for this use case.  For
example, the memory usage balance of nodes.  My self-tuning example above was
using two schemes since there is no DAMOS quota goal tuning metric that can
directly be used for your use case.  But I'd say that shouldn't be a blocker of
this work.

> Using the VMA offset to determine where a page
> should be placed avoids this problem because it gives a folio a single
> node it can be in for a given set of interleave weights. This means
> that in steady state, no folios will be migrated.

This makes sense for this use case.  But I don't think this makes the same sense
for possible other use cases, like memory tiering on systems having multiple
NUMA nodes of the same tier.  If you really need this virtual address space based
deterministic behavior, it would make more sense to use virtual address spaces
monitoring (damon-vaddr).

> 
> I see what you're saying about rmap_walks being expensive, but since
> DAMON operates off the critical path for the workload, I don't think
> the cost is that problematic.

You're right.  We try to make DAMON controllable (min/max_nr_regions or
DAMOS quotas) rather than always fast.  But, we still try to be fast and
invisible when possible.  Since this change is not only for interleaving but
also general multi-node migration, and we have features that can hopefully
address your concern, I'd like to think again with you.


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM
  2025-06-20 18:04 ` [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM Bijan Tabatabai
@ 2025-06-23 19:06   ` Gregory Price
  2025-06-23 19:14   ` David Hildenbrand
  2025-06-24 10:58   ` Huang, Ying
  2 siblings, 0 replies; 30+ messages in thread
From: Gregory Price @ 2025-06-23 19:06 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: damon, linux-mm, linux-kernel, sj, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, ying.huang,
	apopple, bijantabatab, venkataravis, emirakhur, ajayjoshi,
	vtavarespetr

On Fri, Jun 20, 2025 at 01:04:57PM -0500, Bijan Tabatabai wrote:
> From: Bijan Tabatabai <bijantabatab@micron.com>
> 
> This patch exposes get_il_weight() in mm/internal.h. This patch is to
> allow other parts of the MM subsystem, such as DAMON, to make page
> placement decisions based on the global interleave weights.
> 
> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com>

Reviewed-by: Gregory Price <gourry@gourry.net>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM
  2025-06-20 18:04 ` [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM Bijan Tabatabai
  2025-06-23 19:06   ` Gregory Price
@ 2025-06-23 19:14   ` David Hildenbrand
  2025-06-23 19:38     ` Gregory Price
  2025-06-24 10:58   ` Huang, Ying
  2 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2025-06-23 19:14 UTC (permalink / raw)
  To: Bijan Tabatabai, damon, linux-mm, linux-kernel
  Cc: sj, akpm, ziy, matthew.brost, joshua.hahnjy, rakie.kim, byungchul,
	gourry, ying.huang, apopple, bijantabatab, venkataravis,
	emirakhur, ajayjoshi, vtavarespetr

On 20.06.25 20:04, Bijan Tabatabai wrote:
> From: Bijan Tabatabai <bijantabatab@micron.com>
> 
> This patch exposes get_il_weight() in mm/internal.h. This patch is to
> allow other parts of the MM subsystem, such as DAMON, to make page
> placement decisions based on the global interleave weights.
> 
> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com>
> ---
>   mm/internal.h  | 6 ++++++
>   mm/mempolicy.c | 2 +-
>   2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 3823fb356d3b..b718ebe7cad5 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1169,6 +1169,7 @@ extern int node_reclaim_mode;
>   
>   extern int node_reclaim(struct pglist_data *, gfp_t, unsigned int);
>   extern int find_next_best_node(int node, nodemask_t *used_node_mask);
> +extern u8 get_il_weight(int node);

No new "extern". It's a blast from the past :)

>   #else
>   #define node_reclaim_mode 0
>   
> @@ -1181,6 +1182,11 @@ static inline int find_next_best_node(int node, nodemask_t *used_node_mask)
>   {
>   	return NUMA_NO_NODE;
>   }
> +
> +static inline u8 get_il_weight(int node)
> +{
> +	return 1;
> +}
>   #endif
>   
>   static inline bool node_reclaim_enabled(void)
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 1ff7b2174eb7..b1713430ee9c 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -165,7 +165,7 @@ static unsigned int *node_bw_table;
>    */
>   static DEFINE_MUTEX(wi_state_lock);
>   
> -static u8 get_il_weight(int node)
> +u8 get_il_weight(int node)

The function name is shockingly confusing when used outside this file. 
Do we have some namespace to at least highlight what this is about?

node_interleave_weight() might be a lot clearer?

So maybe let's just rename it to something less confusing as we use it 
from other files.

>   {
>   	struct weighted_interleave_state *state;
>   	u8 weight = 1;


-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
  2025-06-20 20:21 ` [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions SeongJae Park
  2025-06-20 21:47   ` Bijan Tabatabai
@ 2025-06-23 19:28   ` Gregory Price
  2025-06-23 23:21     ` Bijan Tabatabai
  1 sibling, 1 reply; 30+ messages in thread
From: Gregory Price @ 2025-06-23 19:28 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Bijan Tabatabai, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, ying.huang,
	apopple, bijantabatab, venkataravis, emirakhur, ajayjoshi,
	vtavarespetr

On Fri, Jun 20, 2025 at 01:21:55PM -0700, SeongJae Park wrote:
> Hi Bijan,
> 
> On Fri, 20 Jun 2025 13:04:56 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
> 
> [...]
> > This patch set adds the mechanism for dynamically changing how application
> > data is interleaved across nodes while leaving the policy of what the
> > interleave weights should be to userspace. It does this by modifying the
> > migrate_{hot,cold} DAMOS actions to allow passing in a list of migration
> > targets to their target_nid parameter. When this is done, the
> > migrate_{hot,cold} actions will migrate pages between the specified nodes
> > using the global interleave weights found at
> > /sys/kernel/mm/mempolicy/weighted_interleave/node<N>. This functionality
> > can be used to dynamically adjust how pages are interleaved by changing the
> > global weights. When only a single migration target is passed to
> > target_nid, the migrate_{hot,cold} actions will act the same as before.
> 
> This means users are required to manipulate two interfaces.  DAMON sysfs for
> target nodes, and weighted_interleave sysfs for weights.  I don't think this
> coupling is very ideal.
> 

Just tossing this out there - weighted interleave sysfs entries *should*
be automatic, and the preferred weights shouldn't really ever change
over time.  Even if they did, if it's the result of devices coming and
going - the updates should also be automatic.

So, in practice, a user usually only has to twiddle DAMON.

I don't have a strong opinion on whether DAMON should leverage the
mempolicy interface, but I think the way it is designed now is
acceptable.

~Gregory


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM
  2025-06-23 19:14   ` David Hildenbrand
@ 2025-06-23 19:38     ` Gregory Price
  0 siblings, 0 replies; 30+ messages in thread
From: Gregory Price @ 2025-06-23 19:38 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Bijan Tabatabai, damon, linux-mm, linux-kernel, sj, akpm, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, ying.huang,
	apopple, bijantabatab, venkataravis, emirakhur, ajayjoshi,
	vtavarespetr

On Mon, Jun 23, 2025 at 09:14:34PM +0200, David Hildenbrand wrote:
> > +u8 get_il_weight(int node)
> 
> The function name is shockingly confusing when used outside this file. Do we
> have some namespace to at least highlight what this is about?
> 
> node_interleave_weight() might be a lot clearer?
> 

This is fair, and my fault, as I didn't consider whether it'd ever be used
outside mempolicy.  So I agree it should be renamed or wrapped.
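
E.g., for the wrapper route, something as small as the below would do (a
sketch against the mm/ code, not compilable stand-alone; node_interleave_weight()
is just David's suggested name, not an existing symbol):

  /* mm/internal.h (sketch) */
  u8 node_interleave_weight(int node);

  /* mm/mempolicy.c (sketch): keep get_il_weight() static, export the clearer name */
  u8 node_interleave_weight(int node)
  {
          return get_il_weight(node);
  }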

~Gregory


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-23 17:52       ` SeongJae Park
@ 2025-06-23 23:15         ` Bijan Tabatabai
  2025-06-24  0:34           ` SeongJae Park
  0 siblings, 1 reply; 30+ messages in thread
From: Bijan Tabatabai @ 2025-06-23 23:15 UTC (permalink / raw)
  To: SeongJae Park
  Cc: damon, linux-mm, linux-kernel, akpm, david, ziy, matthew.brost,
	joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple,
	bijantabatab, venkataravis, emirakhur, ajayjoshi, vtavarespetr

[...]
> Thank you for walking with me, Bijan.  I understand and agree with your concerns.
> Actually, this kind of unnecessary ping-pong is a general problem for DAMOS.
> We hence made a few DAMOS features to avoid this issue.
>
> The first feature is 'age' reset.  DAMOS sets 'age' of regions to zero when it
> applies an action.  Hence if your DAMOS scheme has a minimum 'age' for the
> target access pattern, the region will not be selected as an action target
> again very soon.
>
> The second feature is the quota.  You can set a speed limit for a DAMOS
> action, to avoid DAMOS being too aggressive.  When DAMOS finds memory regions
> that are eligible for a given action and larger than the given quota, it
> calculates the access temperature of the regions, and applies the action to
> only the hottest or coldest regions, up to the quota amount.  Whether to
> prioritize hotter or colder depends on the action.  DAMOS_MIGRATE_HOT prefers
> the hotter ones.  Together with the age reset, this can reduce unnecessary
> pingpong.
>
> The third feature is quota auto-tuning.  You can ask DAMON to adjust the quotas
> on its own, based on some metrics.  Let me describe an example with memory
> tiering use case.  Consider there are two NUMA nodes of different speed.  Node
> 0 is faster than node 1, the same for every CPU.  Then you can ask DAMON to
> migrate hot pages on node 1 to node 0 aiming for 99% of node 0 memory to be
> allocated, while migrating cold pages on node 0 to node 1 aiming for 1% of
> node 0 memory to be
> free.  Then, DAMON will adjust the quotas for two different schemes based on
> current node 0 memory used/free amount.  If node 0 memory is used less than
> 99%, the hot pages migration scheme will work.  The aggressiveness will be
> determined by the difference between the current memory usage and the target
> usage.  For example, DAMON will try to migrate hot pages faster when node 0
> memory usage is 50%, compared to when node 0 memory usage is 98%.  The cold
> pages migration scheme will do nothing when node 0 memory is used less than
> 99%, since its goal (1% node 0 free memory ratio) is already over-achieved.
> When the node 0 memory usage becomes 99% and no more allocation is made, DAMON
> will be quiet.  Even if a few more allocations happen, DAMON will work at a
> slow speed, and hence make only a reasonable and healthy amount of noise.
>
> Back to your use case, you could set per-node ideal memory usage of
> interleaving as the quota goal.  For example, on the 1:1 ratio interleaving on
> 2 NUMA nodes, you could use two DAMOS schemes, one aiming for 50% node 0
> memused, and the other aiming for 50% node 0 memfree.  Once pages are well
> interleaved, both schemes will stop working, avoiding unnecessary pingponging.
>
> Note that one of the quota auto-tuning metrics that DAMON supports is
> arbitrary user input.  When this is being used, users can simply feed any
> value as the current value of the goal metric.  For example, you can use the
> application's
> performance metric, memory bandwidth, or whatever.  You could see the
> node0-node1 balance from your user-space tool and feed it to DAMON quota
> auto-tuning.  Then, DAMON will do more migration when it is imbalanced, and no
> more migration when it is well balanced.
>
> Finally, you can change DAMON parameters including schemes while DAMON is
> running.  You can add and remove schemes whenever you want, while DAMON keeps
> monitoring the access pattern.  Your user-space tool can determine how
> aggressive migration is required based on current memory balance and adjust
> DAMOS quotas online, or even turn DAMOS schemes off/on on demand.
>
> So I think you could avoid the problem using these features.  Does this make
> sense to you?
>
> In the future, we could add more DAMOS self-feedback metrics for this use case.  For
> example, the memory usage balance of nodes.  My self-tuning example above was
> using two schemes since there is no DAMOS quota goal tuning metric that can
> directly be used for your use case.  But I'd say that shouldn't be a blocker of
> this work.


Hi SeongJae,

I really appreciate your detailed response.
The quota auto-tuning helps, but I feel like it's still not exactly
what I want. For example, I think a quota goal that stops migration
based on the memory usage balance gets quite a bit more complicated
when instead of interleaving all data, we are just interleaving *hot*
data. I haven't looked at it extensively, but I imagine it wouldn't be
easy to identify how much data is hot in the paddr setting, especially
because the regions can contain a significant amount of unallocated
data. Also, if the interleave weights changed, for example, from 11:9
to 10:10, it would be preferable if only 5% of data is migrated;
however, with the round robin approach, 50% would be. Finally, and I
forgot to mention this in my last message, the round-robin approach
does away with any notion of spatial locality, which does help the
effectiveness of interleaving [1]. I don't think anything done with
quotas can get around that. I wonder if there's an elegant way to
specify whether to use rmap or not, but my initial feeling is that
might just add complication to the code and interface for not enough
benefit.
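
The 5% number falls straight out of the offset arithmetic, by the way: both
weight vectors sum to 20, so the slot-to-node mapping can be compared
position by position (quick userspace check):

  #include <stdio.h>

  static int bucket(int pos, const int *w, int n, int total)
  {
          int i, cum = 0, r = pos % total;

          for (i = 0; i < n; i++) {
                  cum += w[i];
                  if (r < cum)
                          return i;
          }
          return 0;
  }

  int main(void)
  {
          const int a[] = { 11, 9 }, b[] = { 10, 10 };
          int pos, moved = 0, total = 20;

          for (pos = 0; pos < total; pos++)
                  moved += bucket(pos, a, 2, total) != bucket(pos, b, 2, total);
          printf("%d of %d positions move (%d%%)\n",
                 moved, total, moved * 100 / total);
          return 0;
  }

Only position 10 changes owner, so 1 in 20 (5%) of offset-placed pages move.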

Maybe, as you suggest later on, this is an indication that my use case
is a better fit for a vaddr scheme. I'll get into that more below.

> > Using the VMA offset to determine where a page
> > should be placed avoids this problem because it gives a folio a single
> > node it can be in for a given set of interleave weights. This means
> > that in steady state, no folios will be migrated.
>
> This makes sense for this use case.  But I don't think this makes the same sense
> for possible other use cases, like memory tiering on systems having multiple
> NUMA nodes of the same tier.

I see where you're coming from. I think the crux of this difference is
that in my use case, the set of nodes we are monitoring is the same as
the set of nodes we are migrating to, while in the use case you
describe, the set of nodes being monitored is disjoint from the set of
migration target nodes. I think this in particular makes ping ponging
more of a problem for my use case, compared to promotion/demotion
schemes.

> If you really need this virtual address space based
> deterministic behavior, it would make more sense to use virtual address spaces
> monitoring (damon-vaddr).

Maybe it does make sense for me to implement vaddr versions of the
migrate actions for my use case. One thing that gives me pause about
this is that, from what I understand, it would be harder to have
vaddr schemes apply to processes that start after DAMON begins. I
think to do that, one would have to detect when a process starts, and
then do a damon tune to upgrade the targets list? It would be nice if,
say, you could specify a cgroup as a vaddr target and track all
processes in that cgroup, but that would be a different patchset for
another day.

But, using vaddr has other benefits, like the sampling would take into
account the locality of the accesses. There are also ways to make
vaddr sampling more efficient by using higher levels of the page
tables, which I don't think apply to paddr schemes [2]. I believe the
authors of [2] said they submitted their patches to the kernel, but I
don't know if they have been upstreamed (sorry about derailing the
conversation slightly).

[1] https://elixir.bootlin.com/linux/v6.16-rc3/source/mm/mempolicy.c#L213
[2] https://www.usenix.org/conference/atc24/presentation/nair


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
  2025-06-23 19:28   ` Gregory Price
@ 2025-06-23 23:21     ` Bijan Tabatabai
  2025-06-26 19:13       ` Gregory Price
  0 siblings, 1 reply; 30+ messages in thread
From: Bijan Tabatabai @ 2025-06-23 23:21 UTC (permalink / raw)
  To: Gregory Price
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, ying.huang,
	apopple, bijantabatab, venkataravis, emirakhur, ajayjoshi,
	vtavarespetr

Hi Gregory,

On Mon, Jun 23, 2025 at 2:28 PM Gregory Price <gourry@gourry.net> wrote:
>
> On Fri, Jun 20, 2025 at 01:21:55PM -0700, SeongJae Park wrote:
> > Hi Bijan,
> >
> > On Fri, 20 Jun 2025 13:04:56 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
> >
> > [...]
> > > This patch set adds the mechanism for dynamically changing how application
> > > data is interleaved across nodes while leaving the policy of what the
> > > interleave weights should be to userspace. It does this by modifying the
> > > migrate_{hot,cold} DAMOS actions to allow passing in a list of migration
> > > targets to their target_nid parameter. When this is done, the
> > > migrate_{hot,cold} actions will migrate pages between the specified nodes
> > > using the global interleave weights found at
> > > /sys/kernel/mm/mempolicy/weighted_interleave/node<N>. This functionality
> > > can be used to dynamically adjust how pages are interleaved by changing the
> > > global weights. When only a single migration target is passed to
> > > target_nid, the migrate_{hot,cold} actions will act the same as before.
> >
> > This means users are required to manipulate two interfaces.  DAMON sysfs for
> > target nodes, and weighted_interleave sysfs for weights.  I don't think this
> > coupling is very ideal.
> >
>
> Just tossing this out there - weighted interleave sysfs entries *should*
> be automatic, and the preferred weights shouldn't really ever change
> over time.  Even if they did, if it's the result of devices coming and
> going - the updates should also be automatic.

I'm not convinced this is true. If you have a workload that can
saturate the local bandwidth but not the remote bandwidth, wouldn't
you want the interleave weights to be more biased towards local memory
than you would for a workload that can saturate both the local and
remote bandwidth?
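
To make the kind of dynamic tuning I have in mind concrete, here is a
minimal userspace sketch. The code is hypothetical; the only real
interface it touches is the weighted_interleave sysfs file mentioned
above, and the weight values are made up for illustration:

/*
 * Hypothetical sketch of a userspace tuner: write a new global
 * weighted-interleave weight for one node.  A real tuner would derive
 * the weight from observed bandwidth utilization.
 */
#include <stdio.h>

static int set_il_weight(int nid, unsigned int weight)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/kernel/mm/mempolicy/weighted_interleave/node%d", nid);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%u\n", weight);
	return fclose(f);
}

int main(void)
{
	/* Example: bias toward the local node while it has headroom. */
	if (set_il_weight(0, 3) || set_il_weight(1, 2))
		perror("set_il_weight");
	return 0;
}

With the migrate_{hot,cold} changes in this series, adjusting these
weights would steer where already-allocated hot pages get migrated,
not just new allocations.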

> So, in practice, a user usually only has to twiddle DAMON.

That being said, I don't mind the idea of the mempolicy weights being
left untouched as a reasonable starting point for bandwidth intensive
applications and leaving the fine tuning to DAMON.

Thanks,
Bijan


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-23 23:15         ` Bijan Tabatabai
@ 2025-06-24  0:34           ` SeongJae Park
  2025-06-24 16:01             ` Bijan Tabatabai
  0 siblings, 1 reply; 30+ messages in thread
From: SeongJae Park @ 2025-06-24  0:34 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

On Mon, 23 Jun 2025 18:15:00 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:

[...]
> Hi SeongJae,
> 
> I really appreciate your detailed response.
> The quota auto-tuning helps, but I feel like it's still not exactly
> what I want. For example, I think a quota goal that stops migration
> based on the memory usage balance gets quite a bit more complicated
> when instead of interleaving all data, we are just interleaving *hot*
> data. I haven't looked at it extensively, but I imagine it wouldn't be
> easy to identify how much data is hot in the paddr setting,

I don't think so, and I don't see why you think so.  Could you please
elaborate?

> especially
> because the regions can contain a significant amount of unallocated
> data.

In that case, unallocated data shouldn't be accessed at all, so the region will
just look cold to DAMON.

> Also, if the interleave weights changed, for example, from 11:9
> to 10:10, it would be preferable if only 5% of data is migrated;
> however, with the round robin approach, 50% would be. Finally, and I
> forgot to mention this in my last message, the round-robin approach
> does away with any notion of spatial locality, which does help the
> effectiveness of interleaving [1].

We could use probabilistic interleaving, if this is the problem?

> I don't think anything done with
> quotas can get around that.

I think I'm not getting your points well, sorry.  More elaboration of your
concern would be helpful.

> I wonder if there's an elegant way to
> specify whether to use rmap or not, but my initial feeling is that
> might just add complication to the code and interface for not enough
> benefit.

Agreed.  Please note that I'm open to adding an interface for this behavior if
the benefit is clear.  I'm also thinking that adding non-rmap migration first
(if it shows some benefit), and adding rmap support later with additional
benefit confirmation, could also be an option.

> 
> Maybe, as you suggest later on, this is an indication that my use case
> is a better fit for a vaddr scheme. I'll get into that more below.
> 
> > > Using the VMA offset to determine where a page
> > > should be placed avoids this problem because it gives a folio a single
> > > node it can be in for a given set of interleave weights. This means
> > > that in steady state, no folios will be migrated.
> >
> > This makes sense for this use case.  But I don't think this makes the same
> > sense for other possible use cases, like memory tiering on systems having
> > multiple NUMA nodes of the same tier.
> 
> I see where you're coming from. I think the crux of this difference is
> that in my use case, the set of nodes we are monitoring is the same as
> the set of nodes we are migrating to, while in the use case you
> describe, the set of nodes being monitored is disjoint from the set of
> migration target nodes.

I understand and agree with this difference.

> I think this in particular makes ping-ponging
> more of a problem for my use case, compared to promotion/demotion
> schemes.

But again I'm failing to understand this, sorry.  Could I ask for more
elaboration?

> 
> > If you really need this virtual address space based
> > deterministic behavior, it would make more sense to use virtual address space
> > monitoring (damon-vaddr).
> 
> Maybe it does make sense for me to implement vaddr versions of the
> migrate actions for my use case.

Yes, that could also be an option.

> One thing that gives me pause about
> this is that, from what I understand, it would be harder to have
> vaddr schemes apply to processes that start after damon begins. I
> think to do that, one would have to detect when a process starts, and
> then do a damon tune to update the targets list? It would be nice if,
> say, you could specify a cgroup as a vaddr target and track all
> processes in that cgroup, but that would be a different patchset for
> another day.

I agree that could be a future thing to do.  Note that the DAMON user-space
tool implements[1] a similar feature.

> 
> But, using vaddr has other benefits, like the sampling would take into
> account the locality of the accesses. There are also ways to make
> vaddr sampling more efficient by using higher levels of the page
> tables, which I don't think apply to paddr schemes [2]. I believe the
> authors of [2] said they submitted their patches to the kernel, but I
> don't know if they have been upstreamed (sorry about derailing the
> conversation slightly).

Thank you for the reminder.  It was a nice finding and approach[2], but
unfortunately it was not upstreamed.  I now realize the monitoring intervals
auto-tuning[3] idea was partly motivated by that nice discussion, though.

[1] https://github.com/damonitor/damo/blob/next/release_note#L33
[2] https://lore.kernel.org/damon/20240318132848.82686-1-aravinda.prasad@intel.com/
[3] https://lkml.kernel.org/r/20250303221726.484227-1-sj@kernel.org


Thanks,
SJ

[...]

> 
> [1] https://elixir.bootlin.com/linux/v6.16-rc3/source/mm/mempolicy.c#L213
> [2] https://www.usenix.org/conference/atc24/presentation/nair


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM
  2025-06-20 18:04 ` [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM Bijan Tabatabai
  2025-06-23 19:06   ` Gregory Price
  2025-06-23 19:14   ` David Hildenbrand
@ 2025-06-24 10:58   ` Huang, Ying
  2 siblings, 0 replies; 30+ messages in thread
From: Huang, Ying @ 2025-06-24 10:58 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: damon, linux-mm, linux-kernel, sj, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	apopple, bijantabatab, venkataravis, emirakhur, ajayjoshi,
	vtavarespetr

Bijan Tabatabai <bijan311@gmail.com> writes:

> From: Bijan Tabatabai <bijantabatab@micron.com>
>
> This patch exposes get_il_weight() in mm/internal.h. This patch is to
> allow other parts of the MM subsystem, such as DAMON, to make page
> placement decisions based on the global interleave weights.
>
> Signed-off-by: Bijan Tabatabai <bijantabatab@micron.com>
> ---
>  mm/internal.h  | 6 ++++++
>  mm/mempolicy.c | 2 +-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 3823fb356d3b..b718ebe7cad5 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1169,6 +1169,7 @@ extern int node_reclaim_mode;
>  
>  extern int node_reclaim(struct pglist_data *, gfp_t, unsigned int);
>  extern int find_next_best_node(int node, nodemask_t *used_node_mask);
> +extern u8 get_il_weight(int node);
>  #else
>  #define node_reclaim_mode 0
>  
> @@ -1181,6 +1182,11 @@ static inline int find_next_best_node(int node, nodemask_t *used_node_mask)
>  {
>  	return NUMA_NO_NODE;
>  }
> +
> +static inline u8 get_il_weight(int node)
> +{
> +	return 1;
> +}
>  #endif
>  
>  static inline bool node_reclaim_enabled(void)
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 1ff7b2174eb7..b1713430ee9c 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -165,7 +165,7 @@ static unsigned int *node_bw_table;
>   */
>  static DEFINE_MUTEX(wi_state_lock);
>  
> -static u8 get_il_weight(int node)
> +u8 get_il_weight(int node)
>  {
>  	struct weighted_interleave_state *state;
>  	u8 weight = 1;

I agree with David on the function naming and "extern".  Otherwise, this
looks good to me.  Thanks!  Feel free to add my

Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>

in the future versions.

---
Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-24  0:34           ` SeongJae Park
@ 2025-06-24 16:01             ` Bijan Tabatabai
  2025-06-24 22:33               ` SeongJae Park
  0 siblings, 1 reply; 30+ messages in thread
From: Bijan Tabatabai @ 2025-06-24 16:01 UTC (permalink / raw)
  To: SeongJae Park
  Cc: damon, linux-mm, linux-kernel, akpm, david, ziy, matthew.brost,
	joshua.hahnjy, rakie.kim, byungchul, gourry, ying.huang, apopple,
	bijantabatab, venkataravis, emirakhur, ajayjoshi, vtavarespetr

On Mon, Jun 23, 2025 at 7:34 PM SeongJae Park <sj@kernel.org> wrote:
>
> On Mon, 23 Jun 2025 18:15:00 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
>
> [...]
> > Hi SeongJae,
> >
> > I really appreciate your detailed response.
> > The quota auto-tuning helps, but I feel like it's still not exactly
> > what I want. For example, I think a quota goal that stops migration
> > based on the memory usage balance gets quite a bit more complicated
> > when instead of interleaving all data, we are just interleaving *hot*
> > data. I haven't looked at it extensively, but I imagine it wouldn't be
> > easy to identify how much data is hot in the paddr setting,
>
> I don't think so, and I don't see why you think so.  Could you please
> elaborate?

Elaborated below.

> > especially
> > because the regions can contain a significant amount of unallocated
> > data.
>
> In that case, unallocated data shouldn't be accessed at all, so the region will
> just look cold to DAMON.

"Significant" was too strong of a word, but if physical memory is
fragmented, couldn't there be a non-negligible amount of unallocated
memory in a hot region? If so, I think it would mean that you cannot
simply take a sum of the sizes of the hot regions in each node to
compute how the hot data is interleaved because those regions may
contain unallocated memory that shouldn't count for that calculation.
Does that make sense?
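
Concretely, the naive per-node sum I mean is something like this sketch
(hypothetical types for illustration, not actual DAMON structures):

/*
 * Naive estimate: sum the sizes of the hot regions residing on each
 * node.  As noted above, a region's size can include unallocated
 * pages, so this can over-count the hot data actually on a node.
 */
struct hot_region {
	int nid;		/* node the region resides on */
	unsigned long sz;	/* region size in bytes */
};

static void hot_bytes_per_node(const struct hot_region *regions, int nr,
			       unsigned long *per_node, int nr_nodes)
{
	int i;

	for (i = 0; i < nr_nodes; i++)
		per_node[i] = 0;
	for (i = 0; i < nr; i++)
		per_node[regions[i].nid] += regions[i].sz;
}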

It's very possible I might be overthinking this and it won't be an
issue in practice. It might be best to not worry about it until it
becomes an issue in practice.

> > Also, if the interleave weights changed, for example, from 11:9
> > to 10:10, it would be preferable if only 5% of data is migrated;
> > however, with the round robin approach, 50% would be.

Elaborating more on this:
Imagine a process begins with weights of 3 and 2 for nodes 0 and 1
respectively in both DAMON and the weighted interleave policy. If you
looked at which node a page resides in for a group of contiguous
pages, it would be something like this (using letters to represent the
virtual addresses):

A -> node 0
B -> node 0
C -> node 0
D -> node 1
E -> node 1
F -> node 0
G -> node 0
H -> node 0
I -> node 1
J -> node 1

If we use a user defined quota autotuning mechanism like you described
in [1] to stop DAMON interleaving when we detect that the data is
interleaved correctly, no interleaving would happen, which is good.
However, let's say we change the DAMON weights to be 4:1. My
understanding is that DAMON applies the scheme to regions in ascending
order of physical address (for paddr schemes), so if using the
round-robin algorithm you provided in [2], the interleaving would
apply to the pages in node 0 first, then node 1. For the sake of
simplicity, let's say in this scenario the pages in the same node are
sorted by their virtual address, so the interleaving would be applied
in the order ABCFGHDEIJ. This would result in the following page
placement

A -> node 0
B -> node 0
C -> node 0
D -> node 0
E -> node 0
F -> node 0
G -> node 1
H -> node 0
I -> node 0
J -> node 1

So, four pages, D, E, G, and I, have been migrated. However, if
they were interleaved using their virtual addresses*, only pages D and
I would have been migrated.

* Technically, the mempolicy code interleaves based on the offset from
the start of the VMA, but that difference doesn't change this example.
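
To spell out the virtual address based selection I'm assuming here, a
minimal sketch (plain C for illustration; the actual mempolicy code
differs in detail):

/*
 * Pick the target node for a page from its offset (in pages) within
 * the VMA and the interleave weights.  With weights {3, 2}, offsets
 * 0-2 map to node 0, 3-4 to node 1, 5-7 to node 0, and so on, which
 * reproduces the A-J layout above.
 */
static int il_target_node(unsigned long vma_offset,
			  const unsigned char *weights, int nr_nodes)
{
	unsigned int total = 0, pos;
	int nid;

	for (nid = 0; nid < nr_nodes; nid++)
		total += weights[nid];

	pos = vma_offset % total;	/* position in one weight block */
	for (nid = 0; nid < nr_nodes; nid++) {
		if (pos < weights[nid])
			return nid;
		pos -= weights[nid];
	}
	return 0;	/* unreachable while total > 0 */
}

The round-robin approach, as I understand it, is the same walk keyed on
a running count of pages placed so far instead of the VMA offset, which
is why it cannot converge to a stable placement.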

> > Finally, and I
> > forgot to mention this in my last message, the round-robin approach
> > does away with any notion of spatial locality, which does help the
> > effectiveness of interleaving [1].

Elaborating more on this.
As implied by the comment in [3], interleaving works better the finer
grained it is done in virtual memory. As an extreme example, if you
had weights of 3:2, putting the first 60% of a process's data in node
0 and the remaining 40% in node 1 would satisfy the ratio globally,
but you would likely not see the benefits of interleaving. We can see
in the example above that your round-robin approach does not maintain
the desired interleave ratio locally, even though it does globally.

> We could use probabilistic interleaving, if this is the problem?

I don't think so. In the above example, with probabilistic
interleaving, you would still migrate, on average, 20% of the pages in
node 0 and 80% of the pages in node 1. Similarly, the probabilistic
interleaving also does not consider the virtual address, so it
wouldn't maintain the interleave ratio locally in the virtual address
space either.
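
For reference, what I understand by probabilistic interleaving is a
selection like the following sketch (my reading of the idea, not code
anyone has posted):

/*
 * Each page independently draws a node with probability proportional
 * to its weight, so the weight ratio holds only on average, and two
 * virtually adjacent pages can easily land on the same node.
 */
#include <stdlib.h>

static int prob_target_node(const unsigned char *weights, int nr_nodes)
{
	unsigned int total = 0, r;
	int nid;

	for (nid = 0; nid < nr_nodes; nid++)
		total += weights[nid];

	r = (unsigned int)rand() % total;	/* uniform in [0, total) */
	for (nid = 0; nid < nr_nodes; nid++) {
		if (r < weights[nid])
			return nid;
		r -= weights[nid];
	}
	return 0;
}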

> > I don't think anything done with
> > quotas can get around that.
>
> I think I'm not getting your points well, sorry.  More elaboration of your
> concern would be helpful.

I elaborated more above. Hopefully that clears up any confusion. If
you still have questions, maybe it would be easier to e-meet and have
a live discussion about it? I see you have a DAMON chat slot open
tomorrow at 9:30 PT [4]. If you have nothing else scheduled, maybe
that would be a good time to chat?

[...]

> > I see where you're coming from. I think the crux of this difference is
> > that in my use case, the set of nodes we are monitoring is the same as
> > the set of nodes we are migrating to, while in the use case you
> > describe, the set of nodes being monitored is disjoint from the set of
> > migration target nodes.
>
> I understand and agree with this difference.
>
> > I think this in particular makes ping-ponging
> > more of a problem for my use case, compared to promotion/demotion
> > schemes.
>
> But again I'm failing to understand this, sorry.  Could I ask for more
> elaboration?

Sure, and sorry for needing to elaborate so much.

What I was trying to say is that in the case where a scheme is
monitoring the same nodes it is migrating to, when it detects a hot
region, it will interleave the pages in the region between the nodes.
If there are two nodes, and assuming the access pattern was uniform
across the region, we have now turned one hot region into two. Using
the algorithms you provided earlier, the next time the scheme is
applied, it will interleave both of those regions again because the
only information it has about where to place pages is how many pages
it has previously interleaved. Using virtual addresses to interleave
solves this problem by providing one and only one location a page
should be in given a set of interleave weights.

When a scheme is monitoring one set of nodes and migrating to another
disjoint set of nodes, you don't have this problem because once the
pages are migrated, they won't be considered by the scheme until some
other scheme moves those pages back into the monitored nodes.

Does that make sense?

> >
> > > If you really need this virtual address space based
> > > deterministic behavior, it would make more sense to use virtual address space
> > > monitoring (damon-vaddr).
> >
> > Maybe it does make sense for me to implement vaddr versions of the
> > migrate actions for my use case.
>
> Yes, that could also be an option.

Given how much my explanations here stressed that having access to the
virtual addresses solves the problems I mentioned, I think the path
forward for the next revision should be:

1) Have the paddr migration scheme use the round-robin interleaving
that you provided - This would be good for the use case you described
where you promote pages from a node into multiple nodes of the same
tier.
2) Implement a vaddr migration scheme that uses the virtual address
based interleaving - This is useful for my target use case of
balancing bandwidth utilization between nodes.

If the vaddr scheme proves insufficient down the line for my use case,
we can have another discussion at that time.
How does this sound to you?

> > One thing that gives me pause about
> > this is that, from what I understand, it would be harder to have
> > vaddr schemes apply to processes that start after damon begins. I
> > think to do that, one would have to detect when a process starts, and
> > then do a damon tune to update the targets list? It would be nice if,
> > say, you could specify a cgroup as a vaddr target and track all
> > processes in that cgroup, but that would be a different patchset for
> > another day.
>
> I agree that could be a future thing to do.  Note that the DAMON user-space
> tool implements[1] a similar feature.

Thanks, I'll take a look at that.

[...]

Thanks again for the time you are spending on these discussions. I do
appreciate it, and I hope I'm not taking up too much of your time.

Bijan

[1] https://lore.kernel.org/damon/20250623175204.43917-1-sj@kernel.org/
[2] https://lore.kernel.org/damon/20250621180215.36243-1-sj@kernel.org/
[3] https://elixir.bootlin.com/linux/v6.16-rc3/source/mm/mempolicy.c#L213
[4] https://lore.kernel.org/damon/20250620205819.98472-1-sj@kernel.org/T/#t


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets
  2025-06-24 16:01             ` Bijan Tabatabai
@ 2025-06-24 22:33               ` SeongJae Park
  0 siblings, 0 replies; 30+ messages in thread
From: SeongJae Park @ 2025-06-24 22:33 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, gourry,
	ying.huang, apopple, bijantabatab, venkataravis, emirakhur,
	ajayjoshi, vtavarespetr

On Tue, 24 Jun 2025 11:01:46 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:

> On Mon, Jun 23, 2025 at 7:34 PM SeongJae Park <sj@kernel.org> wrote:
> >
> > On Mon, 23 Jun 2025 18:15:00 -0500 Bijan Tabatabai <bijan311@gmail.com> wrote:
> >
> > [...]
> > > Hi SeongJae,
> > >
> > > I really appreciate your detailed response.
> > > The quota auto-tuning helps, but I feel like it's still not exactly
> > > what I want. For example, I think a quota goal that stops migration
> > > based on the memory usage balance gets quite a bit more complicated
> > > when instead of interleaving all data, we are just interleaving *hot*
> > > data. I haven't looked at it extensively, but I imagine it wouldn't be
> > > easy to identify how much data is hot in the paddr setting,
> >
> > I don't think so, and I don't see why you think so.  Could you please
> > elaborate?
> 
> Elaborated below.
> 
> > > especially
> > > because the regions can contain a significant amount of unallocated
> > > data.
> >
> > In that case, unallocated data shouldn't be accessed at all, so the region will
> > just look cold to DAMON.
> 
> "Significant" was too strong of a word, but if physical memory is
> fragmented, couldn't there be a non-negligible amount of unallocated
> memory in a hot region? If so, I think it would mean that you cannot
> simply take a sum of the sizes of the hot regions in each node to
> compute how the hot data is interleaved because those regions may
> contain unallocated memory that shouldn't count for that calculation.
> Does that make sense?
> 
> It's very possible I might be overthinking this and it won't be an
> issue in practice. It might be best to not worry about it until it
> becomes an issue in practice.

So, I understand this is not a problem for only this migration-based
interleaving approach, but for general DAMON accuracy on the physical address
space.

I agree.  DAMON's region-based mechanism provides only best-effort results, and
depending on the workload, virtual address space based monitoring has a chance
to work better than physical address space based monitoring.  Whether the
accuracy level is acceptable or not would depend on the use case.  What I can
say at the moment is, there are use cases of physical address based DAMON, and
we are working further on making it better.  The recent monitoring intervals
auto-tuning was one such effort.

> 
> > > Also, if the interleave weights changed, for example, from 11:9
> > > to 10:10, it would be preferable if only 5% of data is migrated;
> > > however, with the round robin approach, 50% would be.
> 
> Elaborating more on this:
> Imagine a process begins with weights of 3 and 2 for nodes 0 and 1
> respectively in both DAMON and the weighted interleave policy. If you
> looked at which node a page resides in for a group of contiguous
> pages, it would be something like this (using letters to represent the
> virtual addresses):
> 
> A -> node 0
> B -> node 0
> C -> node 0
> D -> node 1
> E -> node 1
> F -> node 0
> G -> node 0
> H -> node 0
> I -> node 1
> J -> node 1
> 
> If we use a user defined quota autotuning mechanism like you described
> in [1] to stop DAMON interleaving when we detect that the data is
> interleaved correctly, no interleaving would happen, which is good.
> However, let's say we change the DAMON weights to be 4:1. My
> understanding is that DAMON applies the scheme to regions in ascending
> order of physical address (for paddr schemes), so if using the
> round-robin algorithm you provided in [2], the interleaving would
> apply to the pages in node 0 first, then node 1. For the sake of
> simplicity, let's say in this scenario the pages in the same node are
> sorted by their virtual address,
> so the interleaving would be applied
> in the order ABCFGHDEIJ. This would result in the following page
> placement
> 
> A -> node 0
> B -> node 0
> C -> node 0
> D -> node 0
> E -> node 0
> F -> node 0
> G -> node 1
> H -> node 0
> I -> node 0
> J -> node 1
> 
> So, four pages, D, E, G, and I, have been migrated. However, if
> they were interleaved using their virtual addresses*, only pages D and
> I would have been migrated.

Thank you for this kind and detailed explanation.  Now I can understand your
concern.  To my understanding, one of the important assumptions in the example
is that the pages of the DAMON region are sorted in both the virtual address
space and the physical address space.  I'm not very sure that is a reasonable
assumption.

Hence, in cases where a DAMON region mixes random virtual addresses, I think
the physical address based approach could work better than in the virtual
address-sorted case, and sometimes even make fewer migrations than the virtual
address-based approach.  For example, let's say we found a DAMON region of 6
pages in node 0, and the virtual address-based interleaving targets of the 6
pages are all node 1, while the physical address-based interleaving targets
are 3 pages for node 0 and the other 3 pages for node 1 (weights 1:1).  Then
virtual address-based interleaving does 6 migrations while the other one does
only 3 migrations.

So both approaches have worst-case scenarios, and my gut feeling is that this
might not result in a big difference as long as careful scheme aggressiveness
control is applied (whether using quota auto-tuning or manual online DAMON
parameter tuning).  Of course, it would be nice if we had good test results.

Meanwhile, assuming DAMON regions on the physical address space may contain
pages of random virtual addresses, I find another concern about the behavior
from the users' perspective.  In my opinion, the commonly expected behavior for
the interface we are working on together is, "DAMON will migrate pages of the
regions of the given access pattern to the given NUMA nodes with the given
weights." If there is a DAMON region having random virtual address pages, and
the virtual address-based target node selection results in a target node ratio
that is different from the user-specified weights (like the above example of 6
pages), I think users could be confused by the results.

To go in this direction, I think the interface should clearly explain the
behavior.  I should confess I have concerns even in that case, if the behavior
is unnecessarily complex for users and developers.

Alternatively, if we go with a fully virtual address based approach (monitoring
virtual addresses and operating on virtual addresses), I think the behavior
would be optimal and easier for users to predict.

> 
> * Technically, the mempolicy code interleaves based on the offset from
> the start of the VMA, but that difference doesn't change this example.
> 
> > > Finally, and I
> > > forgot to mention this in my last message, the round-robin approach
> > > does away with any notion of spatial locality, which does help the
> > > effectiveness of interleaving [1].
> 
> Elaborating more on this.
> As implied by the comment in [3], interleaving works better the finer
> grained it is done in virtual memory. As an extreme example, if you
> had weights of 3:2, putting the first 60% of a process's data in node
> 0 and the remaining 40% in node 1 would satisfy the ratio globally,
> but you would likely not see the benefits of interleaving. We can see
> in the example above that your round-robin approach does not maintain
> the desired interleave ratio locally, even though it does globally.

I'm not fully convinced that keeping the ratio process-locally is always
beneficial.  At least when people use DAMON on the physical address space, I'd
assume their concern is global memory bandwidth control.  And hence I imagine
they would care about the global ratio more than the local ratio.

I agree there could be cases where users care about the process-local ratio,
though.  In that case, I feel using the virtual address based mode sounds more
natural, unless there is another reason to use the physical address space mode.

> 
> > We could use probabilistic interleaving, if this is the problem?
> 
> I don't think so. In the above example, with probabilistic
> interleaving, you would still migrate, on average, 20% of the pages in
> node 0 and 80% of the pages in node 1. Similarly, the probabilistic
> interleaving also does not consider the virtual address, so it
> wouldn't maintain the interleave ratio locally in the virtual address
> space either.

Right.  But if the DAMON region has random virtual address pages, it wouldn't
always result in the worst case.

> 
> > > I don't think anything done with
> > > quotas can get around that.
> >
> > I think I'm not getting your points well, sorry.  More elaboration of your
> > concern would be helpful.
> 
> I elaborated more above. Hopefully that clears up any confusion. If
> you still have questions, maybe it would be easier to e-meet and have
> a live discussion about it? I see you have a DAMON chat slot open
> tomorrow at 9:30 PT [4]. If you have nothing else scheduled, maybe
> that would be a good time to chat?

Nice suggestion, thank you.  For readers of this mail other than Bijan and me:
Bijan and I made an appointment for this off-list.

> 
> [...]
> 
> > > I see where you're coming from. I think the crux of this difference is
> > > that in my use case, the set of nodes we are monitoring is the same as
> > > the set of nodes we are migrating to, while in the use case you
> > > describe, the set of nodes being monitored is disjoint from the set of
> > > migration target nodes.
> >
> > I understand and agree with this difference.
> >
> > > I think this in particular makes ping-ponging
> > > more of a problem for my use case, compared to promotion/demotion
> > > schemes.
> >
> > But again I'm failing to understand this, sorry.  Could I ask for more
> > elaboration?
> 
> Sure, and sorry for needing to elaborate so much.

No worry, thank you for walking with me.

> 
> What I was trying to say is that in the case where a scheme is
> monitoring the same nodes it is migrating to, when it detects a hot
> region, it will interleave the pages in the region between the nodes.
> If there are two nodes, and assuming the access pattern was uniform
> across the region, we have now turned one hot region into two. Using
> the algorithms you provided earlier, the next time the scheme is
> applied, it will interleave both of those regions again because the
> only information it has about where to place pages is how many pages
> it has previously interleaved.

You're correct, if we don't use a node-imbalance-state based scheme tuning
approach.  With that approach, though, this will not happen?

> Using virtual addresses to interleave
> solves this problem by providing one and only one location a page
> should be in given a set of interleave weights.

Even in that case, DAMON will do rmap walks of the pages.  If it is an entirely
virtual address mode, rmap walking is not needed, but DAMON should still
iterate over each page.  For large systems, that might consume non-negligible
system resources.  I think even if we use virtual address based target node
selection, or an entirely virtual address mode, that kind of imbalance-status
based scheme tuning could be beneficial.

> 
> When a scheme is monitoring one set of nodes and migrating to another
> disjoint set of nodes, you don't have this problem because once the
> pages are migrated, they won't be considered by the scheme until some
> other scheme moves those pages back into the monitored nodes.

Not technically, in my opinion.  In the tiering scenario, we have two schemes
migrating hot/cold pages to each other's nodes.  Without quotas or very good
hot/coldness thresholds, unnecessary ping-ponging can happen.

> 
> Does that make sense?
> 
> > >
> > > > If you really need this virtual address space based
> > > > deterministic behavior, it would make more sense to use virtual address space
> > > > monitoring (damon-vaddr).
> > >
> > > Maybe it does make sense for me to implement vaddr versions of the
> > > migrate actions for my use case.
> >
> > Yes, that could also be an option.
> 
> Given how much my explanations here stressed that having access to the
> virtual addresses solves the problems I mentioned, I think the path
> forward for the next revision should be:
> 
> 1) Have the paddr migration scheme use the round-robin interleaving
> that you provided - This would be good for the use case you described
> where you promote pages from a node into multiple nodes of the same
> tier.

If you will not use it, I don't think you need to implement it.  Let's work
for only our own selfish purposes ;)  I believe that is a better approach for
long-term maintenance, too.

> 2) Implement a vaddr migration scheme that uses the virtual address
> based interleaving - This is useful for my target use case of
> balancing bandwidth utilization between nodes.
> 
> If the vaddr scheme proves insufficient down the line for my use case,
> we can have another discussion at that time.
> How does this sound to you?

I find no concern with a fully virtual address based approach off the top of
my head.  And if you think it will work for you and you will use it, I think
this is the right way to go.  I also think, based on our previous discussion,
this could work for your and others' use cases.

We still have a few conflicting opinions about the physical address monitoring
based approach.  I love having these discussions with you.  But I don't think
those need to be blockers for 2).  So to me, 2) seems the right way to go.

> 
> > > One thing that gives me pause about
> > > this is that, from what I understand, it would be harder to have
> > > vaddr schemes apply to processes that start after damon begins. I
> > > think to do that, one would have to detect when a process starts, and
> > > then do a damon tune to update the targets list? It would be nice if,
> > > say, you could specify a cgroup as a vaddr target and track all
> > > processes in that cgroup, but that would be a different patchset for
> > > another day.
> >
> > I agree that could be a future thing to do.  Note that the DAMON user-space
> > tool implements[1] a similar feature.
> 
> Thanks, I'll take a look at that.
> 
> [...]
> 
> Thanks again for the time you are spending on these discussions. I do
> appreciate it, and I hope I'm not taking up too much of your time.

I also appreciate your time and patience.  I see we are near to alignment,
though; please bear with me!


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions
  2025-06-23 23:21     ` Bijan Tabatabai
@ 2025-06-26 19:13       ` Gregory Price
  0 siblings, 0 replies; 30+ messages in thread
From: Gregory Price @ 2025-06-26 19:13 UTC (permalink / raw)
  To: Bijan Tabatabai
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, akpm, david, ziy,
	matthew.brost, joshua.hahnjy, rakie.kim, byungchul, ying.huang,
	apopple, bijantabatab, venkataravis, emirakhur, ajayjoshi,
	vtavarespetr

On Mon, Jun 23, 2025 at 06:21:20PM -0500, Bijan Tabatabai wrote:
> I'm not convinced this is true. If you have a workload that can
> saturate the local bandwidth but not the remote bandwidth, wouldn't
> you want the interleave weights to be more biased towards local memory
> than you would for a workload that can saturate both the local and
> remote bandwidth?
>

That sounds like an argument for task/process/cgroup-local weights, as
opposed to an argument to twiddle global weights.

These things need not solve all problems for everyone.

~Gregory


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2025-06-26 19:13 UTC | newest]

Thread overview: 30+ messages
2025-06-20 18:04 [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions Bijan Tabatabai
2025-06-20 18:04 ` [RFC PATCH v2 1/2] mm/mempolicy: Expose get_il_weight() to MM Bijan Tabatabai
2025-06-23 19:06   ` Gregory Price
2025-06-23 19:14   ` David Hildenbrand
2025-06-23 19:38     ` Gregory Price
2025-06-24 10:58   ` Huang, Ying
2025-06-20 18:04 ` [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets Bijan Tabatabai
2025-06-21 18:02   ` SeongJae Park
2025-06-21 18:11     ` SeongJae Park
2025-06-23 14:08       ` Joshua Hahn
2025-06-23 16:50         ` SeongJae Park
2025-06-23 14:27       ` Bijan Tabatabai
2025-06-23 16:52         ` SeongJae Park
2025-06-23 14:16     ` Bijan Tabatabai
2025-06-23 17:52       ` SeongJae Park
2025-06-23 23:15         ` Bijan Tabatabai
2025-06-24  0:34           ` SeongJae Park
2025-06-24 16:01             ` Bijan Tabatabai
2025-06-24 22:33               ` SeongJae Park
2025-06-20 20:21 ` [RFC PATCH v2 0/2] mm/damon/paddr: Allow interleaving in migrate_{hot,cold} actions SeongJae Park
2025-06-20 21:47   ` Bijan Tabatabai
2025-06-20 23:13     ` SeongJae Park
2025-06-21 17:36       ` SeongJae Park
2025-06-23 14:39         ` Bijan Tabatabai
2025-06-23 16:32           ` SeongJae Park
2025-06-23 19:28   ` Gregory Price
2025-06-23 23:21     ` Bijan Tabatabai
2025-06-26 19:13       ` Gregory Price
2025-06-23 13:45 ` Joshua Hahn
2025-06-23 14:57   ` Bijan Tabatabai
