* [RFC PATCH 1/4] mm/vmscan: refactor reclaim_pages with reclaim_or_migrate_folios
2024-01-15 4:52 [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory Honggyu Kim
@ 2024-01-15 4:52 ` Honggyu Kim
2024-01-16 20:32 ` SeongJae Park
2024-01-15 4:52 ` [RFC PATCH 2/4] mm/damon: introduce DAMOS_DEMOTE action for demotion Honggyu Kim
` (3 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Honggyu Kim @ 2024-01-15 4:52 UTC (permalink / raw)
To: sj, damon, linux-mm
Cc: linux-trace-kernel, linux-kernel, kernel_team, akpm, apopple,
baolin.wang, dave.jiang, linmiaohe, lizhijian, mathieu.desnoyers,
mhiramat, rostedt, surenb, yangx.jy, ying.huang, ziy, Honggyu Kim
Since we will introduce reclaim_pages-like functions such as
demote_pages and promote_pages, most of the code can be shared.
This is a preparation patch that introduces reclaim_or_migrate_folios()
to cover the common logic, while taking a handler for the
action-specific part.
No functional changes applied.
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
---
mm/vmscan.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bba207f41b14..7ca2396ccc3b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2107,15 +2107,16 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
return nr_reclaimed;
}
-unsigned long reclaim_pages(struct list_head *folio_list)
+static unsigned long reclaim_or_migrate_folios(struct list_head *folio_list,
+ unsigned int (*handler)(struct list_head *, struct pglist_data *))
{
int nid;
- unsigned int nr_reclaimed = 0;
+ unsigned int nr_folios = 0;
LIST_HEAD(node_folio_list);
unsigned int noreclaim_flag;
if (list_empty(folio_list))
- return nr_reclaimed;
+ return nr_folios;
noreclaim_flag = memalloc_noreclaim_save();
@@ -2129,15 +2130,20 @@ unsigned long reclaim_pages(struct list_head *folio_list)
continue;
}
- nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
+ nr_folios += handler(&node_folio_list, NODE_DATA(nid));
nid = folio_nid(lru_to_folio(folio_list));
} while (!list_empty(folio_list));
- nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
+ nr_folios += handler(&node_folio_list, NODE_DATA(nid));
memalloc_noreclaim_restore(noreclaim_flag);
- return nr_reclaimed;
+ return nr_folios;
+}
+
+unsigned long reclaim_pages(struct list_head *folio_list)
+{
+ return reclaim_or_migrate_folios(folio_list, reclaim_folio_list);
}
static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 1/4] mm/vmscan: refactor reclaim_pages with reclaim_or_migrate_folios
2024-01-15 4:52 ` [RFC PATCH 1/4] mm/vmscan: refactor reclaim_pages with reclaim_or_migrate_folios Honggyu Kim
@ 2024-01-16 20:32 ` SeongJae Park
0 siblings, 0 replies; 15+ messages in thread
From: SeongJae Park @ 2024-01-16 20:32 UTC (permalink / raw)
To: Honggyu Kim
Cc: sj, damon, linux-mm, linux-trace-kernel, linux-kernel,
kernel_team, akpm, apopple, baolin.wang, dave.jiang, linmiaohe,
lizhijian, mathieu.desnoyers, mhiramat, rostedt, surenb, yangx.jy,
ying.huang, ziy
On Mon, 15 Jan 2024 13:52:49 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> Since we will introduce reclaim_pages-like functions such as
> demote_pages and promote_pages, most of the code can be shared.
>
> This is a preparation patch that introduces reclaim_or_migrate_folios()
> to cover the common logic, while taking a handler for the
> action-specific part.
>
> No functional changes applied.
>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> ---
> mm/vmscan.c | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bba207f41b14..7ca2396ccc3b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2107,15 +2107,16 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
> return nr_reclaimed;
> }
>
> -unsigned long reclaim_pages(struct list_head *folio_list)
> +static unsigned long reclaim_or_migrate_folios(struct list_head *folio_list,
> + unsigned int (*handler)(struct list_head *, struct pglist_data *))
I'm not very sure if extending this function for general migration is the right
approach, since we have dedicated functions for migration.
I'd like to hear others' opinions.
> {
> int nid;
> - unsigned int nr_reclaimed = 0;
> + unsigned int nr_folios = 0;
> LIST_HEAD(node_folio_list);
> unsigned int noreclaim_flag;
>
> if (list_empty(folio_list))
> - return nr_reclaimed;
> + return nr_folios;
>
> noreclaim_flag = memalloc_noreclaim_save();
>
> @@ -2129,15 +2130,20 @@ unsigned long reclaim_pages(struct list_head *folio_list)
> continue;
> }
>
> - nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> + nr_folios += handler(&node_folio_list, NODE_DATA(nid));
> nid = folio_nid(lru_to_folio(folio_list));
> } while (!list_empty(folio_list));
>
> - nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> + nr_folios += handler(&node_folio_list, NODE_DATA(nid));
>
> memalloc_noreclaim_restore(noreclaim_flag);
>
> - return nr_reclaimed;
> + return nr_folios;
> +}
> +
> +unsigned long reclaim_pages(struct list_head *folio_list)
> +{
> + return reclaim_or_migrate_folios(folio_list, reclaim_folio_list);
> }
>
> static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
> --
> 2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 2/4] mm/damon: introduce DAMOS_DEMOTE action for demotion
2024-01-15 4:52 [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory Honggyu Kim
2024-01-15 4:52 ` [RFC PATCH 1/4] mm/vmscan: refactor reclaim_pages with reclaim_or_migrate_folios Honggyu Kim
@ 2024-01-15 4:52 ` Honggyu Kim
2024-01-16 20:32 ` SeongJae Park
2024-01-15 4:52 ` [RFC PATCH 3/4] mm/memory-tiers: add next_promotion_node to find promotion target Honggyu Kim
` (2 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Honggyu Kim @ 2024-01-15 4:52 UTC (permalink / raw)
To: sj, damon, linux-mm
Cc: linux-trace-kernel, linux-kernel, kernel_team, akpm, apopple,
baolin.wang, dave.jiang, linmiaohe, lizhijian, mathieu.desnoyers,
mhiramat, rostedt, surenb, yangx.jy, ying.huang, ziy, Honggyu Kim
This patch introduces the DAMOS_DEMOTE action, which is similar to
DAMOS_PAGEOUT, but demotes folios instead of swapping them out.
Since demotion shares some routines with pageout, many functions have
similar logic for both pageout and demote.
The execution sequences of DAMOS_PAGEOUT and DAMOS_DEMOTE look as follows.
DAMOS_PAGEOUT action
damo_pa_apply_scheme
-> damon_pa_reclaim
-> reclaim_pages
-> reclaim_folio_list
-> shrink_folio_list
DAMOS_DEMOTE action
damo_pa_apply_scheme
-> damon_pa_reclaim
-> demote_pages
-> do_demote_folio_list
-> __demote_folio_list
-> demote_folio_list
__demote_folio_list() is a minimized version of shrink_folio_list(),
stripped down to handle only demotion.
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
---
include/linux/damon.h | 2 +
mm/damon/paddr.c | 17 +++++---
mm/damon/sysfs-schemes.c | 1 +
mm/internal.h | 1 +
mm/vmscan.c | 84 ++++++++++++++++++++++++++++++++++++++++
5 files changed, 99 insertions(+), 6 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index e00ddf1ed39c..4c0a0fef09c5 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -106,6 +106,7 @@ struct damon_target {
* @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
* @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
* @DAMOS_STAT: Do nothing but count the stat.
+ * @DAMOS_DEMOTE: Do demotion for the current region.
* @NR_DAMOS_ACTIONS: Total number of DAMOS actions
*
* The support of each action is up to running &struct damon_operations.
@@ -123,6 +124,7 @@ enum damos_action {
DAMOS_LRU_PRIO,
DAMOS_LRU_DEPRIO,
DAMOS_STAT, /* Do nothing but only record the stat */
+ DAMOS_DEMOTE,
NR_DAMOS_ACTIONS,
};
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 081e2a325778..d3e3f077cd00 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -224,7 +224,7 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
return false;
}
-static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
+static unsigned long damon_pa_reclaim(struct damon_region *r, struct damos *s, bool is_demote)
{
unsigned long addr, applied;
LIST_HEAD(folio_list);
@@ -242,14 +242,17 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
folio_test_clear_young(folio);
if (!folio_isolate_lru(folio))
goto put_folio;
- if (folio_test_unevictable(folio))
+ if (folio_test_unevictable(folio) && !is_demote)
folio_putback_lru(folio);
else
list_add(&folio->lru, &folio_list);
put_folio:
folio_put(folio);
}
- applied = reclaim_pages(&folio_list);
+ if (is_demote)
+ applied = demote_pages(&folio_list);
+ else
+ applied = reclaim_pages(&folio_list);
cond_resched();
return applied * PAGE_SIZE;
}
@@ -297,13 +300,15 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
{
switch (scheme->action) {
case DAMOS_PAGEOUT:
- return damon_pa_pageout(r, scheme);
+ return damon_pa_reclaim(r, scheme, false);
case DAMOS_LRU_PRIO:
return damon_pa_mark_accessed(r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_pa_deactivate_pages(r, scheme);
case DAMOS_STAT:
break;
+ case DAMOS_DEMOTE:
+ return damon_pa_reclaim(r, scheme, true);
default:
/* DAMOS actions that not yet supported by 'paddr'. */
break;
@@ -317,11 +322,11 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
{
switch (scheme->action) {
case DAMOS_PAGEOUT:
+ case DAMOS_LRU_DEPRIO:
+ case DAMOS_DEMOTE:
return damon_cold_score(context, r, scheme);
case DAMOS_LRU_PRIO:
return damon_hot_score(context, r, scheme);
- case DAMOS_LRU_DEPRIO:
- return damon_cold_score(context, r, scheme);
default:
break;
}
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index fe0fe2562000..ac7cd3f17b12 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1187,6 +1187,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
"lru_prio",
"lru_deprio",
"stat",
+ "demote",
};
static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc(
diff --git a/mm/internal.h b/mm/internal.h
index b61034bd50f5..2380397ec2f3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -869,6 +869,7 @@ extern void set_pageblock_order(void);
unsigned long reclaim_pages(struct list_head *folio_list);
unsigned int reclaim_clean_pages_from_list(struct zone *zone,
struct list_head *folio_list);
+unsigned long demote_pages(struct list_head *folio_list);
/* The ALLOC_WMARK bits are used as an index to zone->watermark */
#define ALLOC_WMARK_MIN WMARK_MIN
#define ALLOC_WMARK_LOW WMARK_LOW
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7ca2396ccc3b..eaa3dd6b7562 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -998,6 +998,66 @@ static bool may_enter_fs(struct folio *folio, gfp_t gfp_mask)
return !data_race(folio_swap_flags(folio) & SWP_FS_OPS);
}
+/*
+ * __demote_folio_list() returns the number of demoted pages
+ */
+static unsigned int __demote_folio_list(struct list_head *folio_list,
+ struct pglist_data *pgdat, struct scan_control *sc)
+{
+ LIST_HEAD(ret_folios);
+ LIST_HEAD(demote_folios);
+ unsigned int nr_demoted = 0;
+
+ if (next_demotion_node(pgdat->node_id) == NUMA_NO_NODE)
+ return 0;
+
+ cond_resched();
+
+ while (!list_empty(folio_list)) {
+ struct folio *folio;
+ enum folio_references references;
+
+ cond_resched();
+
+ folio = lru_to_folio(folio_list);
+ list_del(&folio->lru);
+
+ if (!folio_trylock(folio))
+ goto keep;
+
+ VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
+
+ references = folio_check_references(folio, sc);
+ if (references == FOLIOREF_KEEP)
+ goto keep_locked;
+
+ /* Relocate its contents to another node. */
+ list_add(&folio->lru, &demote_folios);
+ folio_unlock(folio);
+ continue;
+keep_locked:
+ folio_unlock(folio);
+keep:
+ list_add(&folio->lru, &ret_folios);
+ VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+ }
+ /* 'folio_list' is always empty here */
+
+ /* Migrate folios selected for demotion */
+ nr_demoted += demote_folio_list(&demote_folios, pgdat);
+ /* Folios that could not be demoted are still in @demote_folios */
+ if (!list_empty(&demote_folios)) {
+ /* Folios which weren't demoted go back on @folio_list */
+ list_splice_init(&demote_folios, folio_list);
+ }
+
+ try_to_unmap_flush();
+
+ list_splice(&ret_folios, folio_list);
+
+ return nr_demoted;
+}
+
/*
* shrink_folio_list() returns the number of reclaimed pages
*/
@@ -2107,6 +2167,25 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
return nr_reclaimed;
}
+static unsigned int do_demote_folio_list(struct list_head *folio_list,
+ struct pglist_data *pgdat)
+{
+ unsigned int nr_demoted;
+ struct folio *folio;
+ struct scan_control sc = {
+ .gfp_mask = GFP_KERNEL,
+ };
+
+ nr_demoted = __demote_folio_list(folio_list, pgdat, &sc);
+ while (!list_empty(folio_list)) {
+ folio = lru_to_folio(folio_list);
+ list_del(&folio->lru);
+ folio_putback_lru(folio);
+ }
+
+ return nr_demoted;
+}
+
static unsigned long reclaim_or_migrate_folios(struct list_head *folio_list,
unsigned int (*handler)(struct list_head *, struct pglist_data *))
{
@@ -2146,6 +2225,11 @@ unsigned long reclaim_pages(struct list_head *folio_list)
return reclaim_or_migrate_folios(folio_list, reclaim_folio_list);
}
+unsigned long demote_pages(struct list_head *folio_list)
+{
+ return reclaim_or_migrate_folios(folio_list, do_demote_folio_list);
+}
+
static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
struct lruvec *lruvec, struct scan_control *sc)
{
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 2/4] mm/damon: introduce DAMOS_DEMOTE action for demotion
2024-01-15 4:52 ` [RFC PATCH 2/4] mm/damon: introduce DAMOS_DEMOTE action for demotion Honggyu Kim
@ 2024-01-16 20:32 ` SeongJae Park
0 siblings, 0 replies; 15+ messages in thread
From: SeongJae Park @ 2024-01-16 20:32 UTC (permalink / raw)
To: Honggyu Kim
Cc: sj, damon, linux-mm, linux-trace-kernel, linux-kernel,
kernel_team, akpm, apopple, baolin.wang, dave.jiang, linmiaohe,
lizhijian, mathieu.desnoyers, mhiramat, rostedt, surenb, yangx.jy,
ying.huang, ziy
On Mon, 15 Jan 2024 13:52:50 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> This patch introduces the DAMOS_DEMOTE action, which is similar to
> DAMOS_PAGEOUT, but demotes folios instead of swapping them out.
>
> Since demotion shares some routines with pageout, many functions have
> similar logic for both pageout and demote.
>
> The execution sequences of DAMOS_PAGEOUT and DAMOS_DEMOTE look as follows.
>
> DAMOS_PAGEOUT action
> damo_pa_apply_scheme
Nit. s/damo/damon/
> -> damon_pa_reclaim
> -> reclaim_pages
> -> reclaim_folio_list
> -> shrink_folio_list
>
> DAMOS_DEMOTE action
> damo_pa_apply_scheme
Ditto.
> -> damon_pa_reclaim
> -> demote_pages
> -> do_demote_folio_list
> -> __demote_folio_list
> -> demote_folio_list
I think the implementation of 'demote_pages()' might better be separated.
I also find the naming a bit strange, since I usually think of the '__'
prefix as being for functions that are used internally. That is, I'd assume
__demote_folio_list() would be called from demote_folio_list(), but this
function works the opposite way.
>
> __demote_folio_list() is a minimized version of shrink_folio_list(),
> stripped down to handle only demotion.
>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> ---
> include/linux/damon.h | 2 +
> mm/damon/paddr.c | 17 +++++---
> mm/damon/sysfs-schemes.c | 1 +
> mm/internal.h | 1 +
> mm/vmscan.c | 84 ++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 99 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index e00ddf1ed39c..4c0a0fef09c5 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -106,6 +106,7 @@ struct damon_target {
> * @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
> * @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
> * @DAMOS_STAT: Do nothing but count the stat.
> + * @DAMOS_DEMOTE: Do demotion for the current region.
I'd prefer defining DEMOTE before STAT, like we introduced LRU_PRIO/DEPRIO
after STAT but defined them before it. It would help keep the two different
groups of operations separated (STAT is different from the other actions since
it is not for making real changes but only for getting statistics and querying
monitoring results).
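In other words, the resulting ordering could look like the below sketch; only
the relative position of DAMOS_DEMOTE and DAMOS_STAT matters here, and the
earlier entries are elided:

enum damos_action {
	/* ... earlier entries ... */
	DAMOS_LRU_PRIO,
	DAMOS_LRU_DEPRIO,
	DAMOS_DEMOTE,
	DAMOS_STAT,	/* Do nothing but only record the stat */
	NR_DAMOS_ACTIONS,
};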
> * @NR_DAMOS_ACTIONS: Total number of DAMOS actions
> *
> * The support of each action is up to running &struct damon_operations.
> @@ -123,6 +124,7 @@ enum damos_action {
> DAMOS_LRU_PRIO,
> DAMOS_LRU_DEPRIO,
> DAMOS_STAT, /* Do nothing but only record the stat */
> + DAMOS_DEMOTE,
Ditto.
> NR_DAMOS_ACTIONS,
> };
>
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index 081e2a325778..d3e3f077cd00 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -224,7 +224,7 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
> return false;
> }
>
> -static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> +static unsigned long damon_pa_reclaim(struct damon_region *r, struct damos *s, bool is_demote)
I understand that reclamation could include both pageout and demotion, but I'm
not sure if that makes its purpose clearer or more ambiguous. What about
renaming it to '..._demote_or_pageout()', like
'damon_pa_mark_accessed_or_deactivate()'? Also, 'is_demote' could be simply
'demote'.
I think having a separate function, say, damon_pa_demote(), is also ok, if it
makes the code easier to read and doesn't introduce too many duplicated lines
of code.
Also, I'd prefer keeping the 80-column limit[1] by breaking this line.
[1] https://docs.kernel.org/process/coding-style.html?highlight=coding+style#breaking-long-lines-and-strings
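For reference, a rough sketch of the separate-function alternative, derived
only from the quoted code (the name damon_pa_demote() is illustrative; it
simply follows the is_demote == true path of this patch, where unevictable
folios are still listed for demotion):

static unsigned long damon_pa_demote(struct damon_region *r,
		struct damos *s)
{
	unsigned long addr, applied;
	LIST_HEAD(folio_list);

	for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) {
		struct folio *folio = damon_get_folio(PHYS_PFN(addr));

		if (!folio)
			continue;

		if (damos_pa_filter_out(s, folio))
			goto put_folio;

		folio_test_clear_young(folio);
		if (!folio_isolate_lru(folio))
			goto put_folio;

		/* Unlike pageout, unevictable folios are also demotion candidates. */
		list_add(&folio->lru, &folio_list);
put_folio:
		folio_put(folio);
	}
	applied = demote_pages(&folio_list);
	cond_resched();
	return applied * PAGE_SIZE;
}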
> {
> unsigned long addr, applied;
> LIST_HEAD(folio_list);
> @@ -242,14 +242,17 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> folio_test_clear_young(folio);
> if (!folio_isolate_lru(folio))
> goto put_folio;
> - if (folio_test_unevictable(folio))
> + if (folio_test_unevictable(folio) && !is_demote)
> folio_putback_lru(folio);
> else
> list_add(&folio->lru, &folio_list);
> put_folio:
> folio_put(folio);
> }
> - applied = reclaim_pages(&folio_list);
> + if (is_demote)
> + applied = demote_pages(&folio_list);
> + else
> + applied = reclaim_pages(&folio_list);
> cond_resched();
> return applied * PAGE_SIZE;
> }
> @@ -297,13 +300,15 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> {
> switch (scheme->action) {
> case DAMOS_PAGEOUT:
> - return damon_pa_pageout(r, scheme);
> + return damon_pa_reclaim(r, scheme, false);
> case DAMOS_LRU_PRIO:
> return damon_pa_mark_accessed(r, scheme);
> case DAMOS_LRU_DEPRIO:
> return damon_pa_deactivate_pages(r, scheme);
> case DAMOS_STAT:
> break;
> + case DAMOS_DEMOTE:
> + return damon_pa_reclaim(r, scheme, true);
I'd like to keep the order of the branches aligned with that of 'enum
damos_action'.
> default:
> /* DAMOS actions that not yet supported by 'paddr'. */
> break;
> @@ -317,11 +322,11 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
> {
> switch (scheme->action) {
> case DAMOS_PAGEOUT:
> + case DAMOS_LRU_DEPRIO:
> + case DAMOS_DEMOTE:
> return damon_cold_score(context, r, scheme);
> case DAMOS_LRU_PRIO:
> return damon_hot_score(context, r, scheme);
> - case DAMOS_LRU_DEPRIO:
> - return damon_cold_score(context, r, scheme);
I'd slightly prefer having yet another dedicated 'case' branch for
DAMOS_DEMOTE, to keep the order the same as that of 'enum damos_action' and to
keep the old style.
> default:
> break;
> }
> diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> index fe0fe2562000..ac7cd3f17b12 100644
> --- a/mm/damon/sysfs-schemes.c
> +++ b/mm/damon/sysfs-schemes.c
> @@ -1187,6 +1187,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
> "lru_prio",
> "lru_deprio",
> "stat",
> + "demote",
If we define DEMOTE before STAT in enum damos_action, this would also need to
be updated.
> };
>
> static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc(
> diff --git a/mm/internal.h b/mm/internal.h
> index b61034bd50f5..2380397ec2f3 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -869,6 +869,7 @@ extern void set_pageblock_order(void);
> unsigned long reclaim_pages(struct list_head *folio_list);
> unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> struct list_head *folio_list);
> +unsigned long demote_pages(struct list_head *folio_list);
> /* The ALLOC_WMARK bits are used as an index to zone->watermark */
> #define ALLOC_WMARK_MIN WMARK_MIN
> #define ALLOC_WMARK_LOW WMARK_LOW
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7ca2396ccc3b..eaa3dd6b7562 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -998,6 +998,66 @@ static bool may_enter_fs(struct folio *folio, gfp_t gfp_mask)
> return !data_race(folio_swap_flags(folio) & SWP_FS_OPS);
> }
>
> +/*
> + * __demote_folio_list() returns the number of demoted pages
> + */
> +static unsigned int __demote_folio_list(struct list_head *folio_list,
> + struct pglist_data *pgdat, struct scan_control *sc)
> +{
> + LIST_HEAD(ret_folios);
> + LIST_HEAD(demote_folios);
> + unsigned int nr_demoted = 0;
> +
> + if (next_demotion_node(pgdat->node_id) == NUMA_NO_NODE)
> + return 0;
> +
> + cond_resched();
> +
> + while (!list_empty(folio_list)) {
> + struct folio *folio;
> + enum folio_references references;
> +
> + cond_resched();
> +
> + folio = lru_to_folio(folio_list);
> + list_del(&folio->lru);
> +
> + if (!folio_trylock(folio))
> + goto keep;
> +
> + VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
> +
> + references = folio_check_references(folio, sc);
> + if (references == FOLIOREF_KEEP)
> + goto keep_locked;
> +
> + /* Relocate its contents to another node. */
> + list_add(&folio->lru, &demote_folios);
> + folio_unlock(folio);
> + continue;
> +keep_locked:
> + folio_unlock(folio);
> +keep:
> + list_add(&folio->lru, &ret_folios);
> + VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
> + }
> + /* 'folio_list' is always empty here */
> +
> + /* Migrate folios selected for demotion */
> + nr_demoted += demote_folio_list(&demote_folios, pgdat);
As mentioned above, I was assuming demote_folio_list() would call
__demote_folio_list(). I think the name of this function might better be
changed.
> + /* Folios that could not be demoted are still in @demote_folios */
> + if (!list_empty(&demote_folios)) {
> + /* Folios which weren't demoted go back on @folio_list */
> + list_splice_init(&demote_folios, folio_list);
> + }
> +
> + try_to_unmap_flush();
> +
> + list_splice(&ret_folios, folio_list);
> +
> + return nr_demoted;
> +}
> +
> /*
> * shrink_folio_list() returns the number of reclaimed pages
> */
> @@ -2107,6 +2167,25 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
> return nr_reclaimed;
> }
>
> +static unsigned int do_demote_folio_list(struct list_head *folio_list,
> + struct pglist_data *pgdat)
> +{
> + unsigned int nr_demoted;
> + struct folio *folio;
> + struct scan_control sc = {
> + .gfp_mask = GFP_KERNEL,
> + };
> +
> + nr_demoted = __demote_folio_list(folio_list, pgdat, &sc);
> + while (!list_empty(folio_list)) {
> + folio = lru_to_folio(folio_list);
> + list_del(&folio->lru);
> + folio_putback_lru(folio);
> + }
> +
> + return nr_demoted;
> +}
> +
> static unsigned long reclaim_or_migrate_folios(struct list_head *folio_list,
> unsigned int (*handler)(struct list_head *, struct pglist_data *))
> {
> @@ -2146,6 +2225,11 @@ unsigned long reclaim_pages(struct list_head *folio_list)
> return reclaim_or_migrate_folios(folio_list, reclaim_folio_list);
> }
>
> +unsigned long demote_pages(struct list_head *folio_list)
> +{
> + return reclaim_or_migrate_folios(folio_list, do_demote_folio_list);
> +}
> +
If DAMON is the only user of this function and we have no plan for adding more
users of it, I think implementing this in mm/damon/ might make sense.
> static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
> struct lruvec *lruvec, struct scan_control *sc)
> {
> --
> 2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 3/4] mm/memory-tiers: add next_promotion_node to find promotion target
2024-01-15 4:52 [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory Honggyu Kim
2024-01-15 4:52 ` [RFC PATCH 1/4] mm/vmscan: refactor reclaim_pages with reclaim_or_migrate_folios Honggyu Kim
2024-01-15 4:52 ` [RFC PATCH 2/4] mm/damon: introduce DAMOS_DEMOTE action for demotion Honggyu Kim
@ 2024-01-15 4:52 ` Honggyu Kim
2024-01-16 20:32 ` SeongJae Park
2024-01-15 4:52 ` [RFC PATCH 4/4] mm/damon: introduce DAMOS_PROMOTE action for promotion Honggyu Kim
2024-01-16 20:31 ` [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory SeongJae Park
4 siblings, 1 reply; 15+ messages in thread
From: Honggyu Kim @ 2024-01-15 4:52 UTC (permalink / raw)
To: sj, damon, linux-mm
Cc: linux-trace-kernel, linux-kernel, kernel_team, akpm, apopple,
baolin.wang, dave.jiang, linmiaohe, lizhijian, mathieu.desnoyers,
mhiramat, rostedt, surenb, yangx.jy, ying.huang, ziy,
Hyeongtak Ji
From: Hyeongtak Ji <hyeongtak.ji@sk.com>
This patch adds next_promotion_node that can be used to identify the
appropriate promotion target based on memory tiers. When multiple
promotion target nodes are available, the nearest node is selected based
on numa distance.
Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
---
include/linux/memory-tiers.h | 11 +++++++++
mm/memory-tiers.c | 43 ++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 1e39d27bee41..0788e435fc50 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -50,6 +50,7 @@ int mt_set_default_dram_perf(int nid, struct node_hmem_attrs *perf,
int mt_perf_to_adistance(struct node_hmem_attrs *perf, int *adist);
#ifdef CONFIG_MIGRATION
int next_demotion_node(int node);
+int next_promotion_node(int node);
void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
bool node_is_toptier(int node);
#else
@@ -58,6 +59,11 @@ static inline int next_demotion_node(int node)
return NUMA_NO_NODE;
}
+static inline int next_promotion_node(int node)
+{
+ return NUMA_NO_NODE;
+}
+
static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
{
*targets = NODE_MASK_NONE;
@@ -101,6 +107,11 @@ static inline int next_demotion_node(int node)
return NUMA_NO_NODE;
}
+static inline int next_promotion_node(int node)
+{
+ return NUMA_NO_NODE;
+}
+
static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
{
*targets = NODE_MASK_NONE;
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 8d5291add2bc..0060ee571cf4 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -335,6 +335,49 @@ int next_demotion_node(int node)
return target;
}
+/*
+ * Select a promotion target that is close to the from node among the given
+ * two nodes.
+ *
+ * TODO: consider other decision policy as node_distance may not be precise.
+ */
+static int select_promotion_target(int a, int b, int from)
+{
+ if (node_distance(from, a) < node_distance(from, b))
+ return a;
+ else
+ return b;
+}
+
+/**
+ * next_promotion_node() - Get the next node in the promotion path
+ * @node: The starting node to lookup the next node
+ *
+ * Return: node id for next memory node in the promotion path hierarchy
+ * from @node; NUMA_NO_NODE if @node is the toptier.
+ */
+int next_promotion_node(int node)
+{
+ int target = NUMA_NO_NODE;
+ int nid;
+
+ if (node_is_toptier(node))
+ return NUMA_NO_NODE;
+
+ rcu_read_lock();
+ for_each_node_state(nid, N_MEMORY) {
+ if (node_isset(node, node_demotion[nid].preferred)) {
+ if (target == NUMA_NO_NODE)
+ target = nid;
+ else
+ target = select_promotion_target(nid, target, node);
+ }
+ }
+ rcu_read_unlock();
+
+ return target;
+}
+
static void disable_all_demotion_targets(void)
{
struct memory_tier *memtier;
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 3/4] mm/memory-tiers: add next_promotion_node to find promotion target
2024-01-15 4:52 ` [RFC PATCH 3/4] mm/memory-tiers: add next_promotion_node to find promotion target Honggyu Kim
@ 2024-01-16 20:32 ` SeongJae Park
0 siblings, 0 replies; 15+ messages in thread
From: SeongJae Park @ 2024-01-16 20:32 UTC (permalink / raw)
To: Honggyu Kim
Cc: sj, damon, linux-mm, linux-trace-kernel, linux-kernel,
kernel_team, akpm, apopple, baolin.wang, dave.jiang, linmiaohe,
lizhijian, mathieu.desnoyers, mhiramat, rostedt, surenb, yangx.jy,
ying.huang, ziy, Hyeongtak Ji
On Mon, 15 Jan 2024 13:52:51 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> From: Hyeongtak Ji <hyeongtak.ji@sk.com>
>
> This patch adds next_promotion_node that can be used to identify the
> appropriate promotion target based on memory tiers. When multiple
> promotion target nodes are available, the nearest node is selected based
> on numa distance.
>
> Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
> ---
> include/linux/memory-tiers.h | 11 +++++++++
> mm/memory-tiers.c | 43 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+)
>
> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> index 1e39d27bee41..0788e435fc50 100644
> --- a/include/linux/memory-tiers.h
> +++ b/include/linux/memory-tiers.h
> @@ -50,6 +50,7 @@ int mt_set_default_dram_perf(int nid, struct node_hmem_attrs *perf,
> int mt_perf_to_adistance(struct node_hmem_attrs *perf, int *adist);
> #ifdef CONFIG_MIGRATION
> int next_demotion_node(int node);
> +int next_promotion_node(int node);
> void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
> bool node_is_toptier(int node);
> #else
> @@ -58,6 +59,11 @@ static inline int next_demotion_node(int node)
> return NUMA_NO_NODE;
> }
>
> +static inline int next_promotion_node(int node)
> +{
> + return NUMA_NO_NODE;
> +}
> +
> static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
> {
> *targets = NODE_MASK_NONE;
> @@ -101,6 +107,11 @@ static inline int next_demotion_node(int node)
> return NUMA_NO_NODE;
> }
>
> +static inline int next_promotion_node(int node)
> +{
> + return NUMA_NO_NODE;
> +}
> +
> static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
> {
> *targets = NODE_MASK_NONE;
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index 8d5291add2bc..0060ee571cf4 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -335,6 +335,49 @@ int next_demotion_node(int node)
> return target;
> }
>
> +/*
> + * Select a promotion target that is close to the from node among the given
> + * two nodes.
> + *
> + * TODO: consider other decision policy as node_distance may not be precise.
> + */
> +static int select_promotion_target(int a, int b, int from)
> +{
> + if (node_distance(from, a) < node_distance(from, b))
> + return a;
> + else
> + return b;
> +}
> +
> +/**
> + * next_promotion_node() - Get the next node in the promotion path
> + * @node: The starting node to lookup the next node
> + *
> + * Return: node id for next memory node in the promotion path hierarchy
> + * from @node; NUMA_NO_NODE if @node is the toptier.
> + */
> +int next_promotion_node(int node)
> +{
> + int target = NUMA_NO_NODE;
> + int nid;
> +
> + if (node_is_toptier(node))
> + return NUMA_NO_NODE;
> +
> + rcu_read_lock();
> + for_each_node_state(nid, N_MEMORY) {
> + if (node_isset(node, node_demotion[nid].preferred)) {
> + if (target == NUMA_NO_NODE)
> + target = nid;
> + else
> + target = select_promotion_target(nid, target, node);
> + }
> + }
> + rcu_read_unlock();
> +
> + return target;
> +}
> +
If this is going to be used only by DAMON and we don't have a concrete plan for
making it used by others, I think implementing this in mm/damon/ might make
sense.
> static void disable_all_demotion_targets(void)
> {
> struct memory_tier *memtier;
> --
> 2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH 4/4] mm/damon: introduce DAMOS_PROMOTE action for promotion
2024-01-15 4:52 [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory Honggyu Kim
` (2 preceding siblings ...)
2024-01-15 4:52 ` [RFC PATCH 3/4] mm/memory-tiers: add next_promotion_node to find promotion target Honggyu Kim
@ 2024-01-15 4:52 ` Honggyu Kim
2024-01-16 20:32 ` SeongJae Park
2024-01-16 20:31 ` [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory SeongJae Park
4 siblings, 1 reply; 15+ messages in thread
From: Honggyu Kim @ 2024-01-15 4:52 UTC (permalink / raw)
To: sj, damon, linux-mm
Cc: linux-trace-kernel, linux-kernel, kernel_team, akpm, apopple,
baolin.wang, dave.jiang, linmiaohe, lizhijian, mathieu.desnoyers,
mhiramat, rostedt, surenb, yangx.jy, ying.huang, ziy,
Hyeongtak Ji, Honggyu Kim
From: Hyeongtak Ji <hyeongtak.ji@sk.com>
This patch introduces DAMOS_PROMOTE action for paddr mode.
It includes renaming alloc_demote_folio to alloc_migrate_folio to use it
for promotion as well.
The execution sequences of DAMOS_DEMOTE and DAMOS_PROMOTE look as
follows for comparison.
DAMOS_DEMOTE action
damo_pa_apply_scheme
-> damon_pa_reclaim
-> demote_pages
-> do_demote_folio_list
-> __demote_folio_list
-> demote_folio_list
DAMOS_PROMOTE action
damo_pa_apply_scheme
-> damon_pa_promote
-> promote_pages
-> do_promote_folio_list
-> __promote_folio_list
-> promote_folio_list
Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
---
include/linux/damon.h | 2 +
include/linux/migrate_mode.h | 1 +
include/linux/vm_event_item.h | 1 +
include/trace/events/migrate.h | 3 +-
mm/damon/paddr.c | 29 ++++++++
mm/damon/sysfs-schemes.c | 1 +
mm/internal.h | 1 +
mm/vmscan.c | 129 ++++++++++++++++++++++++++++++++-
mm/vmstat.c | 1 +
9 files changed, 165 insertions(+), 3 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 4c0a0fef09c5..477060bb6718 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -107,6 +107,7 @@ struct damon_target {
* @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
* @DAMOS_STAT: Do nothing but count the stat.
* @DAMOS_DEMOTE: Do demotion for the current region.
+ * @DAMOS_PROMOTE: Do promotion if possible, otherwise do nothing.
* @NR_DAMOS_ACTIONS: Total number of DAMOS actions
*
* The support of each action is up to running &struct damon_operations.
@@ -125,6 +126,7 @@ enum damos_action {
DAMOS_LRU_DEPRIO,
DAMOS_STAT, /* Do nothing but only record the stat */
DAMOS_DEMOTE,
+ DAMOS_PROMOTE,
NR_DAMOS_ACTIONS,
};
diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index f37cc03f9369..63f75eb9abf3 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -29,6 +29,7 @@ enum migrate_reason {
MR_CONTIG_RANGE,
MR_LONGTERM_PIN,
MR_DEMOTION,
+ MR_PROMOTION,
MR_TYPES
};
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 8abfa1240040..63cf920afeaa 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -44,6 +44,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
PGDEMOTE_KSWAPD,
PGDEMOTE_DIRECT,
PGDEMOTE_KHUGEPAGED,
+ PGPROMOTE,
PGSCAN_KSWAPD,
PGSCAN_DIRECT,
PGSCAN_KHUGEPAGED,
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 0190ef725b43..f0dd569c1e62 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -22,7 +22,8 @@
EM( MR_NUMA_MISPLACED, "numa_misplaced") \
EM( MR_CONTIG_RANGE, "contig_range") \
EM( MR_LONGTERM_PIN, "longterm_pin") \
- EMe(MR_DEMOTION, "demotion")
+ EM( MR_DEMOTION, "demotion") \
+ EMe(MR_PROMOTION, "promotion")
/*
* First define the enums in the above macros to be exported to userspace
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index d3e3f077cd00..360ce69d5898 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -257,6 +257,32 @@ static unsigned long damon_pa_reclaim(struct damon_region *r, struct damos *s, b
return applied * PAGE_SIZE;
}
+static unsigned long damon_pa_promote(struct damon_region *r, struct damos *s)
+{
+ unsigned long addr, applied;
+ LIST_HEAD(folio_list);
+
+ for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) {
+ struct folio *folio = damon_get_folio(PHYS_PFN(addr));
+
+ if (!folio)
+ continue;
+
+ if (damos_pa_filter_out(s, folio))
+ goto put_folio;
+
+ if (!folio_isolate_lru(folio))
+ goto put_folio;
+
+ list_add(&folio->lru, &folio_list);
+put_folio:
+ folio_put(folio);
+ }
+ applied = promote_pages(&folio_list);
+ cond_resched();
+ return applied * PAGE_SIZE;
+}
+
static inline unsigned long damon_pa_mark_accessed_or_deactivate(
struct damon_region *r, struct damos *s, bool mark_accessed)
{
@@ -309,6 +335,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
break;
case DAMOS_DEMOTE:
return damon_pa_reclaim(r, scheme, true);
+ case DAMOS_PROMOTE:
+ return damon_pa_promote(r, scheme);
default:
/* DAMOS actions that not yet supported by 'paddr'. */
break;
@@ -326,6 +354,7 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
case DAMOS_DEMOTE:
return damon_cold_score(context, r, scheme);
case DAMOS_LRU_PRIO:
+ case DAMOS_PROMOTE:
return damon_hot_score(context, r, scheme);
default:
break;
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index ac7cd3f17b12..1b84d0af7e1f 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1188,6 +1188,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
"lru_deprio",
"stat",
"demote",
+ "promote",
};
static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc(
diff --git a/mm/internal.h b/mm/internal.h
index 2380397ec2f3..f159455e63d4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -870,6 +870,7 @@ unsigned long reclaim_pages(struct list_head *folio_list);
unsigned int reclaim_clean_pages_from_list(struct zone *zone,
struct list_head *folio_list);
unsigned long demote_pages(struct list_head *folio_list);
+unsigned long promote_pages(struct list_head *folio_list);
/* The ALLOC_WMARK bits are used as an index to zone->watermark */
#define ALLOC_WMARK_MIN WMARK_MIN
#define ALLOC_WMARK_LOW WMARK_LOW
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eaa3dd6b7562..f03be320f9ad 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -910,7 +910,7 @@ static void folio_check_dirty_writeback(struct folio *folio,
mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
}
-static struct folio *alloc_demote_folio(struct folio *src,
+static struct folio *alloc_migrate_folio(struct folio *src,
unsigned long private)
{
struct folio *dst;
@@ -973,7 +973,7 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
node_get_allowed_targets(pgdat, &allowed_mask);
/* Demotion ignores all cpuset and mempolicy settings */
- migrate_pages(demote_folios, alloc_demote_folio, NULL,
+ migrate_pages(demote_folios, alloc_migrate_folio, NULL,
(unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
&nr_succeeded);
@@ -982,6 +982,48 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
return nr_succeeded;
}
+/*
+ * Take folios on @promote_folios and attempt to promote them to another node.
+ * Folios which are not promoted are left on @promote_folios.
+ */
+static unsigned int promote_folio_list(struct list_head *promote_folios,
+ struct pglist_data *pgdat)
+{
+ int target_nid = next_promotion_node(pgdat->node_id);
+ unsigned int nr_succeeded;
+ nodemask_t allowed_mask = NODE_MASK_NONE;
+
+ struct migration_target_control mtc = {
+ /*
+ * Allocate from 'node', or fail quickly and quietly.
+ * When this happens, 'page' will likely be stayed
+ * instead of migrated.
+ */
+ .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
+ __GFP_NOMEMALLOC | GFP_NOWAIT,
+ .nid = target_nid,
+ .nmask = &allowed_mask
+ };
+
+ if (pgdat->node_id == target_nid)
+ return 0;
+
+ if (list_empty(promote_folios))
+ return 0;
+
+ if (target_nid == NUMA_NO_NODE)
+ return 0;
+
+ /* Promotion ignores all cpuset and mempolicy settings */
+ migrate_pages(promote_folios, alloc_migrate_folio, NULL,
+ (unsigned long)&mtc, MIGRATE_ASYNC, MR_PROMOTION,
+ &nr_succeeded);
+
+ __count_vm_events(PGPROMOTE, nr_succeeded);
+
+ return nr_succeeded;
+}
+
static bool may_enter_fs(struct folio *folio, gfp_t gfp_mask)
{
if (gfp_mask & __GFP_FS)
@@ -1058,6 +1100,65 @@ static unsigned int __demote_folio_list(struct list_head *folio_list,
return nr_demoted;
}
+/*
+ * __promote_folio_list() returns the number of promoted pages
+ */
+static unsigned int __promote_folio_list(struct list_head *folio_list,
+ struct pglist_data *pgdat, struct scan_control *sc)
+{
+ LIST_HEAD(ret_folios);
+ LIST_HEAD(promote_folios);
+ unsigned int nr_promoted = 0;
+
+ cond_resched();
+
+ while (!list_empty(folio_list)) {
+ struct folio *folio;
+ enum folio_references references;
+
+ cond_resched();
+
+ folio = lru_to_folio(folio_list);
+ list_del(&folio->lru);
+
+ if (!folio_trylock(folio))
+ goto keep;
+
+ VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
+
+ references = folio_check_references(folio, sc);
+ if (references == FOLIOREF_KEEP ||
+ references == FOLIOREF_RECLAIM ||
+ references == FOLIOREF_RECLAIM_CLEAN)
+ goto keep_locked;
+
+ /* Relocate its contents to another node. */
+ list_add(&folio->lru, &promote_folios);
+ folio_unlock(folio);
+ continue;
+keep_locked:
+ folio_unlock(folio);
+keep:
+ list_add(&folio->lru, &ret_folios);
+ VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+ }
+ /* 'folio_list' is always empty here */
+
+ /* Migrate folios selected for promotion */
+ nr_promoted += promote_folio_list(&promote_folios, pgdat);
+ /* Folios that could not be promoted are still in @promote_folios */
+ if (!list_empty(&promote_folios)) {
+ /* Folios which weren't promoted go back on @folio_list */
+ list_splice_init(&promote_folios, folio_list);
+ }
+
+ try_to_unmap_flush();
+
+ list_splice(&ret_folios, folio_list);
+
+ return nr_promoted;
+}
+
/*
* shrink_folio_list() returns the number of reclaimed pages
*/
@@ -2186,6 +2287,25 @@ static unsigned int do_demote_folio_list(struct list_head *folio_list,
return nr_demoted;
}
+static unsigned int do_promote_folio_list(struct list_head *folio_list,
+ struct pglist_data *pgdat)
+{
+ unsigned int nr_promoted;
+ struct folio *folio;
+ struct scan_control sc = {
+ .gfp_mask = GFP_KERNEL,
+ };
+
+ nr_promoted = __promote_folio_list(folio_list, pgdat, &sc);
+ while (!list_empty(folio_list)) {
+ folio = lru_to_folio(folio_list);
+ list_del(&folio->lru);
+ folio_putback_lru(folio);
+ }
+
+ return nr_promoted;
+}
+
static unsigned long reclaim_or_migrate_folios(struct list_head *folio_list,
unsigned int (*handler)(struct list_head *, struct pglist_data *))
{
@@ -2230,6 +2350,11 @@ unsigned long demote_pages(struct list_head *folio_list)
return reclaim_or_migrate_folios(folio_list, do_demote_folio_list);
}
+unsigned long promote_pages(struct list_head *folio_list)
+{
+ return reclaim_or_migrate_folios(folio_list, do_promote_folio_list);
+}
+
static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
struct lruvec *lruvec, struct scan_control *sc)
{
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 359460deb377..c703abdb8137 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1282,6 +1282,7 @@ const char * const vmstat_text[] = {
"pgdemote_kswapd",
"pgdemote_direct",
"pgdemote_khugepaged",
+ "pgpromote",
"pgscan_kswapd",
"pgscan_direct",
"pgscan_khugepaged",
--
2.34.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 4/4] mm/damon: introduce DAMOS_PROMOTE action for promotion
2024-01-15 4:52 ` [RFC PATCH 4/4] mm/damon: introduce DAMOS_PROMOTE action for promotion Honggyu Kim
@ 2024-01-16 20:32 ` SeongJae Park
0 siblings, 0 replies; 15+ messages in thread
From: SeongJae Park @ 2024-01-16 20:32 UTC (permalink / raw)
To: Honggyu Kim
Cc: sj, damon, linux-mm, linux-trace-kernel, linux-kernel,
kernel_team, akpm, apopple, baolin.wang, dave.jiang, linmiaohe,
lizhijian, mathieu.desnoyers, mhiramat, rostedt, surenb, yangx.jy,
ying.huang, ziy, Hyeongtak Ji
On Mon, 15 Jan 2024 13:52:52 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> From: Hyeongtak Ji <hyeongtak.ji@sk.com>
>
> This patch introduces DAMOS_PROMOTE action for paddr mode.
>
> It includes renaming alloc_demote_folio to alloc_migrate_folio to use it
> for promotion as well.
>
> The execution sequences of DAMOS_DEMOTE and DAMOS_PROMOTE look as
> follows for comparison.
>
> DAMOS_DEMOTE action
> damo_pa_apply_scheme
> -> damon_pa_reclaim
> -> demote_pages
> -> do_demote_folio_list
> -> __demote_folio_list
> -> demote_folio_list
>
> DAMOS_PROMOTE action
> damo_pa_apply_scheme
> -> damon_pa_promote
> -> promote_pages
> -> do_promote_folio_list
> -> __promote_folio_list
> -> promote_folio_list
>
> Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> ---
> include/linux/damon.h | 2 +
> include/linux/migrate_mode.h | 1 +
> include/linux/vm_event_item.h | 1 +
> include/trace/events/migrate.h | 3 +-
> mm/damon/paddr.c | 29 ++++++++
> mm/damon/sysfs-schemes.c | 1 +
> mm/internal.h | 1 +
> mm/vmscan.c | 129 ++++++++++++++++++++++++++++++++-
> mm/vmstat.c | 1 +
> 9 files changed, 165 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 4c0a0fef09c5..477060bb6718 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -107,6 +107,7 @@ struct damon_target {
> * @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
> * @DAMOS_STAT: Do nothing but count the stat.
> * @DAMOS_DEMOTE: Do demotion for the current region.
> + * @DAMOS_PROMOTE: Do promotion if possible, otherwise do nothing.
Like LRU_PRIO is defined before LRU_DEPRIO, what about defining PROMOTE before
DEMOTE?
> * @NR_DAMOS_ACTIONS: Total number of DAMOS actions
> *
> * The support of each action is up to running &struct damon_operations.
> @@ -125,6 +126,7 @@ enum damos_action {
> DAMOS_LRU_DEPRIO,
> DAMOS_STAT, /* Do nothing but only record the stat */
> DAMOS_DEMOTE,
> + DAMOS_PROMOTE,
> NR_DAMOS_ACTIONS,
> };
>
> diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
> index f37cc03f9369..63f75eb9abf3 100644
> --- a/include/linux/migrate_mode.h
> +++ b/include/linux/migrate_mode.h
> @@ -29,6 +29,7 @@ enum migrate_reason {
> MR_CONTIG_RANGE,
> MR_LONGTERM_PIN,
> MR_DEMOTION,
> + MR_PROMOTION,
> MR_TYPES
> };
>
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 8abfa1240040..63cf920afeaa 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -44,6 +44,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> PGDEMOTE_KSWAPD,
> PGDEMOTE_DIRECT,
> PGDEMOTE_KHUGEPAGED,
> + PGPROMOTE,
> PGSCAN_KSWAPD,
> PGSCAN_DIRECT,
> PGSCAN_KHUGEPAGED,
> diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
> index 0190ef725b43..f0dd569c1e62 100644
> --- a/include/trace/events/migrate.h
> +++ b/include/trace/events/migrate.h
> @@ -22,7 +22,8 @@
> EM( MR_NUMA_MISPLACED, "numa_misplaced") \
> EM( MR_CONTIG_RANGE, "contig_range") \
> EM( MR_LONGTERM_PIN, "longterm_pin") \
> - EMe(MR_DEMOTION, "demotion")
> + EM( MR_DEMOTION, "demotion") \
> + EMe(MR_PROMOTION, "promotion")
>
> /*
> * First define the enums in the above macros to be exported to userspace
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index d3e3f077cd00..360ce69d5898 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -257,6 +257,32 @@ static unsigned long damon_pa_reclaim(struct damon_region *r, struct damos *s, b
> return applied * PAGE_SIZE;
> }
>
> +static unsigned long damon_pa_promote(struct damon_region *r, struct damos *s)
> +{
> + unsigned long addr, applied;
> + LIST_HEAD(folio_list);
> +
> + for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) {
> + struct folio *folio = damon_get_folio(PHYS_PFN(addr));
> +
> + if (!folio)
> + continue;
> +
> + if (damos_pa_filter_out(s, folio))
> + goto put_folio;
> +
> + if (!folio_isolate_lru(folio))
> + goto put_folio;
> +
> + list_add(&folio->lru, &folio_list);
> +put_folio:
> + folio_put(folio);
> + }
> + applied = promote_pages(&folio_list);
> + cond_resched();
> + return applied * PAGE_SIZE;
> +}
> +
> static inline unsigned long damon_pa_mark_accessed_or_deactivate(
> struct damon_region *r, struct damos *s, bool mark_accessed)
> {
> @@ -309,6 +335,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> break;
> case DAMOS_DEMOTE:
> return damon_pa_reclaim(r, scheme, true);
> + case DAMOS_PROMOTE:
> + return damon_pa_promote(r, scheme);
> default:
> /* DAMOS actions that not yet supported by 'paddr'. */
> break;
> @@ -326,6 +354,7 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
> case DAMOS_DEMOTE:
> return damon_cold_score(context, r, scheme);
> case DAMOS_LRU_PRIO:
> + case DAMOS_PROMOTE:
> return damon_hot_score(context, r, scheme);
As mentioned on the previous patch, I'd prefer keeping the order of operations
and having dedicated branches for each operation.
> default:
> break;
> diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> index ac7cd3f17b12..1b84d0af7e1f 100644
> --- a/mm/damon/sysfs-schemes.c
> +++ b/mm/damon/sysfs-schemes.c
> @@ -1188,6 +1188,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
> "lru_deprio",
> "stat",
> "demote",
> + "promote",
> };
>
> static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc(
> diff --git a/mm/internal.h b/mm/internal.h
> index 2380397ec2f3..f159455e63d4 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -870,6 +870,7 @@ unsigned long reclaim_pages(struct list_head *folio_list);
> unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> struct list_head *folio_list);
> unsigned long demote_pages(struct list_head *folio_list);
> +unsigned long promote_pages(struct list_head *folio_list);
> /* The ALLOC_WMARK bits are used as an index to zone->watermark */
> #define ALLOC_WMARK_MIN WMARK_MIN
> #define ALLOC_WMARK_LOW WMARK_LOW
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index eaa3dd6b7562..f03be320f9ad 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -910,7 +910,7 @@ static void folio_check_dirty_writeback(struct folio *folio,
> mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
> }
>
> -static struct folio *alloc_demote_folio(struct folio *src,
> +static struct folio *alloc_migrate_folio(struct folio *src,
> unsigned long private)
As also mentioned on the previous patch, I'm unsure if vmscan.c is the right
place for general migration.
> {
> struct folio *dst;
> @@ -973,7 +973,7 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
> node_get_allowed_targets(pgdat, &allowed_mask);
>
> /* Demotion ignores all cpuset and mempolicy settings */
> - migrate_pages(demote_folios, alloc_demote_folio, NULL,
> + migrate_pages(demote_folios, alloc_migrate_folio, NULL,
> (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
> &nr_succeeded);
>
> @@ -982,6 +982,48 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
> return nr_succeeded;
> }
>
> +/*
> + * Take folios on @promote_folios and attempt to promote them to another node.
> + * Folios which are not promoted are left on @promote_folios.
> + */
> +static unsigned int promote_folio_list(struct list_head *promote_folios,
> + struct pglist_data *pgdat)
> +{
> + int target_nid = next_promotion_node(pgdat->node_id);
> + unsigned int nr_succeeded;
> + nodemask_t allowed_mask = NODE_MASK_NONE;
> +
> + struct migration_target_control mtc = {
> + /*
> + * Allocate from 'node', or fail quickly and quietly.
> + * When this happens, 'page' will likely be stayed
> + * instead of migrated.
> + */
> + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> + __GFP_NOMEMALLOC | GFP_NOWAIT,
> + .nid = target_nid,
> + .nmask = &allowed_mask
> + };
> +
> + if (pgdat->node_id == target_nid)
> + return 0;
> +
> + if (list_empty(promote_folios))
> + return 0;
> +
> + if (target_nid == NUMA_NO_NODE)
> + return 0;
> +
> + /* Promotion ignores all cpuset and mempolicy settings */
> + migrate_pages(promote_folios, alloc_migrate_folio, NULL,
> + (unsigned long)&mtc, MIGRATE_ASYNC, MR_PROMOTION,
> + &nr_succeeded);
> +
> + __count_vm_events(PGPROMOTE, nr_succeeded);
> +
> + return nr_succeeded;
> +}
> +
> static bool may_enter_fs(struct folio *folio, gfp_t gfp_mask)
> {
> if (gfp_mask & __GFP_FS)
> @@ -1058,6 +1100,65 @@ static unsigned int __demote_folio_list(struct list_head *folio_list,
> return nr_demoted;
> }
>
> +/*
> + * __promote_folio_list() returns the number of promoted pages
> + */
> +static unsigned int __promote_folio_list(struct list_head *folio_list,
> + struct pglist_data *pgdat, struct scan_control *sc)
> +{
> + LIST_HEAD(ret_folios);
> + LIST_HEAD(promote_folios);
> + unsigned int nr_promoted = 0;
> +
> + cond_resched();
> +
> + while (!list_empty(folio_list)) {
> + struct folio *folio;
> + enum folio_references references;
> +
> + cond_resched();
> +
> + folio = lru_to_folio(folio_list);
> + list_del(&folio->lru);
> +
> + if (!folio_trylock(folio))
> + goto keep;
> +
> + VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
> +
> + references = folio_check_references(folio, sc);
> + if (references == FOLIOREF_KEEP ||
> + references == FOLIOREF_RECLAIM ||
> + references == FOLIOREF_RECLAIM_CLEAN)
> + goto keep_locked;
> +
> + /* Relocate its contents to another node. */
> + list_add(&folio->lru, &promote_folios);
> + folio_unlock(folio);
> + continue;
> +keep_locked:
> + folio_unlock(folio);
> +keep:
> + list_add(&folio->lru, &ret_folios);
> + VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
> + }
> + /* 'folio_list' is always empty here */
> +
> + /* Migrate folios selected for promotion */
> + nr_promoted += promote_folio_list(&promote_folios, pgdat);
Again, as mentioned on the previous patch, I was thinking promote_folio_list()
may call __promote_folio_list(). Making the __ prefix usage consistent with
other functions might be better, in my opinion.
> + /* Folios that could not be promoted are still in @promote_folios */
> + if (!list_empty(&promote_folios)) {
> + /* Folios which weren't promoted go back on @folio_list */
> + list_splice_init(&promote_folios, folio_list);
> + }
> +
> + try_to_unmap_flush();
> +
> + list_splice(&ret_folios, folio_list);
> +
> + return nr_promoted;
> +}
> +
> /*
> * shrink_folio_list() returns the number of reclaimed pages
> */
> @@ -2186,6 +2287,25 @@ static unsigned int do_demote_folio_list(struct list_head *folio_list,
> return nr_demoted;
> }
>
> +static unsigned int do_promote_folio_list(struct list_head *folio_list,
> + struct pglist_data *pgdat)
> +{
> + unsigned int nr_promoted;
> + struct folio *folio;
> + struct scan_control sc = {
> + .gfp_mask = GFP_KERNEL,
> + };
> +
> + nr_promoted = __promote_folio_list(folio_list, pgdat, &sc);
> + while (!list_empty(folio_list)) {
> + folio = lru_to_folio(folio_list);
> + list_del(&folio->lru);
> + folio_putback_lru(folio);
> + }
> +
> + return nr_promoted;
> +}
> +
> static unsigned long reclaim_or_migrate_folios(struct list_head *folio_list,
> unsigned int (*handler)(struct list_head *, struct pglist_data *))
> {
> @@ -2230,6 +2350,11 @@ unsigned long demote_pages(struct list_head *folio_list)
> return reclaim_or_migrate_folios(folio_list, do_demote_folio_list);
> }
>
> +unsigned long promote_pages(struct list_head *folio_list)
> +{
> + return reclaim_or_migrate_folios(folio_list, do_promote_folio_list);
> +}
> +
> static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
> struct lruvec *lruvec, struct scan_control *sc)
> {
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 359460deb377..c703abdb8137 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1282,6 +1282,7 @@ const char * const vmstat_text[] = {
> "pgdemote_kswapd",
> "pgdemote_direct",
> "pgdemote_khugepaged",
> + "pgpromote",
> "pgscan_kswapd",
> "pgscan_direct",
> "pgscan_khugepaged",
> --
> 2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
2024-01-15 4:52 [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory Honggyu Kim
` (3 preceding siblings ...)
2024-01-15 4:52 ` [RFC PATCH 4/4] mm/damon: introduce DAMOS_PROMOTE action for promotion Honggyu Kim
@ 2024-01-16 20:31 ` SeongJae Park
2024-01-17 11:49 ` Honggyu Kim
4 siblings, 1 reply; 15+ messages in thread
From: SeongJae Park @ 2024-01-16 20:31 UTC (permalink / raw)
To: Honggyu Kim
Cc: sj, damon, linux-mm, linux-trace-kernel, linux-kernel,
kernel_team, akpm, apopple, baolin.wang, dave.jiang, linmiaohe,
lizhijian, mathieu.desnoyers, mhiramat, rostedt, surenb, yangx.jy,
ying.huang, ziy, Hyeongtak Ji, Rakie Kim
Hello,
On Mon, 15 Jan 2024 13:52:48 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> posted at [1].
>
> It says no implementation of the demote/promote DAMOS actions has been
> made yet. This RFC is about their implementation for the physical
> address space.
>
[...]
> Evaluation Results
> ==================
>
[...]
> In summary of both results, our evaluation shows that "DAMON 2-tier"
> memory management reduces the performance slowdown compared to the
> "default" memory policy from 15~17% to 4~5% when the system runs with
> high memory pressure on its fast tier DRAM nodes.
>
> A similar evaluation was done on another machine that has 256GB of
> local DRAM and 96GB of CXL memory. The performance slowdown is reduced
> from 20~24% for "default" to 5~7% for "DAMON 2-tier".
>
> Having these DAMOS_DEMOTE and DAMOS_PROMOTE actions can make 2-tier
> memory systems run more efficiently under high memory pressures.
Thank you so much for these great patches and the above nice test results. I
believe the test setup and results make sense, and merging a revised version of
this patchset would provide real benefits to the users.
At a high level, I think it might be better to separate DAMON-internal changes
from DAMON-external changes.
For DAMON part changes, I have no big concern other than trivial coding style
level comments.
For the DAMON-external changes that implement demote_pages() and
promote_pages(), I'm unsure if the implementation is reusing appropriate
functions, and if those are placed in the right source file. Especially, I'm
unsure if vmscan.c is the right place for the promotion code. Also I don't
know if there is a good agreement on the promotion/demotion target node
decision. That should be because I'm not that familiar with the areas and the
files, but I feel this might be because our discussions on the promotion and
demotion operations still have room to become more mature. Because I'm not
very familiar with that part, I'd like to hear others' comments, too.
To this end, I feel the problem could be made simpler, because this patchset
is trying to provide two sophisticated operations, while I think a simpler
approach might be possible. My humble simpler idea is adding a DAMOS
operation for moving pages to a given node (like the sys_move_phy_pages
RFC[1]), instead of the promote/demote, because the general page migration
can handle multiple cases including the promote/demote, in my humble
assumption. In more detail, users could decide which is the appropriate node
for promotion or demotion and use the new DAMOS action to do promotion and
demotion. Users would be requested to decide which node is the proper
promotion/demotion target, but that decision wouldn't be that hard in my
opinion.
For this, 'struct damos' would need to be updated for such argument-dependent
actions, like 'struct damos_filter' is having a union.
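As a rough sketch of what I mean (illustration only, not a concrete
proposal; the DAMOS_MIGRATE name and the new member are made up, and
existing members are omitted):
enum damos_action {
        /* ... existing actions (DAMOS_WILLNEED, DAMOS_PAGEOUT, ...) ... */
        DAMOS_MIGRATE,  /* hypothetical: migrate pages of the region to a node */
};
struct damos {
        /* ... existing members (access pattern, quota, filters, ...) ... */
        enum damos_action action;
        union {
                int target_nid; /* e.g., migration target for DAMOS_MIGRATE */
        };
};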
In the future, we could extend the operation to the promotion and the demotion
after the discussion around promotion and demotion has matured, if required.
And assuming DAMON is extended for originating CPU-aware access monitoring, the
new DAMOS action would also cover more use cases such as general NUMA node
balancing (extending DAMON for CPU-aware monitoring would be required), and
some complex configurations that have both CPU affinity and tiered memory
requirements. I also think that may well fit with my RFC idea[2] for tiered
memory management.
Looking forward to opinions from you and others. I admit I may miss many
things, and am more than happy to be enlightened.
[1] https://lwn.net/Articles/944007/
[2] https://lore.kernel.org/damon/20231112195602.61525-1-sj@kernel.org/
Thanks,
SJ
>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
> Signed-off-by: Rakie Kim <rakie.kim@sk.com>
>
> [1] https://lore.kernel.org/damon/20231112195602.61525-1-sj@kernel.org
> [2] https://github.com/skhynix/hmsdk
> [3] https://github.com/redis/redis/tree/7.0.0
> [4] https://github.com/brianfrankcooper/YCSB/tree/0.17.0
> [5] https://dl.acm.org/doi/10.1145/3503222.3507731
> [6] https://dl.acm.org/doi/10.1145/3582016.3582063
>
> Honggyu Kim (2):
> mm/vmscan: refactor reclaim_pages with reclaim_or_migrate_folios
> mm/damon: introduce DAMOS_DEMOTE action for demotion
>
> Hyeongtak Ji (2):
> mm/memory-tiers: add next_promotion_node to find promotion target
> mm/damon: introduce DAMOS_PROMOTE action for promotion
>
> include/linux/damon.h | 4 +
> include/linux/memory-tiers.h | 11 ++
> include/linux/migrate_mode.h | 1 +
> include/linux/vm_event_item.h | 1 +
> include/trace/events/migrate.h | 3 +-
> mm/damon/paddr.c | 46 ++++++-
> mm/damon/sysfs-schemes.c | 2 +
> mm/internal.h | 2 +
> mm/memory-tiers.c | 43 ++++++
> mm/vmscan.c | 231 +++++++++++++++++++++++++++++++--
> mm/vmstat.c | 1 +
> 11 files changed, 330 insertions(+), 15 deletions(-)
>
>
> base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
> --
> 2.34.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
2024-01-16 20:31 ` [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory SeongJae Park
@ 2024-01-17 11:49 ` Honggyu Kim
2024-01-17 21:11 ` SeongJae Park
0 siblings, 1 reply; 15+ messages in thread
From: Honggyu Kim @ 2024-01-17 11:49 UTC (permalink / raw)
To: sj
Cc: akpm, apopple, baolin.wang, damon, dave.jiang, honggyu.kim,
hyeongtak.ji, kernel_team, linmiaohe, linux-kernel, linux-mm,
linux-trace-kernel, lizhijian, mathieu.desnoyers, mhiramat,
rakie.kim, rostedt, surenb, yangx.jy, ying.huang, ziy
Hi SeongJae,
Thanks very much for your detailed comments.
On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park <sj@kernel.org> wrote:
> Thank you so much for these great patches and the above nice test results. I
> believe the test setup and results make sense, and merging a revised version of
> this patchset would provide real benefits to the users.
Glad to hear that!
> At a high level, I think it might be better to separate DAMON-internal changes
> from DAMON-external changes.
I agree. I can't guarantee that I can move all the external changes
inside mm/damon, but I will try that as much as possible.
> For DAMON part changes, I have no big concern other than trivial coding style
> level comments.
Sure. I will fix those.
> For the DAMON-external changes that implement demote_pages() and
> promote_pages(), I'm unsure if the implementation is reusing appropriate
> functions, and if those are placed in the right source file. Especially, I'm
> unsure if vmscan.c is the right place for the promotion code. Also I don't
> know if there is a good agreement on the promotion/demotion target node
> decision. That should be because I'm not that familiar with the areas and the
> files, but I feel this might be because our discussions on the promotion and
> demotion operations still have room to become more mature. Because I'm not
> very familiar with that part, I'd like to hear others' comments, too.
I would also like to hear others' comments, but this might not be needed
if most of the external code can be moved into mm/damon.
> To this end, I feel the problem could be made simpler, because this patchset
> is trying to provide two sophisticated operations, while I think a simpler
> approach might be possible. My humble simpler idea is adding a DAMOS
> operation for moving pages to a given node (like the sys_move_phy_pages
> RFC[1]), instead of the promote/demote, because the general page migration
> can handle multiple cases including the promote/demote, in my humble
> assumption.
My initial implementation was similar but I found that it's not accurate
enough due to the inherent inaccuracy of DAMON regions. I saw that many
pages were demoted and promoted back and forth because migration target
regions include both hot and cold pages together.
So I have implemented the demotion and promotion logics based on
shrink_folio_list(), which contains many corner case handling logics for
reclaim.
Having the current demotion and promotion logics makes the hot/cold
migration pretty accurate as expected. We made a simple program called
"hot_cold" that receives 2 arguments for hot size and cold size in MB.
For example, "hot_cold 200 500" allocates 200MB of hot memory and 500MB
of cold memory. It basically allocates 2 large blocks of memory with
mmap, then repeats memset on the initial 200MB in an infinite loop so
that it keeps being accessed.
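For reference, a minimal sketch of what hot_cold roughly does is below
(illustration only, not the exact program; error reporting is omitted, and
the initial node placement is set from outside, e.g. with something like
numactl --preferred):
/* usage: hot_cold <hot_mb> <cold_mb> */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
        size_t hot, cold;
        char *hot_buf, *cold_buf;
        if (argc != 3)
                return 1;
        hot = (size_t)atoi(argv[1]) << 20;
        cold = (size_t)atoi(argv[2]) << 20;
        hot_buf = mmap(NULL, hot, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        cold_buf = mmap(NULL, cold, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (hot_buf == MAP_FAILED || cold_buf == MAP_FAILED)
                return 1;
        /* touch the cold block once so that it is actually allocated */
        memset(cold_buf, 0, cold);
        /* keep the hot block accessed forever */
        for (;;)
                memset(hot_buf, 1, hot);
        return 0;
}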
Let's say there are 3 nodes in the system, where node0 and node1 are the
first tier, and node2 is the second tier.
$ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
0-1
$ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
2
Here is the result of partitioning hot/cold memory; I put the execution
command at the right side of the numastat result. I initially ran each
hot_cold program with a "preferred" policy setting so that they initially
allocate memory on one of node0 or node2, but the pages gradually migrated
based on their access frequencies.
$ numastat -c -p hot_cold
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
--------------- ------ ------ ------ -----
754 (hot_cold) 1800 0 2000 3800 <- hot_cold 1800 2000
1184 (hot_cold) 300 0 500 800 <- hot_cold 300 500
1818 (hot_cold) 801 0 3199 4000 <- hot_cold 800 3200
30289 (hot_cold) 4 0 5 10 <- hot_cold 3 5
30325 (hot_cold) 31 0 51 81 <- hot_cold 30 50
--------------- ------ ------ ------ -----
Total 2938 0 5756 8695
The final node placement result shows that DAMON accurately migrated
pages by their hotness for multiple processes.
> In more detail, users could decide which is the appropriate node for
> promotion or demotion and use the new DAMOS action to do promotion and
> demotion. Users would be requested to decide which node is the proper
> promotion/demotion target, but that decision wouldn't be that hard in my
> opinion.
>
> For this, 'struct damos' would need to be updated for such argument-dependent
> actions, like 'struct damos_filter' is having a union.
That might be a better solution. I will think about it.
> In the future, we could extend the operation to the promotion and the demotion
> after the discussion around promotion and demotion has matured, if required.
> And assuming DAMON is extended for originating CPU-aware access monitoring, the
> new DAMOS action would also cover more use cases such as general NUMA node
> balancing (extending DAMON for CPU-aware monitoring would be required), and
> some complex configurations that have both CPU affinity and tiered memory
> requirements. I also think that may well fit with my RFC idea[2] for tiered
> memory management.
>
> Looking forward to opinions from you and others. I admit I may miss many
> things, and am more than happy to be enlightened.
>
> [1] https://lwn.net/Articles/944007/
> [2] https://lore.kernel.org/damon/20231112195602.61525-1-sj@kernel.org/
Thanks very much for your comments. I will need a few more days for the
update but will try to address your concerns as much as possible.
Thanks,
Honggyu
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
2024-01-17 11:49 ` Honggyu Kim
@ 2024-01-17 21:11 ` SeongJae Park
2024-01-17 21:24 ` SeongJae Park
2024-01-18 10:40 ` Hyeongtak Ji
0 siblings, 2 replies; 15+ messages in thread
From: SeongJae Park @ 2024-01-17 21:11 UTC (permalink / raw)
To: Honggyu Kim
Cc: sj, akpm, apopple, baolin.wang, damon, dave.jiang, hyeongtak.ji,
kernel_team, linmiaohe, linux-kernel, linux-mm,
linux-trace-kernel, lizhijian, mathieu.desnoyers, mhiramat,
rakie.kim, rostedt, surenb, yangx.jy, ying.huang, ziy
Hi Honggyu,
On Wed, 17 Jan 2024 20:49:25 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> Hi SeongJae,
>
> Thanks very much for your detailed comments.
>
> On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park <sj@kernel.org> wrote:
>
> > Thank you so much for these great patches and the above nice test results. I
> > believe the test setup and results make sense, and merging a revised version of
> > this patchset would provide real benefits to the users.
>
> Glad to hear that!
>
> > At a high level, I think it might be better to separate DAMON-internal changes
> > from DAMON-external changes.
>
> I agree. I can't guarantee that I can move all the external changes
> inside mm/damon, but I will try that as much as possible.
>
> > For DAMON part changes, I have no big concern other than trivial coding style
> > level comments.
>
> Sure. I will fix those.
>
> > For the DAMON-external changes that implement demote_pages() and
> > promote_pages(), I'm unsure if the implementation is reusing appropriate
> > functions, and if those are placed in the right source file. Especially, I'm
> > unsure if vmscan.c is the right place for the promotion code. Also I don't
> > know if there is a good agreement on the promotion/demotion target node
> > decision. That should be because I'm not that familiar with the areas and the
> > files, but I feel this might be because our discussions on the promotion and
> > demotion operations still have room to become more mature. Because I'm not
> > very familiar with that part, I'd like to hear others' comments, too.
>
> I would also like to hear others' comments, but this might not be needed
> if most of the external code can be moved into mm/damon.
>
> > To this end, I feel the problem could be made simpler, because this patchset
> > is trying to provide two sophisticated operations, while I think a simpler
> > approach might be possible. My humble simpler idea is adding a DAMOS
> > operation for moving pages to a given node (like the sys_move_phy_pages
> > RFC[1]), instead of the promote/demote, because the general page migration
> > can handle multiple cases including the promote/demote, in my humble
> > assumption.
>
> My initial implementation was similar but I found that it's not accurate
> enough due to the inherent inaccuracy of DAMON regions. I saw that many
> pages were demoted and promoted back and forth because migration target
> regions include both hot and cold pages together.
>
> So I have implemented the demotion and promotion logics based on
> shrink_folio_list(), which contains many corner case handling logics for
> reclaim.
>
> Having the current demotion and promotion logics makes the hot/cold
> migration pretty accurate as expected. We made a simple program called
> "hot_cold" that receives 2 arguments for hot size and cold size in MB.
> For example, "hot_cold 200 500" allocates 200MB of hot memory and 500MB
> of cold memory. It basically allocates 2 large blocks of memory with
> mmap, then repeats memset on the initial 200MB in an infinite loop so
> that it keeps being accessed.
>
> Let's say there are 3 nodes in the system, where node0 and node1 are the
> first tier, and node2 is the second tier.
>
> $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
> 0-1
>
> $ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
> 2
>
> Here is the result of partitioning hot/cold memory; I put the execution
> command at the right side of the numastat result. I initially ran each
> hot_cold program with a "preferred" policy setting so that they initially
> allocate memory on one of node0 or node2, but the pages gradually migrated
> based on their access frequencies.
>
> $ numastat -c -p hot_cold
> Per-node process memory usage (in MBs)
> PID Node 0 Node 1 Node 2 Total
> --------------- ------ ------ ------ -----
> 754 (hot_cold) 1800 0 2000 3800 <- hot_cold 1800 2000
> 1184 (hot_cold) 300 0 500 800 <- hot_cold 300 500
> 1818 (hot_cold) 801 0 3199 4000 <- hot_cold 800 3200
> 30289 (hot_cold) 4 0 5 10 <- hot_cold 3 5
> 30325 (hot_cold) 31 0 51 81 <- hot_cold 30 50
> --------------- ------ ------ ------ -----
> Total 2938 0 5756 8695
>
> The final node placement result shows that DAMON accurately migrated
> pages by their hotness for multiple processes.
What was the result when the corner cases handling logics were not applied?
And, what are the corner cases handling logics that seemed essential? I
guess the page granularity active/reference check could indeed provide many
improvements, but that's only my humble assumption.
If the corner cases are indeed better to be handled in page granularity, I
agree we need some more effort since DAMON monitoring results are not page
granularity aware by design. Users could increase min_nr_regions to make
it more accurate, and we have a plan to support page granularity monitoring,
though. But maybe the overhead could be unacceptable.
The ideal solution would be making DAMON more accurate while keeping the
current level of overhead. We indeed have TODO items for DAMON accuracy
improvement, but this may take some time that might be unacceptable for your
case.
If that's the case, I think the additional corner handling (or, the page
granularity additional access check) could be made as DAMOS filters[1], since
DAMOS filters can be applied in page granularity, and are designed for this
kind of handling of information that DAMON monitoring results cannot provide.
More specifically, we could have filters for promotion-qualifying pages and
demotion-qualifying pages. In this way, I think we can keep the action more
flexible while the filters can be applied in creative ways.
[1] https://git.kernel.org/sj/c/98def236f63c66629fb6b2d4b69cecffc5b46539
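Just to illustrate the shape I have in mind (completely made up, not an
existing or agreed interface), the page granularity access re-check could be
expressed as something like:
enum damos_filter_type {
        /* ... existing types (anon, memcg, addr, target, ...) ... */
        DAMOS_FILTER_TYPE_ACCESSED,     /* hypothetical */
};
/* hypothetical per-folio check that such a filter could use */
static bool damos_folio_accessed(struct folio *folio)
{
        return folio_test_active(folio) || folio_test_referenced(folio);
}
Then a promotion scheme could install the filter so that only folios passing
the page-level check are actually migrated, and a demotion scheme could
require the opposite, without making the actions themselves aware of it.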
>
> > In more detail, users could decide which is the appropriate node for
> > promotion or demotion and use the new DAMOS action to do promotion and
> > demotion. Users would be requested to decide which node is the proper
> > promotion/demotion target, but that decision wouldn't be that hard in my
> > opinion.
> >
> > For this, 'struct damos' would need to be updated for such argument-dependent
> > actions, like 'struct damos_filter' is having a union.
>
> That might be a better solution. I will think about it.
More specifically, I think receiving an address range as the argument might be
more flexible than just a NUMA node. Maybe we can imagine proactively
migrating cold movable pages from normal zones to movable zones, to avoid
normal zone memory pressure.
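Purely as an illustration (member names are made up), the action argument
could then be a union of both:
struct damos {
        /* ... */
        union {
                int target_nid;                 /* migrate to this node */
                struct damon_addr_range dest;   /* or into this address range */
        };
};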
>
> > In the future, we could extend the operation to the promotion and the demotion
> > after the discussion around promotion and demotion has matured, if required.
> > And assuming DAMON is extended for originating CPU-aware access monitoring, the
> > new DAMOS action would also cover more use cases such as general NUMA node
> > balancing (extending DAMON for CPU-aware monitoring would be required), and
> > some complex configurations that have both CPU affinity and tiered memory
> > requirements. I also think that may well fit with my RFC idea[2] for tiered
> > memory management.
> >
> > Looking forward to opinions from you and others. I admit I may miss many
> > things, and am more than happy to be enlightened.
> >
> > [1] https://lwn.net/Articles/944007/
> > [2] https://lore.kernel.org/damon/20231112195602.61525-1-sj@kernel.org/
>
> Thanks very much for your comments. I will need a few more days for the
> update but will try to address your concerns as much as possible.
No problem, please take your time. I'm looking forward to the next version :)
Thanks,
SJ
>
> Thanks,
> Honggyu
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
2024-01-17 21:11 ` SeongJae Park
@ 2024-01-17 21:24 ` SeongJae Park
2024-01-18 10:40 ` Hyeongtak Ji
1 sibling, 0 replies; 15+ messages in thread
From: SeongJae Park @ 2024-01-17 21:24 UTC (permalink / raw)
To: SeongJae Park
Cc: Honggyu Kim, akpm, apopple, baolin.wang, damon, dave.jiang,
hyeongtak.ji, kernel_team, linmiaohe, linux-kernel, linux-mm,
linux-trace-kernel, lizhijian, mathieu.desnoyers, mhiramat,
rakie.kim, rostedt, surenb, yangx.jy, ying.huang, ziy
On Wed, 17 Jan 2024 13:11:03 -0800 SeongJae Park <sj@kernel.org> wrote:
[...]
> Hi Honggyu,
>
> On Wed, 17 Jan 2024 20:49:25 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
>
> > Hi SeongJae,
> >
> > Thanks very much for your detailed comments.
> >
> > On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park <sj@kernel.org> wrote:
> >
[...]
> > > To this end, I feel the problem could be made simpler, because this patchset
> > > is trying to provide two sophisticated operations, while I think a simpler
> > > approach might be possible. My humble simpler idea is adding a DAMOS
> > > operation for moving pages to a given node (like the sys_move_phy_pages
> > > RFC[1]), instead of the promote/demote, because the general page migration
> > > can handle multiple cases including the promote/demote, in my humble
> > > assumption.
[...]
> > > In more detail, users could decide which is the appropriate node for
> > > promotion or demotion and use the new DAMOS action to do promotion and
> > > demotion. Users would be requested to decide which node is the proper
> > > promotion/demotion target, but that decision wouldn't be that hard in my
> > > opinion.
> > >
> > > For this, 'struct damos' would need to be updated for such argument-dependent
> > > actions, like 'struct damos_filter' is having a union.
> >
> > That might be a better solution. I will think about it.
>
> More specifically, I think receiving an address range as the argument might be
> more flexible than just a NUMA node. Maybe we can imagine proactively
> migrating cold movable pages from normal zones to movable zones, to avoid
> normal zone memory pressure.
Yet another crazy idea: finding hot regions in the middle of a cold region and
moving them next to other hot pages. As a result, memory gets sorted by access
temperature even within the same node, and the system gains more spatial
locality, which benefits general locality-based algorithms including DAMON's
adaptive regions adjustment.
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
2024-01-17 21:11 ` SeongJae Park
2024-01-17 21:24 ` SeongJae Park
@ 2024-01-18 10:40 ` Hyeongtak Ji
2024-01-18 17:17 ` SeongJae Park
1 sibling, 1 reply; 15+ messages in thread
From: Hyeongtak Ji @ 2024-01-18 10:40 UTC (permalink / raw)
To: sj
Cc: akpm, apopple, baolin.wang, damon, dave.jiang, honggyu.kim,
hyeongtak.ji, kernel_team, linmiaohe, linux-kernel, linux-mm,
linux-trace-kernel, lizhijian, mathieu.desnoyers, mhiramat,
rakie.kim, rostedt, surenb, yangx.jy, ying.huang, ziy
Hi SeongJae,
On Wed, 17 Jan 2024 SeongJae Park <sj@kernel.org> wrote:
[...]
>> Let's say there are 3 nodes in the system, where node0 and node1 are the
>> first tier, and node2 is the second tier.
>>
>> $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
>> 0-1
>>
>> $ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
>> 2
>>
>> Here is the result of partitioning hot/cold memory; I put the execution
>> command at the right side of the numastat result. I initially ran each
>> hot_cold program with a "preferred" policy setting so that they initially
>> allocate memory on one of node0 or node2, but the pages gradually migrated
>> based on their access frequencies.
>>
>> $ numastat -c -p hot_cold
>> Per-node process memory usage (in MBs)
>> PID Node 0 Node 1 Node 2 Total
>> --------------- ------ ------ ------ -----
>> 754 (hot_cold) 1800 0 2000 3800 <- hot_cold 1800 2000
>> 1184 (hot_cold) 300 0 500 800 <- hot_cold 300 500
>> 1818 (hot_cold) 801 0 3199 4000 <- hot_cold 800 3200
>> 30289 (hot_cold) 4 0 5 10 <- hot_cold 3 5
>> 30325 (hot_cold) 31 0 51 81 <- hot_cold 30 50
>> --------------- ------ ------ ------ -----
>> Total 2938 0 5756 8695
>>
>> The final node placement result shows that DAMON accurately migrated
>> pages by their hotness for multiple processes.
>
> What was the result when the corner cases handling logics were not applied?
This is the result of the same test that Honggyu did, but with an insufficient
corner cases handling logic.
$ numastat -c -p hot_cold
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
-------------- ------ ------ ------ -----
862 (hot_cold) 2256 0 1545 3801 <- hot_cold 1800 2000
863 (hot_cold) 403 0 398 801 <- hot_cold 300 500
864 (hot_cold) 1520 0 2482 4001 <- hot_cold 800 3200
865 (hot_cold) 6 0 3 9 <- hot_cold 3 5
866 (hot_cold) 29 0 52 81 <- hot_cold 30 50
-------------- ------ ------ ------ -----
Total 4215 0 4480 8695
As time goes by, DAMON keeps trying to split the hot/cold region, but it does
not seem to be enough.
$ numastat -c -p hot_cold
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
-------------- ------ ------ ------ -----
862 (hot_cold) 2022 0 1780 3801 <- hot_cold 1800 2000
863 (hot_cold) 351 0 450 801 <- hot_cold 300 500
864 (hot_cold) 1134 0 2868 4001 <- hot_cold 800 3200
865 (hot_cold) 7 0 2 9 <- hot_cold 3 5
866 (hot_cold) 43 0 39 81 <- hot_cold 30 50
-------------- ------ ------ ------ -----
Total 3557 0 5138 8695
>
> And, what are the corner cases handling logics that seemed essential? I
> guess the page granularity active/reference check could indeed provide many
> improvements, but that's only my humble assumption.
Yes, the page granularity active/reference check is essential. To make the
above "insufficient" result, the only thing I did was to promote
inactive/not_referenced pages.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f03be320f9ad..c2aefb883c54 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1127,9 +1127,7 @@ static unsigned int __promote_folio_list(struct list_head *folio_list,
VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
references = folio_check_references(folio, sc);
- if (references == FOLIOREF_KEEP ||
- references == FOLIOREF_RECLAIM ||
- references == FOLIOREF_RECLAIM_CLEAN)
+ if (references == FOLIOREF_KEEP)
goto keep_locked;
/* Relocate its contents to another node. */
>
> If the corner cases are indeed better to be handled in page granularity, I
> agree we need some more effort since DAMON monitoring results are not page
> granularity aware by design. Users could increase min_nr_regions to make
> it more accurate, and we have a plan to support page granularity monitoring,
> though. But maybe the overhead could be unacceptable.
>
> The ideal solution would be making DAMON more accurate while keeping the
> current level of overhead. We indeed have TODO items for DAMON accuracy
> improvement, but this may take some time that might be unacceptable for your
> case.
>
> If that's the case, I think the additional corner handling (or, the page
> granularity additional access check) could be made as DAMOS filters[1], since
> DAMOS filters can be applied in page granularity, and are designed for this
> kind of handling of information that DAMON monitoring results cannot provide.
> More specifically, we could have filters for promotion-qualifying pages and
> demotion-qualifying pages. In this way, I think we can keep the action more
> flexible while the filters can be applied in creative ways.
Making the corner handling a new DAMOS filter is a good idea. I'm just a bit
concerned that adding new filters might require users to care about more
things.
Kind regards,
Hyeongtak
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
2024-01-18 10:40 ` Hyeongtak Ji
@ 2024-01-18 17:17 ` SeongJae Park
0 siblings, 0 replies; 15+ messages in thread
From: SeongJae Park @ 2024-01-18 17:17 UTC (permalink / raw)
To: Hyeongtak Ji
Cc: sj, akpm, apopple, baolin.wang, damon, dave.jiang, honggyu.kim,
kernel_team, linmiaohe, linux-kernel, linux-mm,
linux-trace-kernel, lizhijian, mathieu.desnoyers, mhiramat,
rakie.kim, rostedt, surenb, yangx.jy, ying.huang, ziy
On Thu, 18 Jan 2024 19:40:16 +0900 Hyeongtak Ji <hyeongtak.ji@sk.com> wrote:
> Hi SeongJae,
>
> On Wed, 17 Jan 2024 SeongJae Park <sj@kernel.org> wrote:
>
> [...]
> >> Let's say there are 3 nodes in the system, where node0 and node1 are the
> >> first tier, and node2 is the second tier.
> >>
> >> $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
> >> 0-1
> >>
> >> $ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
> >> 2
> >>
> >> Here is the result of partitioning hot/cold memory; I put the execution
> >> command at the right side of the numastat result. I initially ran each
> >> hot_cold program with a "preferred" policy setting so that they initially
> >> allocate memory on one of node0 or node2, but the pages gradually migrated
> >> based on their access frequencies.
> >>
> >> $ numastat -c -p hot_cold
> >> Per-node process memory usage (in MBs)
> >> PID Node 0 Node 1 Node 2 Total
> >> --------------- ------ ------ ------ -----
> >> 754 (hot_cold) 1800 0 2000 3800 <- hot_cold 1800 2000
> >> 1184 (hot_cold) 300 0 500 800 <- hot_cold 300 500
> >> 1818 (hot_cold) 801 0 3199 4000 <- hot_cold 800 3200
> >> 30289 (hot_cold) 4 0 5 10 <- hot_cold 3 5
> >> 30325 (hot_cold) 31 0 51 81 <- hot_cold 30 50
> >> --------------- ------ ------ ------ -----
> >> Total 2938 0 5756 8695
> >>
> >> The final node placement result shows that DAMON accurately migrated
> >> pages by their hotness for multiple processes.
> >
> > What was the result when the corner cases handling logics were not applied?
>
> This is the result of the same test that Honggyu did, but with an insufficient
> corner cases handling logic.
>
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID Node 0 Node 1 Node 2 Total
> -------------- ------ ------ ------ -----
> 862 (hot_cold) 2256 0 1545 3801 <- hot_cold 1800 2000
> 863 (hot_cold) 403 0 398 801 <- hot_cold 300 500
> 864 (hot_cold) 1520 0 2482 4001 <- hot_cold 800 3200
> 865 (hot_cold) 6 0 3 9 <- hot_cold 3 5
> 866 (hot_cold) 29 0 52 81 <- hot_cold 30 50
> -------------- ------ ------ ------ -----
> Total 4215 0 4480 8695
>
> As time goes by, DAMON keeps trying to split the hot/cold region, but it does
> not seem to be enough.
>
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID Node 0 Node 1 Node 2 Total
> -------------- ------ ------ ------ -----
> 862 (hot_cold) 2022 0 1780 3801 <- hot_cold 1800 2000
> 863 (hot_cold) 351 0 450 801 <- hot_cold 300 500
> 864 (hot_cold) 1134 0 2868 4001 <- hot_cold 800 3200
> 865 (hot_cold) 7 0 2 9 <- hot_cold 3 5
> 866 (hot_cold) 43 0 39 81 <- hot_cold 30 50
> -------------- ------ ------ ------ -----
> Total 3557 0 5138 8695
>
> >
> > And, what are the corner cases handling logics that seemed essential? I
> > guess the page granularity active/reference check could indeed provide many
> > improvements, but that's only my humble assumption.
>
> Yes, the page granularity active/reference check is essential. To make the
> above "insufficient" result, the only thing I did was to promote
> inactive/not_referenced pages.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f03be320f9ad..c2aefb883c54 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1127,9 +1127,7 @@ static unsigned int __promote_folio_list(struct list_head *folio_list,
> VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>
> references = folio_check_references(folio, sc);
> - if (references == FOLIOREF_KEEP ||
> - references == FOLIOREF_RECLAIM ||
> - references == FOLIOREF_RECLAIM_CLEAN)
> + if (references == FOLIOREF_KEEP)
> goto keep_locked;
>
> /* Relocate its contents to another node. */
Thank you for sharing the details :) I think the DAMOS filters based approach
could be worth trying, then.
>
> >
> > If the corner cases are indeed better to be handled in page granularity, I
> > agree we need some more effort since DAMON monitoring results are not page
> > granularity aware by design. Users could increase min_nr_regions to make
> > it more accurate, and we have a plan to support page granularity monitoring,
> > though. But maybe the overhead could be unacceptable.
> >
> > The ideal solution would be making DAMON more accurate while keeping the
> > current level of overhead. We indeed have TODO items for DAMON accuracy
> > improvement, but this may take some time that might be unacceptable for your
> > case.
> >
> > If that's the case, I think the additional corner handling (or, the page
> > granularity additional access check) could be made as DAMOS filters[1], since
> > DAMOS filters can be applied in page granularity, and are designed for this
> > kind of handling of information that DAMON monitoring results cannot provide.
> > More specifically, we could have filters for promotion-qualifying pages and
> > demotion-qualifying pages. In this way, I think we can keep the action more
> > flexible while the filters can be applied in creative ways.
>
> Making the corner handling a new DAMOS filter is a good idea. I'm just a bit
> concerned that adding new filters might require users to care about more
> things.
I prefer keeping the DAMON API and sysfs interface flexible and easy to extend
even if it increases the number of parameters, while providing simplified high
level interfaces for end users aiming to use DAMON for specific use cases, as
DAMON_RECLAIM, DAMON_LRU_SORT, and damo do. Hence I'm not very concerned.
Thanks,
SJ
>
> Kind regards,
> Hyeongtak
^ permalink raw reply [flat|nested] 15+ messages in thread