* [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode
2024-04-05 6:08 [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Honggyu Kim
@ 2024-04-05 6:08 ` Honggyu Kim
2024-04-05 19:19 ` SeongJae Park
2024-04-05 6:08 ` [RFC PATCH v3 2/7] mm: make alloc_demote_folio externally invokable for migration Honggyu Kim
` (7 subsequent siblings)
8 siblings, 1 reply; 28+ messages in thread
From: Honggyu Kim @ 2024-04-05 6:08 UTC (permalink / raw)
To: sj, damon, linux-mm
Cc: akpm, apopple, baolin.wang, dave.jiang, honggyu.kim, hyeongtak.ji,
kernel_team, linmiaohe, linux-kernel, linux-trace-kernel,
mathieu.desnoyers, mhiramat, rakie.kim, rostedt, surenb, yangx.jy,
ying.huang, ziy, 42.hyeyoo, art.jeongseob
This is a preparation patch that introduces migration modes.
damon_pa_pageout() is renamed to damon_pa_migrate(), and it receives an
extra argument for the migration_mode.
No functional changes applied.
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
---
mm/damon/paddr.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 081e2a325778..277a1c4d833c 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -224,7 +224,12 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
return false;
}
-static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
+enum migration_mode {
+ MIG_PAGEOUT,
+};
+
+static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
+ enum migration_mode mm)
{
unsigned long addr, applied;
LIST_HEAD(folio_list);
@@ -249,7 +254,14 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
put_folio:
folio_put(folio);
}
- applied = reclaim_pages(&folio_list);
+ switch (mm) {
+ case MIG_PAGEOUT:
+ applied = reclaim_pages(&folio_list);
+ break;
+ default:
+ /* Unexpected migration mode. */
+ return 0;
+ }
cond_resched();
return applied * PAGE_SIZE;
}
@@ -297,7 +309,7 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
{
switch (scheme->action) {
case DAMOS_PAGEOUT:
- return damon_pa_pageout(r, scheme);
+ return damon_pa_migrate(r, scheme, MIG_PAGEOUT);
case DAMOS_LRU_PRIO:
return damon_pa_mark_accessed(r, scheme);
case DAMOS_LRU_DEPRIO:
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode
2024-04-05 6:08 ` [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode Honggyu Kim
@ 2024-04-05 19:19 ` SeongJae Park
2024-05-11 20:16 ` SeongJae Park
0 siblings, 1 reply; 28+ messages in thread
From: SeongJae Park @ 2024-04-05 19:19 UTC (permalink / raw)
To: Honggyu Kim
Cc: SeongJae Park, damon, linux-mm, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Fri, 5 Apr 2024 15:08:50 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> This is a preparation patch that introduces migration modes.
>
> The damon_pa_pageout is renamed to damon_pa_migrate and it receives an
> extra argument for migration_mode.
I personally think keeping damon_pa_pageout() as is and adding a new function
(damon_pa_migrate()) with some duplicated code is also ok, but this approach
also looks fine to me. So I have no strong opinion here; I'm just letting you
know that I would have no objection to either approach.
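If it helps to visualize the trade-off, here is a userspace toy of the two shapes being weighed: one dispatcher selected by a migration mode (this patch) versus separate entry points with some duplicated scaffolding (the alternative). Every name below is an illustrative stand-in, not the actual kernel code.

```c
#include <assert.h>

enum migration_mode { MIG_PAGEOUT, MIG_MIGRATE_COLD };

static long reclaim_stub(void) { return 4; } /* pretend 4 folios reclaimed */
static long migrate_stub(void) { return 2; } /* pretend 2 folios migrated */

/* Shape 1 (this patch): one function, behavior picked by mode. */
static long damon_pa_migrate_toy(enum migration_mode mode)
{
	switch (mode) {
	case MIG_PAGEOUT:
		return reclaim_stub();
	case MIG_MIGRATE_COLD:
		return migrate_stub();
	default:
		return 0; /* unexpected migration mode */
	}
}

/* Shape 2 (the alternative): separate entry points, each simple on its own. */
static long damon_pa_pageout_toy(void) { return reclaim_stub(); }
static long damon_pa_migrate_only_toy(void) { return migrate_stub(); }
```

The first shape centralizes the folio-walking scaffolding at the cost of mode checks; the second duplicates scaffolding but keeps each action's pre/post handling independent.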
>
> No functional changes applied.
>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> ---
> mm/damon/paddr.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index 081e2a325778..277a1c4d833c 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -224,7 +224,12 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
> return false;
> }
>
> -static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> +enum migration_mode {
> + MIG_PAGEOUT,
> +};
To avoid name conflicts, what about renaming to 'damos_migration_mode' and
'DAMOS_MIG_PAGEOUT'?
> +
> +static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> + enum migration_mode mm)
My poor brain got a bit confused by the name.  What about calling it 'mode'?
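Put together, the two suggestions above might look like the sketch below: a damos_-prefixed enum name with DAMOS_-prefixed values to avoid symbol clashes, and a parameter called 'mode' instead of 'mm'. This is illustrative only, not the patched code.

```c
#include <assert.h>

/* Suggested naming: damos_ prefix on the type, DAMOS_ on the values. */
enum damos_migration_mode {
	DAMOS_MIG_PAGEOUT,
};

/* The parameter is 'mode', avoiding 'mm' which collides mentally with
 * struct mm_struct naming conventions in mm/ code. */
static unsigned long damos_migrate_toy(enum damos_migration_mode mode)
{
	switch (mode) {
	case DAMOS_MIG_PAGEOUT:
		return 1; /* stand-in for a reclaim_pages() result */
	default:
		return 0; /* unexpected migration mode */
	}
}
```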
> {
> unsigned long addr, applied;
> LIST_HEAD(folio_list);
> @@ -249,7 +254,14 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> put_folio:
> folio_put(folio);
> }
> - applied = reclaim_pages(&folio_list);
> + switch (mm) {
> + case MIG_PAGEOUT:
> + applied = reclaim_pages(&folio_list);
> + break;
> + default:
> + /* Unexpected migration mode. */
> + return 0;
> + }
> cond_resched();
> return applied * PAGE_SIZE;
> }
> @@ -297,7 +309,7 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> {
> switch (scheme->action) {
> case DAMOS_PAGEOUT:
> - return damon_pa_pageout(r, scheme);
> + return damon_pa_migrate(r, scheme, MIG_PAGEOUT);
> case DAMOS_LRU_PRIO:
> return damon_pa_mark_accessed(r, scheme);
> case DAMOS_LRU_DEPRIO:
> --
> 2.34.1
Thanks,
SJ
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode
2024-04-05 19:19 ` SeongJae Park
@ 2024-05-11 20:16 ` SeongJae Park
2024-05-12 18:00 ` SeongJae Park
0 siblings, 1 reply; 28+ messages in thread
From: SeongJae Park @ 2024-05-11 20:16 UTC (permalink / raw)
To: SeongJae Park
Cc: Honggyu Kim, damon, linux-mm, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Fri, 5 Apr 2024 12:19:07 -0700 SeongJae Park <sj@kernel.org> wrote:
> On Fri, 5 Apr 2024 15:08:50 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
>
> > This is a preparation patch that introduces migration modes.
> >
> > The damon_pa_pageout is renamed to damon_pa_migrate and it receives an
> > extra argument for migration_mode.
>
> I personally think keeping damon_pa_pageout() as is and adding a new function
> (damon_pa_migrate()) with some duplicated code is also ok, but this approach
> also looks fine to me. So I have no strong opinion here, but just letting you
> know I would have no objection at both approaches.
Meanwhile, we added one more piece of logic in damon_pa_pageout() for doing the
page idleness double check on its own[1].  That makes reusing
damon_pa_pageout() for multiple purposes a bit complex.  I think the complexity
introduced a problem in this patch that I also missed before.  See the comment
inline below.  Hence I now think it would be better to go the suggested way.
If we take that approach, this patch is no longer necessary, and can therefore
be dropped.
[1] https://lore.kernel.org/20240426195247.100306-1-sj@kernel.org
Thanks,
SJ
[...]
>
> >
> > No functional changes applied.
> >
> > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > ---
> > mm/damon/paddr.c | 18 +++++++++++++++---
> > 1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> > index 081e2a325778..277a1c4d833c 100644
> > --- a/mm/damon/paddr.c
> > +++ b/mm/damon/paddr.c
> > @@ -224,7 +224,12 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
> > return false;
> > }
> >
> > -static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> > +enum migration_mode {
> > + MIG_PAGEOUT,
> > +};
>
> To avoid name conflicts, what about renaming to 'damos_migration_mode' and
> 'DAMOS_MIG_PAGEOUT'?
>
> > +
> > +static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> > + enum migration_mode mm)
>
> My poor brain has a bit confused with the name. What about calling it 'mode'?
>
> > {
> > unsigned long addr, applied;
> > LIST_HEAD(folio_list);
> > @@ -249,7 +254,14 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
Before this line, damon_pa_pageout() calls folio_clear_referenced() and
folio_test_clear_young() for the folio, because this is pageout code.  The
changed function, damon_pa_migrate(), is not only for cold pages but for
general migrations.  Hence the clearing should also depend on the migration
mode, but it is not handled here.
I think this problem came from the increased complexity of this function.
Hence I think it is better to keep damon_pa_pageout() as is and add a new
function for migration.
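The distinction being described can be modeled in a few lines: the pageout path clears the folio's referenced/young bits because the folio is about to be reclaimed, while a generic migration path should leave the access bits intact. The toy below is a userspace stand-in, not the kernel folio API.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_folio { bool referenced; bool young; };

enum migration_mode { MIG_PAGEOUT, MIG_MIGRATE_COLD };

static void prepare_folio(struct toy_folio *f, enum migration_mode mode)
{
	if (mode == MIG_PAGEOUT) {
		/* stand-in for folio_clear_referenced() and
		 * folio_test_clear_young() on the pageout path */
		f->referenced = false;
		f->young = false;
	}
	/* migration modes keep the access bits intact */
}
```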
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode
2024-05-11 20:16 ` SeongJae Park
@ 2024-05-12 18:00 ` SeongJae Park
0 siblings, 0 replies; 28+ messages in thread
From: SeongJae Park @ 2024-05-12 18:00 UTC (permalink / raw)
To: SeongJae Park
Cc: Honggyu Kim, damon, linux-mm, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Sat, 11 May 2024 13:16:17 -0700 SeongJae Park <sj@kernel.org> wrote:
> On Fri, 5 Apr 2024 12:19:07 -0700 SeongJae Park <sj@kernel.org> wrote:
>
> > On Fri, 5 Apr 2024 15:08:50 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> >
> > > This is a preparation patch that introduces migration modes.
> > >
> > > The damon_pa_pageout is renamed to damon_pa_migrate and it receives an
> > > extra argument for migration_mode.
> >
> > I personally think keeping damon_pa_pageout() as is and adding a new function
> > (damon_pa_migrate()) with some duplicated code is also ok, but this approach
> > also looks fine to me. So I have no strong opinion here, but just letting you
> > know I would have no objection at both approaches.
>
> Meanwhile, we added one more logic in damon_pa_pageout() for doing page
> idleness double check on its own[1]. It makes reusing damon_pa_pageout() for
> multiple reason a bit complex. I think the complexity added a problem in this
> patch that I also missed before due to the complexity. Show below comment in
> line. Hence now I think it would be better to do the suggested way.
>
> If we use the approach, this patch is no more necessary, and therefore can be
> dropped.
>
> [1] https://lore.kernel.org/20240426195247.100306-1-sj@kernel.org
I updated this patchset to address the comments on this thread, and posted it
as RFC patchset v4 on behalf of Honggyu with his approval:
https://lore.kernel.org/20240512175447.75943-1-sj@kernel.org
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 28+ messages in thread
* [RFC PATCH v3 2/7] mm: make alloc_demote_folio externally invokable for migration
2024-04-05 6:08 [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Honggyu Kim
2024-04-05 6:08 ` [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode Honggyu Kim
@ 2024-04-05 6:08 ` Honggyu Kim
2024-04-05 19:20 ` SeongJae Park
2024-04-05 6:08 ` [RFC PATCH v3 3/7] mm/damon/sysfs-schemes: add target_nid on sysfs-schemes Honggyu Kim
` (6 subsequent siblings)
8 siblings, 1 reply; 28+ messages in thread
From: Honggyu Kim @ 2024-04-05 6:08 UTC (permalink / raw)
To: sj, damon, linux-mm
Cc: akpm, apopple, baolin.wang, dave.jiang, honggyu.kim, hyeongtak.ji,
kernel_team, linmiaohe, linux-kernel, linux-trace-kernel,
mathieu.desnoyers, mhiramat, rakie.kim, rostedt, surenb, yangx.jy,
ying.huang, ziy, 42.hyeyoo, art.jeongseob
alloc_demote_folio() can be used outside of vmscan.c, so it'd be better to
remove the static keyword from it.
This function can also be used for both demotion and promotion, so it'd be
better to rename it from alloc_demote_folio() to alloc_migrate_folio().
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
---
mm/internal.h | 1 +
mm/vmscan.c | 10 +++++++---
2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50f..c96ff9bc82d0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -866,6 +866,7 @@ extern unsigned long __must_check vm_mmap_pgoff(struct file *, unsigned long,
unsigned long, unsigned long);
extern void set_pageblock_order(void);
+struct folio *alloc_migrate_folio(struct folio *src, unsigned long private);
unsigned long reclaim_pages(struct list_head *folio_list);
unsigned int reclaim_clean_pages_from_list(struct zone *zone,
struct list_head *folio_list);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4255619a1a31..9e456cac03b4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -910,8 +910,7 @@ static void folio_check_dirty_writeback(struct folio *folio,
mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
}
-static struct folio *alloc_demote_folio(struct folio *src,
- unsigned long private)
+struct folio *alloc_migrate_folio(struct folio *src, unsigned long private)
{
struct folio *dst;
nodemask_t *allowed_mask;
@@ -935,6 +934,11 @@ static struct folio *alloc_demote_folio(struct folio *src,
if (dst)
return dst;
+ /*
+ * Allocation failed from the target node so try to allocate from
+ * fallback nodes based on allowed_mask.
+ * See fallback_alloc() at mm/slab.c.
+ */
mtc->gfp_mask &= ~__GFP_THISNODE;
mtc->nmask = allowed_mask;
@@ -973,7 +977,7 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
node_get_allowed_targets(pgdat, &allowed_mask);
/* Demotion ignores all cpuset and mempolicy settings */
- migrate_pages(demote_folios, alloc_demote_folio, NULL,
+ migrate_pages(demote_folios, alloc_migrate_folio, NULL,
(unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
&nr_succeeded);
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 2/7] mm: make alloc_demote_folio externally invokable for migration
2024-04-05 6:08 ` [RFC PATCH v3 2/7] mm: make alloc_demote_folio externally invokable for migration Honggyu Kim
@ 2024-04-05 19:20 ` SeongJae Park
0 siblings, 0 replies; 28+ messages in thread
From: SeongJae Park @ 2024-04-05 19:20 UTC (permalink / raw)
To: Honggyu Kim
Cc: SeongJae Park, damon, linux-mm, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Fri, 5 Apr 2024 15:08:51 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> The alloc_demote_folio can be used out of vmscan.c so it'd be better to
> remove static keyword from it.
>
> This function can also be used for both demotion and promotion so it'd
> be better to rename it from alloc_demote_folio to alloc_migrate_folio.
I'm not sure if renaming is really needed, but I have no strong opinion.
>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
I have one more trivial comment below, but I find no blocker.
Reviewed-by: SeongJae Park <sj@kernel.org>
> ---
> mm/internal.h | 1 +
> mm/vmscan.c | 10 +++++++---
> 2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index f309a010d50f..c96ff9bc82d0 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -866,6 +866,7 @@ extern unsigned long __must_check vm_mmap_pgoff(struct file *, unsigned long,
> unsigned long, unsigned long);
>
> extern void set_pageblock_order(void);
> +struct folio *alloc_migrate_folio(struct folio *src, unsigned long private);
> unsigned long reclaim_pages(struct list_head *folio_list);
> unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> struct list_head *folio_list);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4255619a1a31..9e456cac03b4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -910,8 +910,7 @@ static void folio_check_dirty_writeback(struct folio *folio,
> mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
> }
>
> -static struct folio *alloc_demote_folio(struct folio *src,
> - unsigned long private)
> +struct folio *alloc_migrate_folio(struct folio *src, unsigned long private)
> {
> struct folio *dst;
> nodemask_t *allowed_mask;
> @@ -935,6 +934,11 @@ static struct folio *alloc_demote_folio(struct folio *src,
> if (dst)
> return dst;
>
> + /*
> + * Allocation failed from the target node so try to allocate from
> + * fallback nodes based on allowed_mask.
> + * See fallback_alloc() at mm/slab.c.
> + */
I think this might be better as a separate cleanup patch, but given its small
size, I have no strong opinion.
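For reference, the two-step allocation that the quoted comment describes (first try only the target node with __GFP_THISNODE-like strictness, then retry against a mask of allowed fallback nodes) can be sketched in userspace as below. All names are illustrative stand-ins, not the kernel API.

```c
#include <assert.h>

/* 1 means the node can satisfy the allocation; node 2 is "full" here. */
static const int node_has_memory[4] = { 1, 1, 0, 1 };

/* Returns the node the allocation landed on, or -1 on failure. */
static int try_alloc_on(int nid)
{
	return node_has_memory[nid] ? nid : -1;
}

static int alloc_with_fallback(int nid, const int *allowed, int n_allowed)
{
	int got = try_alloc_on(nid); /* strict pass: target node only */

	if (got >= 0)
		return got;
	for (int i = 0; i < n_allowed; i++) { /* relaxed pass: allowed mask */
		got = try_alloc_on(allowed[i]);
		if (got >= 0)
			return got;
	}
	return -1;
}
```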
> mtc->gfp_mask &= ~__GFP_THISNODE;
> mtc->nmask = allowed_mask;
>
> @@ -973,7 +977,7 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
> node_get_allowed_targets(pgdat, &allowed_mask);
>
> /* Demotion ignores all cpuset and mempolicy settings */
> - migrate_pages(demote_folios, alloc_demote_folio, NULL,
> + migrate_pages(demote_folios, alloc_migrate_folio, NULL,
> (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
> &nr_succeeded);
>
> --
> 2.34.1
Thanks,
SJ
^ permalink raw reply [flat|nested] 28+ messages in thread
* [RFC PATCH v3 3/7] mm/damon/sysfs-schemes: add target_nid on sysfs-schemes
2024-04-05 6:08 [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Honggyu Kim
2024-04-05 6:08 ` [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode Honggyu Kim
2024-04-05 6:08 ` [RFC PATCH v3 2/7] mm: make alloc_demote_folio externally invokable for migration Honggyu Kim
@ 2024-04-05 6:08 ` Honggyu Kim
2024-04-05 6:08 ` [RFC PATCH v3 4/7] mm/migrate: add MR_DAMON to migrate_reason Honggyu Kim
` (5 subsequent siblings)
8 siblings, 0 replies; 28+ messages in thread
From: Honggyu Kim @ 2024-04-05 6:08 UTC (permalink / raw)
To: sj, damon, linux-mm
Cc: akpm, apopple, baolin.wang, dave.jiang, honggyu.kim, hyeongtak.ji,
kernel_team, linmiaohe, linux-kernel, linux-trace-kernel,
mathieu.desnoyers, mhiramat, rakie.kim, rostedt, surenb, yangx.jy,
ying.huang, ziy, 42.hyeyoo, art.jeongseob
From: Hyeongtak Ji <hyeongtak.ji@sk.com>
This patch adds target_nid under
/sys/kernel/mm/damon/admin/kdamonds/<N>/contexts/<N>/schemes/<N>/
The 'target_nid' can be used as the destination node for DAMOS actions
such as DAMOS_MIGRATE_{HOT,COLD} in the follow-up patches.
Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
---
include/linux/damon.h | 11 ++++++++++-
mm/damon/core.c | 5 ++++-
mm/damon/dbgfs.c | 2 +-
mm/damon/lru_sort.c | 3 ++-
mm/damon/reclaim.c | 3 ++-
mm/damon/sysfs-schemes.c | 33 ++++++++++++++++++++++++++++++++-
6 files changed, 51 insertions(+), 6 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 5881e4ac30be..24ea33a03d5d 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -337,6 +337,7 @@ struct damos_access_pattern {
* @apply_interval_us: The time between applying the @action.
* @quota: Control the aggressiveness of this scheme.
* @wmarks: Watermarks for automated (in)activation of this scheme.
+ * @target_nid: Destination node if @action is "migrate_{hot,cold}".
* @filters: Additional set of &struct damos_filter for &action.
* @stat: Statistics of this scheme.
* @list: List head for siblings.
@@ -352,6 +353,10 @@ struct damos_access_pattern {
* monitoring context are inactive, DAMON stops monitoring either, and just
* repeatedly checks the watermarks.
*
+ * @target_nid is used to set the migration target node for migrate_hot or
+ * migrate_cold actions, which means it's only meaningful when @action is either
+ * "migrate_hot" or "migrate_cold".
+ *
* Before applying the &action to a memory region, &struct damon_operations
* implementation could check pages of the region and skip &action to respect
* &filters
@@ -373,6 +378,9 @@ struct damos {
/* public: */
struct damos_quota quota;
struct damos_watermarks wmarks;
+ union {
+ int target_nid;
+ };
struct list_head filters;
struct damos_stat stat;
struct list_head list;
@@ -677,7 +685,8 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
enum damos_action action,
unsigned long apply_interval_us,
struct damos_quota *quota,
- struct damos_watermarks *wmarks);
+ struct damos_watermarks *wmarks,
+ int target_nid);
void damon_add_scheme(struct damon_ctx *ctx, struct damos *s);
void damon_destroy_scheme(struct damos *s);
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 5b325749fc12..7ff0259d9fa6 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -316,7 +316,8 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
enum damos_action action,
unsigned long apply_interval_us,
struct damos_quota *quota,
- struct damos_watermarks *wmarks)
+ struct damos_watermarks *wmarks,
+ int target_nid)
{
struct damos *scheme;
@@ -341,6 +342,8 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
scheme->wmarks = *wmarks;
scheme->wmarks.activated = true;
+ scheme->target_nid = target_nid;
+
return scheme;
}
diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 7dac24e69e3b..d04fdccfa65b 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -279,7 +279,7 @@ static struct damos **str_to_schemes(const char *str, ssize_t len,
pos += parsed;
scheme = damon_new_scheme(&pattern, action, 0, "a,
- &wmarks);
+ &wmarks, NUMA_NO_NODE);
if (!scheme)
goto fail;
diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
index 3de2916a65c3..3775f0f2743d 100644
--- a/mm/damon/lru_sort.c
+++ b/mm/damon/lru_sort.c
@@ -163,7 +163,8 @@ static struct damos *damon_lru_sort_new_scheme(
/* under the quota. */
"a,
/* (De)activate this according to the watermarks. */
- &damon_lru_sort_wmarks);
+ &damon_lru_sort_wmarks,
+ NUMA_NO_NODE);
}
/* Create a DAMON-based operation scheme for hot memory regions */
diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index 66e190f0374a..84e6e96b5dcc 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -147,7 +147,8 @@ static struct damos *damon_reclaim_new_scheme(void)
/* under the quota. */
&damon_reclaim_quota,
/* (De)activate this according to the watermarks. */
- &damon_reclaim_wmarks);
+ &damon_reclaim_wmarks,
+ NUMA_NO_NODE);
}
static void damon_reclaim_copy_quota_status(struct damos_quota *dst,
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index ae0f0b314f3a..1a30ea82c890 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -6,6 +6,7 @@
*/
#include <linux/slab.h>
+#include <linux/numa.h>
#include "sysfs-common.h"
@@ -1393,6 +1394,7 @@ struct damon_sysfs_scheme {
struct damon_sysfs_scheme_filters *filters;
struct damon_sysfs_stats *stats;
struct damon_sysfs_scheme_regions *tried_regions;
+ int target_nid;
};
/* This should match with enum damos_action */
@@ -1418,6 +1420,7 @@ static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc(
scheme->kobj = (struct kobject){};
scheme->action = action;
scheme->apply_interval_us = apply_interval_us;
+ scheme->target_nid = NUMA_NO_NODE;
return scheme;
}
@@ -1640,6 +1643,28 @@ static ssize_t apply_interval_us_store(struct kobject *kobj,
return err ? err : count;
}
+static ssize_t target_nid_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_scheme *scheme = container_of(kobj,
+ struct damon_sysfs_scheme, kobj);
+
+ return sysfs_emit(buf, "%d\n", scheme->target_nid);
+}
+
+static ssize_t target_nid_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_scheme *scheme = container_of(kobj,
+ struct damon_sysfs_scheme, kobj);
+ int err = 0;
+
+ /* TODO: error handling for target_nid range. */
+ err = kstrtoint(buf, 0, &scheme->target_nid);
+
+ return err ? err : count;
+}
+
static void damon_sysfs_scheme_release(struct kobject *kobj)
{
kfree(container_of(kobj, struct damon_sysfs_scheme, kobj));
@@ -1651,9 +1676,13 @@ static struct kobj_attribute damon_sysfs_scheme_action_attr =
static struct kobj_attribute damon_sysfs_scheme_apply_interval_us_attr =
__ATTR_RW_MODE(apply_interval_us, 0600);
+static struct kobj_attribute damon_sysfs_scheme_target_nid_attr =
+ __ATTR_RW_MODE(target_nid, 0600);
+
static struct attribute *damon_sysfs_scheme_attrs[] = {
&damon_sysfs_scheme_action_attr.attr,
&damon_sysfs_scheme_apply_interval_us_attr.attr,
+ &damon_sysfs_scheme_target_nid_attr.attr,
NULL,
};
ATTRIBUTE_GROUPS(damon_sysfs_scheme);
@@ -1956,7 +1985,8 @@ static struct damos *damon_sysfs_mk_scheme(
damos_sysfs_set_quota_score(sysfs_quotas->goals, "a);
scheme = damon_new_scheme(&pattern, sysfs_scheme->action,
- sysfs_scheme->apply_interval_us, "a, &wmarks);
+ sysfs_scheme->apply_interval_us, "a, &wmarks,
+ sysfs_scheme->target_nid);
if (!scheme)
return NULL;
@@ -1987,6 +2017,7 @@ static void damon_sysfs_update_scheme(struct damos *scheme,
scheme->action = sysfs_scheme->action;
scheme->apply_interval_us = sysfs_scheme->apply_interval_us;
+ scheme->target_nid = sysfs_scheme->target_nid;
scheme->quota.ms = sysfs_quotas->ms;
scheme->quota.sz = sysfs_quotas->sz;
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [RFC PATCH v3 4/7] mm/migrate: add MR_DAMON to migrate_reason
2024-04-05 6:08 [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Honggyu Kim
` (2 preceding siblings ...)
2024-04-05 6:08 ` [RFC PATCH v3 3/7] mm/damon/sysfs-schemes: add target_nid on sysfs-schemes Honggyu Kim
@ 2024-04-05 6:08 ` Honggyu Kim
2024-04-05 19:20 ` SeongJae Park
2024-04-05 6:08 ` [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion Honggyu Kim
` (4 subsequent siblings)
8 siblings, 1 reply; 28+ messages in thread
From: Honggyu Kim @ 2024-04-05 6:08 UTC (permalink / raw)
To: sj, damon, linux-mm
Cc: akpm, apopple, baolin.wang, dave.jiang, honggyu.kim, hyeongtak.ji,
kernel_team, linmiaohe, linux-kernel, linux-trace-kernel,
mathieu.desnoyers, mhiramat, rakie.kim, rostedt, surenb, yangx.jy,
ying.huang, ziy, 42.hyeyoo, art.jeongseob
The current patch series introduces DAMON-based migration across NUMA
nodes, so it'd be better to have a new migrate_reason in trace events.
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
---
include/linux/migrate_mode.h | 1 +
include/trace/events/migrate.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index f37cc03f9369..cec36b7e7ced 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -29,6 +29,7 @@ enum migrate_reason {
MR_CONTIG_RANGE,
MR_LONGTERM_PIN,
MR_DEMOTION,
+ MR_DAMON,
MR_TYPES
};
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 0190ef725b43..cd01dd7b3640 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -22,7 +22,8 @@
EM( MR_NUMA_MISPLACED, "numa_misplaced") \
EM( MR_CONTIG_RANGE, "contig_range") \
EM( MR_LONGTERM_PIN, "longterm_pin") \
- EMe(MR_DEMOTION, "demotion")
+ EM( MR_DEMOTION, "demotion") \
+ EMe(MR_DAMON, "damon")
/*
* First define the enums in the above macros to be exported to userspace
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 4/7] mm/migrate: add MR_DAMON to migrate_reason
2024-04-05 6:08 ` [RFC PATCH v3 4/7] mm/migrate: add MR_DAMON to migrate_reason Honggyu Kim
@ 2024-04-05 19:20 ` SeongJae Park
0 siblings, 0 replies; 28+ messages in thread
From: SeongJae Park @ 2024-04-05 19:20 UTC (permalink / raw)
To: Honggyu Kim
Cc: SeongJae Park, damon, linux-mm, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Fri, 5 Apr 2024 15:08:53 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> The current patch series introduces DAMON based migration across NUMA
> nodes so it'd be better to have a new migrate_reason in trace events.
>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Thanks,
SJ
> ---
> include/linux/migrate_mode.h | 1 +
> include/trace/events/migrate.h | 3 ++-
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
> index f37cc03f9369..cec36b7e7ced 100644
> --- a/include/linux/migrate_mode.h
> +++ b/include/linux/migrate_mode.h
> @@ -29,6 +29,7 @@ enum migrate_reason {
> MR_CONTIG_RANGE,
> MR_LONGTERM_PIN,
> MR_DEMOTION,
> + MR_DAMON,
> MR_TYPES
> };
>
> diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
> index 0190ef725b43..cd01dd7b3640 100644
> --- a/include/trace/events/migrate.h
> +++ b/include/trace/events/migrate.h
> @@ -22,7 +22,8 @@
> EM( MR_NUMA_MISPLACED, "numa_misplaced") \
> EM( MR_CONTIG_RANGE, "contig_range") \
> EM( MR_LONGTERM_PIN, "longterm_pin") \
> - EMe(MR_DEMOTION, "demotion")
> + EM( MR_DEMOTION, "demotion") \
> + EMe(MR_DAMON, "damon")
>
> /*
> * First define the enums in the above macros to be exported to userspace
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
2024-04-05 6:08 [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Honggyu Kim
` (3 preceding siblings ...)
2024-04-05 6:08 ` [RFC PATCH v3 4/7] mm/migrate: add MR_DAMON to migrate_reason Honggyu Kim
@ 2024-04-05 6:08 ` Honggyu Kim
2024-04-05 7:55 ` Hyeongtak Ji
2024-04-05 19:24 ` SeongJae Park
2024-04-05 6:08 ` [RFC PATCH v3 6/7] mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion Honggyu Kim
` (3 subsequent siblings)
8 siblings, 2 replies; 28+ messages in thread
From: Honggyu Kim @ 2024-04-05 6:08 UTC (permalink / raw)
To: sj, damon, linux-mm
Cc: akpm, apopple, baolin.wang, dave.jiang, honggyu.kim, hyeongtak.ji,
kernel_team, linmiaohe, linux-kernel, linux-trace-kernel,
mathieu.desnoyers, mhiramat, rakie.kim, rostedt, surenb, yangx.jy,
ying.huang, ziy, 42.hyeyoo, art.jeongseob
This patch introduces the DAMOS_MIGRATE_COLD action, which is similar to
DAMOS_PAGEOUT, but migrates folios to the given 'target_nid' in sysfs
instead of swapping them out.
The 'target_nid' sysfs knob is created by this patch to specify the
migration target node ID.
Here is an example usage of the 'migrate_cold' action.
$ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
$ cat contexts/<N>/schemes/<N>/action
migrate_cold
$ echo 2 > contexts/<N>/schemes/<N>/target_nid
$ echo commit > state
$ numactl -p 0 ./hot_cold 500M 600M &
$ numastat -c -p hot_cold
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
-------------- ------ ------ ------ -----
701 (hot_cold) 501 0 601 1101
Since some routines are shared with pageout, many functions have similar
logic between pageout and migrate_cold.
damon_pa_migrate_folio_list() is a minimized version of
shrink_folio_list(), trimmed down for demotion only.
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
---
include/linux/damon.h | 2 +
mm/damon/paddr.c | 146 ++++++++++++++++++++++++++++++++++++++-
mm/damon/sysfs-schemes.c | 4 ++
3 files changed, 151 insertions(+), 1 deletion(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 24ea33a03d5d..df8671e69a70 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -105,6 +105,7 @@ struct damon_target {
* @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
* @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
* @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
+ * @DAMOS_MIGRATE_COLD: Migrate for the given cold region.
* @DAMOS_STAT: Do nothing but count the stat.
* @NR_DAMOS_ACTIONS: Total number of DAMOS actions
*
@@ -122,6 +123,7 @@ enum damos_action {
DAMOS_NOHUGEPAGE,
DAMOS_LRU_PRIO,
DAMOS_LRU_DEPRIO,
+ DAMOS_MIGRATE_COLD,
DAMOS_STAT, /* Do nothing but only record the stat */
NR_DAMOS_ACTIONS,
};
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 277a1c4d833c..fe217a26f788 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -12,6 +12,9 @@
#include <linux/pagemap.h>
#include <linux/rmap.h>
#include <linux/swap.h>
+#include <linux/memory-tiers.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>
#include "../internal.h"
#include "ops-common.h"
@@ -226,8 +229,137 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
enum migration_mode {
MIG_PAGEOUT,
+ MIG_MIGRATE_COLD,
};
+static unsigned int migrate_folio_list(struct list_head *migrate_folios,
+ struct pglist_data *pgdat,
+ int target_nid)
+{
+ unsigned int nr_succeeded;
+ nodemask_t allowed_mask = NODE_MASK_NONE;
+
+ struct migration_target_control mtc = {
+ /*
+ * Allocate from 'node', or fail quickly and quietly.
+ * When this happens, 'page' will likely just be discarded
+ * instead of migrated.
+ */
+ .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
+ __GFP_NOMEMALLOC | GFP_NOWAIT,
+ .nid = target_nid,
+ .nmask = &allowed_mask
+ };
+
+ if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
+ return 0;
+
+ if (list_empty(migrate_folios))
+ return 0;
+
+ /* Migration ignores all cpuset and mempolicy settings */
+ migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
+ (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
+ &nr_succeeded);
+
+ return nr_succeeded;
+}
+
+static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
+ struct pglist_data *pgdat,
+ enum migration_mode mm,
+ int target_nid)
+{
+ unsigned int nr_migrated = 0;
+ struct folio *folio;
+ LIST_HEAD(ret_folios);
+ LIST_HEAD(migrate_folios);
+
+ cond_resched();
+
+ while (!list_empty(folio_list)) {
+ struct folio *folio;
+
+ cond_resched();
+
+ folio = lru_to_folio(folio_list);
+ list_del(&folio->lru);
+
+ if (!folio_trylock(folio))
+ goto keep;
+
+ VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
+
+ /* Relocate its contents to another node. */
+ list_add(&folio->lru, &migrate_folios);
+ folio_unlock(folio);
+ continue;
+keep:
+ list_add(&folio->lru, &ret_folios);
+ VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+ }
+ /* 'folio_list' is always empty here */
+
+ /* Migrate folios selected for migration */
+ nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
+ /* Folios that could not be migrated are still in @migrate_folios */
+ if (!list_empty(&migrate_folios)) {
+ /* Folios which weren't migrated go back on @folio_list */
+ list_splice_init(&migrate_folios, folio_list);
+ }
+
+ try_to_unmap_flush();
+
+ list_splice(&ret_folios, folio_list);
+
+ while (!list_empty(folio_list)) {
+ folio = lru_to_folio(folio_list);
+ list_del(&folio->lru);
+ folio_putback_lru(folio);
+ }
+
+ return nr_migrated;
+}
+
+static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
+ enum migration_mode mm,
+ int target_nid)
+{
+ int nid;
+ unsigned int nr_migrated = 0;
+ LIST_HEAD(node_folio_list);
+ unsigned int noreclaim_flag;
+
+ if (list_empty(folio_list))
+ return nr_migrated;
+
+ noreclaim_flag = memalloc_noreclaim_save();
+
+ nid = folio_nid(lru_to_folio(folio_list));
+ do {
+ struct folio *folio = lru_to_folio(folio_list);
+
+ if (nid == folio_nid(folio)) {
+ folio_clear_active(folio);
+ list_move(&folio->lru, &node_folio_list);
+ continue;
+ }
+
+ nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
+ NODE_DATA(nid), mm,
+ target_nid);
+ nid = folio_nid(lru_to_folio(folio_list));
+ } while (!list_empty(folio_list));
+
+ nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
+ NODE_DATA(nid), mm,
+ target_nid);
+
+ memalloc_noreclaim_restore(noreclaim_flag);
+
+ return nr_migrated;
+}
+
static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
enum migration_mode mm)
{
@@ -247,7 +379,11 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
folio_test_clear_young(folio);
if (!folio_isolate_lru(folio))
goto put_folio;
- if (folio_test_unevictable(folio))
+ /*
+ * Since unevictable folios can be demoted or promoted,
+ * unevictable test is needed only for pageout.
+ */
+ if (mm == MIG_PAGEOUT && folio_test_unevictable(folio))
folio_putback_lru(folio);
else
list_add(&folio->lru, &folio_list);
@@ -258,6 +394,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
case MIG_PAGEOUT:
applied = reclaim_pages(&folio_list);
break;
+ case MIG_MIGRATE_COLD:
+ applied = damon_pa_migrate_pages(&folio_list, mm,
+ s->target_nid);
+ break;
default:
/* Unexpected migration mode. */
return 0;
@@ -314,6 +454,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
return damon_pa_mark_accessed(r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_pa_deactivate_pages(r, scheme);
+ case DAMOS_MIGRATE_COLD:
+ return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
case DAMOS_STAT:
break;
default:
@@ -334,6 +476,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
return damon_hot_score(context, r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_cold_score(context, r, scheme);
+ case DAMOS_MIGRATE_COLD:
+ return damon_cold_score(context, r, scheme);
default:
break;
}
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 1a30ea82c890..18b7d054c748 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1406,6 +1406,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
"nohugepage",
"lru_prio",
"lru_deprio",
+ "migrate_cold",
"stat",
};
@@ -1659,6 +1660,9 @@ static ssize_t target_nid_store(struct kobject *kobj,
struct damon_sysfs_scheme, kobj);
int err = 0;
+ if (scheme->action != DAMOS_MIGRATE_COLD)
+ return -EINVAL;
+
/* TODO: error handling for target_nid range. */
err = kstrtoint(buf, 0, &scheme->target_nid);
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread

* Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
2024-04-05 6:08 ` [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion Honggyu Kim
@ 2024-04-05 7:55 ` Hyeongtak Ji
2024-04-05 19:24 ` SeongJae Park
2024-04-05 19:24 ` SeongJae Park
1 sibling, 1 reply; 28+ messages in thread
From: Hyeongtak Ji @ 2024-04-05 7:55 UTC (permalink / raw)
To: Honggyu Kim
Cc: akpm, apopple, baolin.wang, dave.jiang, hyeongtak.ji, kernel_team,
linmiaohe, linux-kernel, linux-trace-kernel, mathieu.desnoyers,
mhiramat, rakie.kim, rostedt, surenb, yangx.jy, ying.huang, ziy,
42.hyeyoo, art.jeongseob
On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
...snip...
> +static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
> + enum migration_mode mm,
> + int target_nid)
> +{
> + int nid;
> + unsigned int nr_migrated = 0;
> + LIST_HEAD(node_folio_list);
> + unsigned int noreclaim_flag;
> +
> + if (list_empty(folio_list))
> + return nr_migrated;
How about checking if `target_nid` is `NUMA_NO_NODE` or not earlier,
> +
> + noreclaim_flag = memalloc_noreclaim_save();
> +
> + nid = folio_nid(lru_to_folio(folio_list));
> + do {
> + struct folio *folio = lru_to_folio(folio_list);
> +
> + if (nid == folio_nid(folio)) {
> + folio_clear_active(folio);
> + list_move(&folio->lru, &node_folio_list);
> + continue;
> + }
> +
> + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> + NODE_DATA(nid), mm,
> + target_nid);
> + nid = folio_nid(lru_to_folio(folio_list));
> + } while (!list_empty(folio_list));
> +
> + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> + NODE_DATA(nid), mm,
> + target_nid);
> +
> + memalloc_noreclaim_restore(noreclaim_flag);
> +
> + return nr_migrated;
> +}
> +
...snip...
> +static unsigned int migrate_folio_list(struct list_head *migrate_folios,
> + struct pglist_data *pgdat,
> + int target_nid)
> +{
> + unsigned int nr_succeeded;
> + nodemask_t allowed_mask = NODE_MASK_NONE;
> +
> + struct migration_target_control mtc = {
> + /*
> + * Allocate from 'node', or fail quickly and quietly.
> + * When this happens, 'page' will likely just be discarded
> + * instead of migrated.
> + */
> + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> + __GFP_NOMEMALLOC | GFP_NOWAIT,
> + .nid = target_nid,
> + .nmask = &allowed_mask
> + };
> +
> + if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
> + return 0;
instead of here.
> +
> + if (list_empty(migrate_folios))
> + return 0;
> +
> + /* Migration ignores all cpuset and mempolicy settings */
> + migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
> + (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
> + &nr_succeeded);
> +
> + return nr_succeeded;
> +}
> +
...snip...
Kind regards,
Hyeongtak
* Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
2024-04-05 7:55 ` Hyeongtak Ji
@ 2024-04-05 19:24 ` SeongJae Park
0 siblings, 0 replies; 28+ messages in thread
From: SeongJae Park @ 2024-04-05 19:24 UTC (permalink / raw)
To: Hyeongtak Ji
Cc: SeongJae Park, Honggyu Kim, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Fri, 5 Apr 2024 16:55:57 +0900 Hyeongtak Ji <hyeongtak.ji@gmail.com> wrote:
> On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
>
> ...snip...
>
> > +static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
> > + enum migration_mode mm,
> > + int target_nid)
> > +{
> > + int nid;
> > + unsigned int nr_migrated = 0;
> > + LIST_HEAD(node_folio_list);
> > + unsigned int noreclaim_flag;
> > +
> > + if (list_empty(folio_list))
> > + return nr_migrated;
>
> How about checking if `target_nid` is `NUMA_NO_NODE` or not earlier,
>
> > +
> > + noreclaim_flag = memalloc_noreclaim_save();
> > +
> > + nid = folio_nid(lru_to_folio(folio_list));
> > + do {
> > + struct folio *folio = lru_to_folio(folio_list);
> > +
> > + if (nid == folio_nid(folio)) {
> > + folio_clear_active(folio);
> > + list_move(&folio->lru, &node_folio_list);
> > + continue;
> > + }
> > +
> > + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > + NODE_DATA(nid), mm,
> > + target_nid);
> > + nid = folio_nid(lru_to_folio(folio_list));
> > + } while (!list_empty(folio_list));
> > +
> > + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > + NODE_DATA(nid), mm,
> > + target_nid);
> > +
> > + memalloc_noreclaim_restore(noreclaim_flag);
> > +
> > + return nr_migrated;
> > +}
> > +
>
> ...snip...
>
> > +static unsigned int migrate_folio_list(struct list_head *migrate_folios,
> > + struct pglist_data *pgdat,
> > + int target_nid)
> > +{
> > + unsigned int nr_succeeded;
> > + nodemask_t allowed_mask = NODE_MASK_NONE;
> > +
> > + struct migration_target_control mtc = {
> > + /*
> > + * Allocate from 'node', or fail quickly and quietly.
> > + * When this happens, 'page' will likely just be discarded
> > + * instead of migrated.
> > + */
> > + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> > + __GFP_NOMEMALLOC | GFP_NOWAIT,
> > + .nid = target_nid,
> > + .nmask = &allowed_mask
> > + };
> > +
> > + if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
> > + return 0;
>
> instead of here.
Agree. As I mentioned in the previous reply, I think this check can be done from
the caller (or the caller of the caller) of this function.
>
> > +
> > + if (list_empty(migrate_folios))
> > + return 0;
Same for this.
> > +
> > + /* Migration ignores all cpuset and mempolicy settings */
> > + migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
> > + (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
> > + &nr_succeeded);
> > +
> > + return nr_succeeded;
> > +}
> > +
>
> ...snip...
>
> Kind regards,
> Hyeongtak
>
Thanks,
SJ
* Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
2024-04-05 6:08 ` [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion Honggyu Kim
2024-04-05 7:55 ` Hyeongtak Ji
@ 2024-04-05 19:24 ` SeongJae Park
2024-04-08 12:06 ` Honggyu Kim
1 sibling, 1 reply; 28+ messages in thread
From: SeongJae Park @ 2024-04-05 19:24 UTC (permalink / raw)
To: Honggyu Kim
Cc: SeongJae Park, damon, linux-mm, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> This patch introduces DAMOS_MIGRATE_COLD action, which is similar to
> DAMOS_PAGEOUT, but migrate folios to the given 'target_nid' in the sysfs
> instead of swapping them out.
>
> The 'target_nid' sysfs knob is created by this patch to inform the
> migration target node ID.
Isn't it created by the previous patch?
>
> Here is one of the example usage of this 'migrate_cold' action.
>
> $ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
> $ cat contexts/<N>/schemes/<N>/action
> migrate_cold
> $ echo 2 > contexts/<N>/schemes/<N>/target_nid
> $ echo commit > state
> $ numactl -p 0 ./hot_cold 500M 600M &
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID Node 0 Node 1 Node 2 Total
> -------------- ------ ------ ------ -----
> 701 (hot_cold) 501 0 601 1101
>
> Since there are some common routines with pageout, many functions have
> similar logics between pageout and migrate cold.
>
> damon_pa_migrate_folio_list() is a minimized version of
> shrink_folio_list(), but it's minified only for demotion.
MIGRATE_COLD is not only for demotion, right? I think the last two words are
better to be removed to avoid unnecessary confusion.
>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
> ---
> include/linux/damon.h | 2 +
> mm/damon/paddr.c | 146 ++++++++++++++++++++++++++++++++++++++-
> mm/damon/sysfs-schemes.c | 4 ++
> 3 files changed, 151 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 24ea33a03d5d..df8671e69a70 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -105,6 +105,7 @@ struct damon_target {
> * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
> * @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
> * @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
> + * @DAMOS_MIGRATE_COLD: Migrate for the given cold region.
Whether it will be for a cold region or not depends on the target access
pattern. What about 'Migrate the regions in coldest regions first manner.'?
Or, simply 'Migrate the regions (prioritize cold)' here, and explain the
prioritization under quota in the detailed comments part?
Also, let's use tab consistently.
> * @DAMOS_STAT: Do nothing but count the stat.
> * @NR_DAMOS_ACTIONS: Total number of DAMOS actions
> *
> @@ -122,6 +123,7 @@ enum damos_action {
> DAMOS_NOHUGEPAGE,
> DAMOS_LRU_PRIO,
> DAMOS_LRU_DEPRIO,
> + DAMOS_MIGRATE_COLD,
> DAMOS_STAT, /* Do nothing but only record the stat */
> NR_DAMOS_ACTIONS,
> };
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index 277a1c4d833c..fe217a26f788 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -12,6 +12,9 @@
> #include <linux/pagemap.h>
> #include <linux/rmap.h>
> #include <linux/swap.h>
> +#include <linux/memory-tiers.h>
> +#include <linux/migrate.h>
> +#include <linux/mm_inline.h>
>
> #include "../internal.h"
> #include "ops-common.h"
> @@ -226,8 +229,137 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
>
> enum migration_mode {
> MIG_PAGEOUT,
> + MIG_MIGRATE_COLD,
> };
>
> +static unsigned int migrate_folio_list(struct list_head *migrate_folios,
> + struct pglist_data *pgdat,
> + int target_nid)
To avoid name collisions, I'd prefer having a damon_pa_ prefix. I see this patch
is defining damon_pa_migrate_folio_list() below, though. What about
__damon_pa_migrate_folio_list()?
> +{
> + unsigned int nr_succeeded;
> + nodemask_t allowed_mask = NODE_MASK_NONE;
> +
I personally prefer not having empty lines in the middle of variable
declarations/definitions. Could we remove this empty line?
> + struct migration_target_control mtc = {
> + /*
> + * Allocate from 'node', or fail quickly and quietly.
> + * When this happens, 'page' will likely just be discarded
> + * instead of migrated.
> + */
> + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> + __GFP_NOMEMALLOC | GFP_NOWAIT,
> + .nid = target_nid,
> + .nmask = &allowed_mask
> + };
> +
> + if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
> + return 0;
> +
> + if (list_empty(migrate_folios))
> + return 0;
Can't these checks be done by the caller?
> +
> + /* Migration ignores all cpuset and mempolicy settings */
> + migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
> + (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
> + &nr_succeeded);
> +
> + return nr_succeeded;
> +}
> +
> +static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
> + struct pglist_data *pgdat,
> + enum migration_mode mm,
Again, 'mm' makes my poor brain a bit confused. What about 'mode'?
And, seems this is not used at all in this function? Can we just drop this?
> + int target_nid)
> +{
> + unsigned int nr_migrated = 0;
> + struct folio *folio;
> + LIST_HEAD(ret_folios);
> + LIST_HEAD(migrate_folios);
> +
> + cond_resched();
We will do this again at the beginning of the loop. Do we need this here?
> +
> + while (!list_empty(folio_list)) {
> + struct folio *folio;
> +
> + cond_resched();
> +
> + folio = lru_to_folio(folio_list);
> + list_del(&folio->lru);
> +
> + if (!folio_trylock(folio))
> + goto keep;
> +
> + VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
Why? I think we could want to migrate active pages in some use case, e.g., to
reduce memory bandwidth?
> +
> + /* Relocate its contents to another node. */
> + list_add(&folio->lru, &migrate_folios);
> + folio_unlock(folio);
> + continue;
> +keep:
> + list_add(&folio->lru, &ret_folios);
> + VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
Can this happen? I think this could be too much test? checkpatch.pl also
warns.
> + }
> + /* 'folio_list' is always empty here */
> +
> + /* Migrate folios selected for migration */
> + nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
> + /* Folios that could not be migrated are still in @migrate_folios */
> + if (!list_empty(&migrate_folios)) {
> + /* Folios which weren't migrated go back on @folio_list */
> + list_splice_init(&migrate_folios, folio_list);
> + }
Let's not use braces for single statement
(https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces).
> +
> + try_to_unmap_flush();
> +
> + list_splice(&ret_folios, folio_list);
Can't we move remaining folios in migrate_folios to ret_folios at once?
> +
> + while (!list_empty(folio_list)) {
> + folio = lru_to_folio(folio_list);
> + list_del(&folio->lru);
> + folio_putback_lru(folio);
> + }
> +
> + return nr_migrated;
> +}
> +
> +static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
> + enum migration_mode mm,
Again, I'd prefer calling this 'mode' or something other than 'mm'.
And, seems 'mm' is not really used in this function. It is passed to
'damon_pa_migrate_folio_list()' but it doesn't really use it. Can we drop
this?
> + int target_nid)
> +{
> + int nid;
> + unsigned int nr_migrated = 0;
Let's make this matches with the return type of this function.
> + LIST_HEAD(node_folio_list);
> + unsigned int noreclaim_flag;
> +
> + if (list_empty(folio_list))
> + return nr_migrated;
> +
> + noreclaim_flag = memalloc_noreclaim_save();
> +
> + nid = folio_nid(lru_to_folio(folio_list));
> + do {
> + struct folio *folio = lru_to_folio(folio_list);
> +
> + if (nid == folio_nid(folio)) {
> + folio_clear_active(folio);
I think this was necessary for demotion, but now this should be removed since
this function is no longer for demotion but for migrating random pages, right?
> + list_move(&folio->lru, &node_folio_list);
> + continue;
> + }
> +
> + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> + NODE_DATA(nid), mm,
> + target_nid);
> + nid = folio_nid(lru_to_folio(folio_list));
> + } while (!list_empty(folio_list));
> +
> + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> + NODE_DATA(nid), mm,
> + target_nid);
> +
> + memalloc_noreclaim_restore(noreclaim_flag);
> +
> + return nr_migrated;
> +}
> +
> static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> enum migration_mode mm)
> {
> @@ -247,7 +379,11 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> folio_test_clear_young(folio);
> if (!folio_isolate_lru(folio))
> goto put_folio;
> - if (folio_test_unevictable(folio))
> + /*
> + * Since unevictable folios can be demoted or promoted,
Let's use the term 'migrated' instead of 'demoted' or 'promoted'.
> + * unevictable test is needed only for pageout.
> + */
> + if (mm == MIG_PAGEOUT && folio_test_unevictable(folio))
> folio_putback_lru(folio);
> else
> list_add(&folio->lru, &folio_list);
> @@ -258,6 +394,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> case MIG_PAGEOUT:
> applied = reclaim_pages(&folio_list);
> break;
> + case MIG_MIGRATE_COLD:
> + applied = damon_pa_migrate_pages(&folio_list, mm,
> + s->target_nid);
> + break;
> default:
> /* Unexpected migration mode. */
> return 0;
> @@ -314,6 +454,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> return damon_pa_mark_accessed(r, scheme);
> case DAMOS_LRU_DEPRIO:
> return damon_pa_deactivate_pages(r, scheme);
> + case DAMOS_MIGRATE_COLD:
> + return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
> case DAMOS_STAT:
> break;
> default:
> @@ -334,6 +476,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
> return damon_hot_score(context, r, scheme);
> case DAMOS_LRU_DEPRIO:
> return damon_cold_score(context, r, scheme);
> + case DAMOS_MIGRATE_COLD:
> + return damon_cold_score(context, r, scheme);
> default:
> break;
> }
> diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> index 1a30ea82c890..18b7d054c748 100644
> --- a/mm/damon/sysfs-schemes.c
> +++ b/mm/damon/sysfs-schemes.c
> @@ -1406,6 +1406,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
> "nohugepage",
> "lru_prio",
> "lru_deprio",
> + "migrate_cold",
> "stat",
> };
>
> @@ -1659,6 +1660,9 @@ static ssize_t target_nid_store(struct kobject *kobj,
> struct damon_sysfs_scheme, kobj);
> int err = 0;
>
> + if (scheme->action != DAMOS_MIGRATE_COLD)
> + return -EINVAL;
> +
I think a user could set target_nid first, and then the action. So I think this
should not return an error?
> /* TODO: error handling for target_nid range. */
> err = kstrtoint(buf, 0, &scheme->target_nid);
>
> --
> 2.34.1
>
>
Thanks,
SJ
* Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
2024-04-05 19:24 ` SeongJae Park
@ 2024-04-08 12:06 ` Honggyu Kim
2024-04-08 17:52 ` SeongJae Park
0 siblings, 1 reply; 28+ messages in thread
From: Honggyu Kim @ 2024-04-08 12:06 UTC (permalink / raw)
To: SeongJae Park
Cc: damon, linux-mm, akpm, apopple, baolin.wang, dave.jiang,
hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park <sj@kernel.org> wrote:
> On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
>
> > This patch introduces DAMOS_MIGRATE_COLD action, which is similar to
> > DAMOS_PAGEOUT, but migrate folios to the given 'target_nid' in the sysfs
> > instead of swapping them out.
> >
> > The 'target_nid' sysfs knob is created by this patch to inform the
> > migration target node ID.
>
> Isn't it created by the previous patch?
Right. I didn't fix the commit message after splitting this patch. I will
fix it.
> >
> > Here is one of the example usage of this 'migrate_cold' action.
> >
> > $ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
> > $ cat contexts/<N>/schemes/<N>/action
> > migrate_cold
> > $ echo 2 > contexts/<N>/schemes/<N>/target_nid
> > $ echo commit > state
> > $ numactl -p 0 ./hot_cold 500M 600M &
> > $ numastat -c -p hot_cold
> >
> > Per-node process memory usage (in MBs)
> > PID Node 0 Node 1 Node 2 Total
> > -------------- ------ ------ ------ -----
> > 701 (hot_cold) 501 0 601 1101
> >
> > Since there are some common routines with pageout, many functions have
> > similar logics between pageout and migrate cold.
> >
> > damon_pa_migrate_folio_list() is a minimized version of
> > shrink_folio_list(), but it's minified only for demotion.
>
> MIGRATE_COLD is not only for demotion, right? I think the last two words are
> better to be removed for reducing unnecessary confuses.
You mean the last two sentences? I will remove them if you feel it's
confusing.
> >
> > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
> > ---
> > include/linux/damon.h | 2 +
> > mm/damon/paddr.c | 146 ++++++++++++++++++++++++++++++++++++++-
> > mm/damon/sysfs-schemes.c | 4 ++
> > 3 files changed, 151 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/damon.h b/include/linux/damon.h
> > index 24ea33a03d5d..df8671e69a70 100644
> > --- a/include/linux/damon.h
> > +++ b/include/linux/damon.h
> > @@ -105,6 +105,7 @@ struct damon_target {
> > * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
> > * @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
> > * @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
> > + * @DAMOS_MIGRATE_COLD: Migrate for the given cold region.
>
> Whether it will be for cold region or not is depending on the target access
> pattern. What about 'Migrate the regions in coldest regions first manner.'?
> Or, simply 'Migrate the regions (prioritize cold)' here, and explain about the
> prioritization under quota on the detailed comments part?
"Migrate the regions in coldest regions first manner under quota" sounds
better. I will change it.
> Also, let's use tab consistently.
Yeah, it's a mistake. will fix it.
> > * @DAMOS_STAT: Do nothing but count the stat.
> > * @NR_DAMOS_ACTIONS: Total number of DAMOS actions
> > *
> > @@ -122,6 +123,7 @@ enum damos_action {
> > DAMOS_NOHUGEPAGE,
> > DAMOS_LRU_PRIO,
> > DAMOS_LRU_DEPRIO,
> > + DAMOS_MIGRATE_COLD,
> > DAMOS_STAT, /* Do nothing but only record the stat */
> > NR_DAMOS_ACTIONS,
> > };
> > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> > index 277a1c4d833c..fe217a26f788 100644
> > --- a/mm/damon/paddr.c
> > +++ b/mm/damon/paddr.c
> > @@ -12,6 +12,9 @@
> > #include <linux/pagemap.h>
> > #include <linux/rmap.h>
> > #include <linux/swap.h>
> > +#include <linux/memory-tiers.h>
> > +#include <linux/migrate.h>
> > +#include <linux/mm_inline.h>
> >
> > #include "../internal.h"
> > #include "ops-common.h"
> > @@ -226,8 +229,137 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
> >
> > enum migration_mode {
> > MIG_PAGEOUT,
> > + MIG_MIGRATE_COLD,
> > };
> >
> > +static unsigned int migrate_folio_list(struct list_head *migrate_folios,
> > + struct pglist_data *pgdat,
> > + int target_nid)
>
> To avoid name collisions, I'd prefer having damon_pa_prefix. I show this patch
> is defining damon_pa_migrate_folio_list() below, though. What about
> __damon_pa_migrate_folio_list()?
Ack. I will change it to __damon_pa_migrate_folio_list().
> > +{
> > + unsigned int nr_succeeded;
> > + nodemask_t allowed_mask = NODE_MASK_NONE;
> > +
>
> I personally prefer not having empty lines in the middle of variable
> declarations/definitions. Could we remove this empty line?
I can remove it, but I would like to have more discussion about this
issue. The current implementation allows only a single migration
target with "target_nid", but users might want to provide fall back
migration target nids.
For example, if more than two CXL nodes exist in the system, users might
want to migrate cold pages to any CXL nodes. In such cases, we might
have to make "target_nid" accept comma separated node IDs. nodemask can
be better but we should provide a way to change the scanning order.
I would like to hear how you think about this.
> > + struct migration_target_control mtc = {
> > + /*
> > + * Allocate from 'node', or fail quickly and quietly.
> > + * When this happens, 'page' will likely just be discarded
> > + * instead of migrated.
> > + */
> > + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> > + __GFP_NOMEMALLOC | GFP_NOWAIT,
> > + .nid = target_nid,
> > + .nmask = &allowed_mask
> > + };
> > +
> > + if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
> > + return 0;
> > +
> > + if (list_empty(migrate_folios))
> > + return 0;
>
> Can't these checks be done by the caller?
Sure. I will move them to the caller.
> > +
> > + /* Migration ignores all cpuset and mempolicy settings */
> > + migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
> > + (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
> > + &nr_succeeded);
> > +
> > + return nr_succeeded;
> > +}
> > +
> > +static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
> > + struct pglist_data *pgdat,
> > + enum migration_mode mm,
>
> Again, 'mm' makes my poor brain a bit confused. What about 'mode'?
> And, seems this is not used at all in this function? Can we just drop this?
Ack. I will remove it in this patch and introduce it in the patch where
it's used.
> > + int target_nid)
> > +{
> > + unsigned int nr_migrated = 0;
> > + struct folio *folio;
> > + LIST_HEAD(ret_folios);
> > + LIST_HEAD(migrate_folios);
> > +
> > + cond_resched();
>
> We will do this again at the beginning of the loop. Do we need this here?
This comes from shrink_folio_list() but this function is way simpler so
it can be removed.
> > +
> > + while (!list_empty(folio_list)) {
> > + struct folio *folio;
> > +
> > + cond_resched();
> > +
> > + folio = lru_to_folio(folio_list);
> > + list_del(&folio->lru);
> > +
> > + if (!folio_trylock(folio))
> > + goto keep;
> > +
> > + VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>
> Why? I think we could want to migrate active pages in some use case, e.g., to
> reduce memory bandwidth?
Yeah, I will remove it.
> > +
> > + /* Relocate its contents to another node. */
> > + list_add(&folio->lru, &migrate_folios);
> > + folio_unlock(folio);
> > + continue;
> > +keep:
> > + list_add(&folio->lru, &ret_folios);
> > + VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
>
> Can this happen? I think this could be too much test? checkpatch.pl also
> warns.
Likewise, the current shrink_folio_list() does this, so I brought it into this
patch as well, but I think we can remove it here.
> > + }
> > + /* 'folio_list' is always empty here */
> > +
> > + /* Migrate folios selected for migration */
> > + nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
> > + /* Folios that could not be migrated are still in @migrate_folios */
> > + if (!list_empty(&migrate_folios)) {
> > + /* Folios which weren't migrated go back on @folio_list */
> > + list_splice_init(&migrate_folios, folio_list);
> > + }
>
> Let's not use braces for single statement
> (https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces).
Hmm.. I know the convention but left it as is because of the comment.
If I remove the braces, it would leave a weird alignment between the
comment line and the statement line.
> > +
> > + try_to_unmap_flush();
> > +
> > + list_splice(&ret_folios, folio_list);
>
> Can't we move remaining folios in migrate_folios to ret_folios at once?
I will see if it's possible.
> > +
> > + while (!list_empty(folio_list)) {
> > + folio = lru_to_folio(folio_list);
> > + list_del(&folio->lru);
> > + folio_putback_lru(folio);
> > + }
> > +
> > + return nr_migrated;
> > +}
> > +
> > +static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
> > + enum migration_mode mm,
>
> Again, I'd prefer calling this 'mode' or something other than 'mm'.
> And, seems 'mm' is not really used in this function. It is passed to
> 'damon_pa_migrate_folio_list()' but it doesn't really use it. Can we drop
> this?
Sure. I will drop it here and rename it to "mode" where it's used.
> > + int target_nid)
> > +{
> > + int nid;
> > + unsigned int nr_migrated = 0;
>
> Let's make this match the return type of this function.
Ack. will change it to unsigned long.
> > + LIST_HEAD(node_folio_list);
> > + unsigned int noreclaim_flag;
> > +
> > + if (list_empty(folio_list))
> > + return nr_migrated;
> > +
> > + noreclaim_flag = memalloc_noreclaim_save();
> > +
> > + nid = folio_nid(lru_to_folio(folio_list));
> > + do {
> > + struct folio *folio = lru_to_folio(folio_list);
> > +
> > + if (nid == folio_nid(folio)) {
> > + folio_clear_active(folio);
>
> I think this was necessary for demotion, but now this should be removed since
> this function is no more for demotion but for migrating random pages, right?
Yeah, it can be removed because we do migration instead of demotion,
but I need to make sure it doesn't change the performance evaluation
results.
> > + list_move(&folio->lru, &node_folio_list);
> > + continue;
> > + }
> > +
> > + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > + NODE_DATA(nid), mm,
> > + target_nid);
> > + nid = folio_nid(lru_to_folio(folio_list));
> > + } while (!list_empty(folio_list));
> > +
> > + nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > + NODE_DATA(nid), mm,
> > + target_nid);
> > +
> > + memalloc_noreclaim_restore(noreclaim_flag);
> > +
> > + return nr_migrated;
> > +}
> > +
> > static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> > enum migration_mode mm)
> > {
> > @@ -247,7 +379,11 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> > folio_test_clear_young(folio);
> > if (!folio_isolate_lru(folio))
> > goto put_folio;
> > - if (folio_test_unevictable(folio))
> > + /*
> > + * Since unevictable folios can be demoted or promoted,
>
> Let's use the term 'migrated' instead of 'demoted' or 'promoted'.
Ack.
> > + * unevictable test is needed only for pageout.
> > + */
> > + if (mm == MIG_PAGEOUT && folio_test_unevictable(folio))
> > folio_putback_lru(folio);
> > else
> > list_add(&folio->lru, &folio_list);
> > @@ -258,6 +394,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> > case MIG_PAGEOUT:
> > applied = reclaim_pages(&folio_list);
> > break;
> > + case MIG_MIGRATE_COLD:
> > + applied = damon_pa_migrate_pages(&folio_list, mm,
> > + s->target_nid);
> > + break;
> > default:
> > /* Unexpected migration mode. */
> > return 0;
> > @@ -314,6 +454,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> > return damon_pa_mark_accessed(r, scheme);
> > case DAMOS_LRU_DEPRIO:
> > return damon_pa_deactivate_pages(r, scheme);
> > + case DAMOS_MIGRATE_COLD:
> > + return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
> > case DAMOS_STAT:
> > break;
> > default:
> > @@ -334,6 +476,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
> > return damon_hot_score(context, r, scheme);
> > case DAMOS_LRU_DEPRIO:
> > return damon_cold_score(context, r, scheme);
> > + case DAMOS_MIGRATE_COLD:
> > + return damon_cold_score(context, r, scheme);
> > default:
> > break;
> > }
> > diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> > index 1a30ea82c890..18b7d054c748 100644
> > --- a/mm/damon/sysfs-schemes.c
> > +++ b/mm/damon/sysfs-schemes.c
> > @@ -1406,6 +1406,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
> > "nohugepage",
> > "lru_prio",
> > "lru_deprio",
> > + "migrate_cold",
> > "stat",
> > };
> >
> > @@ -1659,6 +1660,9 @@ static ssize_t target_nid_store(struct kobject *kobj,
> > struct damon_sysfs_scheme, kobj);
> > int err = 0;
> >
> > + if (scheme->action != DAMOS_MIGRATE_COLD)
> > + return -EINVAL;
> > +
>
> I think a user could set target_nid first, and then the action. So I think
> this should not return an error?
Make sense. I will drop this check.
Thanks,
Honggyu
> > /* TODO: error handling for target_nid range. */
> > err = kstrtoint(buf, 0, &scheme->target_nid);
> >
> > --
> > 2.34.1
> >
> >
>
>
> Thanks,
> SJ
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
2024-04-08 12:06 ` Honggyu Kim
@ 2024-04-08 17:52 ` SeongJae Park
2024-04-09 9:54 ` Honggyu Kim
0 siblings, 1 reply; 28+ messages in thread
From: SeongJae Park @ 2024-04-08 17:52 UTC (permalink / raw)
To: Honggyu Kim
Cc: SeongJae Park, damon, linux-mm, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Mon, 8 Apr 2024 21:06:44 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park <sj@kernel.org> wrote:
> > On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
[...]
> > > Here is one of the example usage of this 'migrate_cold' action.
> > >
> > > $ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
> > > $ cat contexts/<N>/schemes/<N>/action
> > > migrate_cold
> > > $ echo 2 > contexts/<N>/schemes/<N>/target_nid
> > > $ echo commit > state
> > > $ numactl -p 0 ./hot_cold 500M 600M &
> > > $ numastat -c -p hot_cold
> > >
> > > Per-node process memory usage (in MBs)
> > > PID Node 0 Node 1 Node 2 Total
> > > -------------- ------ ------ ------ -----
> > > 701 (hot_cold) 501 0 601 1101
> > >
> > > Since there are some common routines with pageout, many functions have
> > > similar logics between pageout and migrate cold.
> > >
> > > damon_pa_migrate_folio_list() is a minimized version of
> > > shrink_folio_list(), but it's minified only for demotion.
> >
> > MIGRATE_COLD is not only for demotion, right? I think the last two words are
> better to be removed for reducing unnecessary confusion.
>
> You mean the last two sentences? I will remove them if you feel it's
> confusing.
Yes. My real intended suggestion was 's/only for demotion/only for
migration/', but entirely removing the sentences is also ok for me.
>
> > >
> > > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > > Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
> > > ---
> > > include/linux/damon.h | 2 +
> > > mm/damon/paddr.c | 146 ++++++++++++++++++++++++++++++++++++++-
> > > mm/damon/sysfs-schemes.c | 4 ++
> > > 3 files changed, 151 insertions(+), 1 deletion(-)
[...]
> > > --- a/mm/damon/paddr.c
> > > +++ b/mm/damon/paddr.c
[...]
> > > +{
> > > + unsigned int nr_succeeded;
> > > + nodemask_t allowed_mask = NODE_MASK_NONE;
> > > +
> >
> > I personally prefer not having empty lines in the middle of variable
> > declarations/definitions. Could we remove this empty line?
>
> I can remove it, but I would like to have more discussion about this
> issue. The current implementation allows only a single migration
> target with "target_nid", but users might want to provide fall back
> migration target nids.
>
> For example, if more than two CXL nodes exist in the system, users might
> want to migrate cold pages to any CXL nodes. In such cases, we might
> have to make "target_nid" accept comma separated node IDs. nodemask can
> be better but we should provide a way to change the scanning order.
>
> I would like to hear how you think about this.
Good point. I think we could later extend the sysfs file to receive the
comma-separated numbers, or even mask. For simplicity, adding sysfs files
dedicated for the different format of inputs could also be an option (e.g.,
target_nids_list, target_nids_mask). But starting from this single node as is
now looks ok to me.
[...]
> > > + /* 'folio_list' is always empty here */
> > > +
> > > + /* Migrate folios selected for migration */
> > > + nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
> > > + /* Folios that could not be migrated are still in @migrate_folios */
> > > + if (!list_empty(&migrate_folios)) {
> > > + /* Folios which weren't migrated go back on @folio_list */
> > > + list_splice_init(&migrate_folios, folio_list);
> > > + }
> >
> > Let's not use braces for single statement
> > (https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces).
>
> Hmm.. I know the convention but left it as is because of the comment.
> If I remove the braces, it would have a weird alignment for the two
> lines for comment and statement lines.
I don't really hate such alignment. But if you don't like it, how about moving
the comment out of the if statement? Having one comment for one-line if
statement looks not bad to me.
>
> > > +
> > > + try_to_unmap_flush();
> > > +
> > > + list_splice(&ret_folios, folio_list);
> >
> > Can't we move remaining folios in migrate_folios to ret_folios at once?
>
> I will see if it's possible.
Thank you. Not a strict request, though.
[...]
> > > + nid = folio_nid(lru_to_folio(folio_list));
> > > + do {
> > > + struct folio *folio = lru_to_folio(folio_list);
> > > +
> > > + if (nid == folio_nid(folio)) {
> > > + folio_clear_active(folio);
> >
> > I think this was necessary for demotion, but now this should be removed since
> > this function is no more for demotion but for migrating random pages, right?
>
> Yeah, it can be removed because we do migration instead of demotion,
> but I need to make sure if it doesn't change the performance evaluation
> results.
Yes, please ensure the test results are valid :)
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
2024-04-08 17:52 ` SeongJae Park
@ 2024-04-09 9:54 ` Honggyu Kim
2024-04-09 16:18 ` SeongJae Park
0 siblings, 1 reply; 28+ messages in thread
From: Honggyu Kim @ 2024-04-09 9:54 UTC (permalink / raw)
To: SeongJae Park
Cc: damon, linux-mm, akpm, apopple, baolin.wang, dave.jiang,
hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob, Honggyu Kim
Hi SeongJae,
On Mon, 8 Apr 2024 10:52:28 -0700 SeongJae Park <sj@kernel.org> wrote:
> On Mon, 8 Apr 2024 21:06:44 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
>
> > On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park <sj@kernel.org> wrote:
> > > On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> [...]
> > > > Here is one of the example usage of this 'migrate_cold' action.
> > > >
> > > > $ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
> > > > $ cat contexts/<N>/schemes/<N>/action
> > > > migrate_cold
> > > > $ echo 2 > contexts/<N>/schemes/<N>/target_nid
> > > > $ echo commit > state
> > > > $ numactl -p 0 ./hot_cold 500M 600M &
> > > > $ numastat -c -p hot_cold
> > > >
> > > > Per-node process memory usage (in MBs)
> > > > PID Node 0 Node 1 Node 2 Total
> > > > -------------- ------ ------ ------ -----
> > > > 701 (hot_cold) 501 0 601 1101
> > > >
> > > > Since there are some common routines with pageout, many functions have
> > > > similar logics between pageout and migrate cold.
> > > >
> > > > damon_pa_migrate_folio_list() is a minimized version of
> > > > shrink_folio_list(), but it's minified only for demotion.
> > >
> > > MIGRATE_COLD is not only for demotion, right? I think the last two words are
> > > better to be removed for reducing unnecessary confusion.
> >
> > You mean the last two sentences? I will remove them if you feel it's
> > confusing.
>
> Yes. My real intended suggestion was 's/only for demotion/only for
> migration/', but entirely removing the sentences is also ok for me.
Ack.
> >
> > > >
> > > > Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> > > > Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
> > > > ---
> > > > include/linux/damon.h | 2 +
> > > > mm/damon/paddr.c | 146 ++++++++++++++++++++++++++++++++++++++-
> > > > mm/damon/sysfs-schemes.c | 4 ++
> > > > 3 files changed, 151 insertions(+), 1 deletion(-)
> [...]
> > > > --- a/mm/damon/paddr.c
> > > > +++ b/mm/damon/paddr.c
> [...]
> > > > +{
> > > > + unsigned int nr_succeeded;
> > > > + nodemask_t allowed_mask = NODE_MASK_NONE;
> > > > +
> > >
> > > I personally prefer not having empty lines in the middle of variable
> > > declarations/definitions. Could we remove this empty line?
> >
> > I can remove it, but I would like to have more discussion about this
> > issue. The current implementation allows only a single migration
> > target with "target_nid", but users might want to provide fall back
> > migration target nids.
> >
> > For example, if more than two CXL nodes exist in the system, users might
> > want to migrate cold pages to any CXL nodes. In such cases, we might
> > have to make "target_nid" accept comma separated node IDs. nodemask can
> > be better but we should provide a way to change the scanning order.
> >
> > I would like to hear how you think about this.
>
> Good point. I think we could later extend the sysfs file to receive the
> comma-separated numbers, or even mask. For simplicity, adding sysfs files
> dedicated for the different format of inputs could also be an option (e.g.,
> target_nids_list, target_nids_mask). But starting from this single node as is
> now looks ok to me.
If you think we can start from a single node, then I will keep it as is.
But are you okay if I change the same 'target_nid' to accept
comma-separated numbers later? Or do you want to introduce another knob
such as 'target_nids_list'? What about renaming 'target_nid' to
'target_nids' in the first place?
> [...]
> > > > + /* 'folio_list' is always empty here */
> > > > +
> > > > + /* Migrate folios selected for migration */
> > > > + nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
> > > > + /* Folios that could not be migrated are still in @migrate_folios */
> > > > + if (!list_empty(&migrate_folios)) {
> > > > + /* Folios which weren't migrated go back on @folio_list */
> > > > + list_splice_init(&migrate_folios, folio_list);
> > > > + }
> > >
> > > Let's not use braces for single statement
> > > (https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces).
> >
> > Hmm.. I know the convention but left it as is because of the comment.
> > If I remove the braces, it would have a weird alignment for the two
> > lines for comment and statement lines.
>
> I don't really hate such alignment. But if you don't like it, how about moving
> the comment out of the if statement? Having one comment for one-line if
> statement looks not bad to me.
Ack. I will manage this in the next revision.
> >
> > > > +
> > > > + try_to_unmap_flush();
> > > > +
> > > > + list_splice(&ret_folios, folio_list);
> > >
> > > Can't we move remaining folios in migrate_folios to ret_folios at once?
> >
> > I will see if it's possible.
>
> Thank you. Not a strict request, though.
>
> [...]
> > > > + nid = folio_nid(lru_to_folio(folio_list));
> > > > + do {
> > > > + struct folio *folio = lru_to_folio(folio_list);
> > > > +
> > > > + if (nid == folio_nid(folio)) {
> > > > + folio_clear_active(folio);
> > >
> > > I think this was necessary for demotion, but now this should be removed since
> > > this function is no more for demotion but for migrating random pages, right?
> >
> > Yeah, it can be removed because we do migration instead of demotion,
> > but I need to make sure if it doesn't change the performance evaluation
> > results.
>
> Yes, please ensure the test results are valid :)
Sure. Thanks for your detailed review!
Please note that I will be out of office this week so won't be able to
answer quickly.
Thanks,
Honggyu
>
> Thanks,
> SJ
>
> [...]
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
2024-04-09 9:54 ` Honggyu Kim
@ 2024-04-09 16:18 ` SeongJae Park
0 siblings, 0 replies; 28+ messages in thread
From: SeongJae Park @ 2024-04-09 16:18 UTC (permalink / raw)
To: Honggyu Kim
Cc: SeongJae Park, damon, linux-mm, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
Hi Honggyu,
On Tue, 9 Apr 2024 18:54:14 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> On Mon, 8 Apr 2024 10:52:28 -0700 SeongJae Park <sj@kernel.org> wrote:
> > On Mon, 8 Apr 2024 21:06:44 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> > > On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park <sj@kernel.org> wrote:
> > > > On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
[...]
> > > I can remove it, but I would like to have more discussion about this
> > > issue. The current implementation allows only a single migration
> > > target with "target_nid", but users might want to provide fall back
> > > migration target nids.
> > >
> > > For example, if more than two CXL nodes exist in the system, users might
> > > want to migrate cold pages to any CXL nodes. In such cases, we might
> > > have to make "target_nid" accept comma separated node IDs. nodemask can
> > > be better but we should provide a way to change the scanning order.
> > >
> > > I would like to hear how you think about this.
> >
> > Good point. I think we could later extend the sysfs file to receive the
> > comma-separated numbers, or even mask. For simplicity, adding sysfs files
> > dedicated for the different format of inputs could also be an option (e.g.,
> > target_nids_list, target_nids_mask). But starting from this single node as is
> > now looks ok to me.
>
> If you think we can start from a single node, then I will keep it as is.
> But are you okay if I change the same 'target_nid' to accept
> comma-separated numbers later? Or do you want to introduce another knob
> such as 'target_nids_list'? What about rename 'target_nid' to
> 'target_nids' at the first place?
I have no strong concern or opinion about this at the moment. Please feel free
to rename it to 'target_nids' if you think that's better.
[...]
> Please note that I will be out of office this week so won't be able to
> answer quickly.
No problem, I hope you take and enjoy your time :)
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 28+ messages in thread
* [RFC PATCH v3 6/7] mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion
2024-04-05 6:08 [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Honggyu Kim
` (4 preceding siblings ...)
2024-04-05 6:08 ` [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion Honggyu Kim
@ 2024-04-05 6:08 ` Honggyu Kim
2024-04-05 19:26 ` SeongJae Park
2024-04-05 6:08 ` [RFC PATCH v3 7/7] mm/damon: Add "damon_migrate_{hot,cold}" vmstat Honggyu Kim
` (2 subsequent siblings)
8 siblings, 1 reply; 28+ messages in thread
From: Honggyu Kim @ 2024-04-05 6:08 UTC (permalink / raw)
To: sj, damon, linux-mm
Cc: akpm, apopple, baolin.wang, dave.jiang, honggyu.kim, hyeongtak.ji,
kernel_team, linmiaohe, linux-kernel, linux-trace-kernel,
mathieu.desnoyers, mhiramat, rakie.kim, rostedt, surenb, yangx.jy,
ying.huang, ziy, 42.hyeyoo, art.jeongseob
From: Hyeongtak Ji <hyeongtak.ji@sk.com>
This patch introduces DAMOS_MIGRATE_HOT action, which is similar to
DAMOS_MIGRATE_COLD, but it is targeted to migrate hot pages.
It migrates pages inside the given region to the 'target_nid' NUMA node
in the sysfs.
Here is an example usage of this 'migrate_hot' action.
$ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
$ cat contexts/<N>/schemes/<N>/action
migrate_hot
$ echo 0 > contexts/<N>/schemes/<N>/target_nid
$ echo commit > state
$ numactl -p 2 ./hot_cold 500M 600M &
$ numastat -c -p hot_cold
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
-------------- ------ ------ ------ -----
701 (hot_cold) 501 0 601 1101
Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
---
include/linux/damon.h | 2 ++
mm/damon/paddr.c | 12 ++++++++++--
mm/damon/sysfs-schemes.c | 4 +++-
3 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index df8671e69a70..934c95a7c042 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -105,6 +105,7 @@ struct damon_target {
* @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
* @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
* @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
+ * @DAMOS_MIGRATE_HOT: Migrate for the given hot region.
* @DAMOS_MIGRATE_COLD: Migrate for the given cold region.
* @DAMOS_STAT: Do nothing but count the stat.
* @NR_DAMOS_ACTIONS: Total number of DAMOS actions
@@ -123,6 +124,7 @@ enum damos_action {
DAMOS_NOHUGEPAGE,
DAMOS_LRU_PRIO,
DAMOS_LRU_DEPRIO,
+ DAMOS_MIGRATE_HOT,
DAMOS_MIGRATE_COLD,
DAMOS_STAT, /* Do nothing but only record the stat */
NR_DAMOS_ACTIONS,
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index fe217a26f788..fd9d35b5cc83 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -229,6 +229,7 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
enum migration_mode {
MIG_PAGEOUT,
+ MIG_MIGRATE_HOT,
MIG_MIGRATE_COLD,
};
@@ -375,8 +376,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
if (damos_pa_filter_out(s, folio))
goto put_folio;
- folio_clear_referenced(folio);
- folio_test_clear_young(folio);
+ if (mm != MIG_MIGRATE_HOT) {
+ folio_clear_referenced(folio);
+ folio_test_clear_young(folio);
+ }
if (!folio_isolate_lru(folio))
goto put_folio;
/*
@@ -394,6 +397,7 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
case MIG_PAGEOUT:
applied = reclaim_pages(&folio_list);
break;
+ case MIG_MIGRATE_HOT:
case MIG_MIGRATE_COLD:
applied = damon_pa_migrate_pages(&folio_list, mm,
s->target_nid);
@@ -454,6 +458,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
return damon_pa_mark_accessed(r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_pa_deactivate_pages(r, scheme);
+ case DAMOS_MIGRATE_HOT:
+ return damon_pa_migrate(r, scheme, MIG_MIGRATE_HOT);
case DAMOS_MIGRATE_COLD:
return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
case DAMOS_STAT:
@@ -476,6 +482,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
return damon_hot_score(context, r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_cold_score(context, r, scheme);
+ case DAMOS_MIGRATE_HOT:
+ return damon_hot_score(context, r, scheme);
case DAMOS_MIGRATE_COLD:
return damon_cold_score(context, r, scheme);
default:
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 18b7d054c748..1d2f62aa79ca 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1406,6 +1406,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
"nohugepage",
"lru_prio",
"lru_deprio",
+ "migrate_hot",
"migrate_cold",
"stat",
};
@@ -1660,7 +1661,8 @@ static ssize_t target_nid_store(struct kobject *kobj,
struct damon_sysfs_scheme, kobj);
int err = 0;
- if (scheme->action != DAMOS_MIGRATE_COLD)
+ if (scheme->action != DAMOS_MIGRATE_HOT &&
+ scheme->action != DAMOS_MIGRATE_COLD)
return -EINVAL;
/* TODO: error handling for target_nid range. */
--
2.34.1
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 6/7] mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion
2024-04-05 6:08 ` [RFC PATCH v3 6/7] mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion Honggyu Kim
@ 2024-04-05 19:26 ` SeongJae Park
0 siblings, 0 replies; 28+ messages in thread
From: SeongJae Park @ 2024-04-05 19:26 UTC (permalink / raw)
To: Honggyu Kim
Cc: SeongJae Park, damon, linux-mm, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Fri, 5 Apr 2024 15:08:55 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> From: Hyeongtak Ji <hyeongtak.ji@sk.com>
>
> This patch introduces DAMOS_MIGRATE_HOT action, which is similar to
> DAMOS_MIGRATE_COLD, but it is targeted to migrate hot pages.
My understanding of our last discussion was that 'HOT/COLD' here is only for
prioritization score function. If I'm not wrong, this is not for targeting,
but just prioritizes migrating hot pages first under the quota.
>
> It migrates pages inside the given region to the 'target_nid' NUMA node
> in the sysfs.
>
> Here is one of the example usage of this 'migrate_hot' action.
>
> $ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
> $ cat contexts/<N>/schemes/<N>/action
> migrate_hot
> $ echo 0 > contexts/<N>/schemes/<N>/target_nid
> $ echo commit > state
> $ numactl -p 2 ./hot_cold 500M 600M &
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID Node 0 Node 1 Node 2 Total
> -------------- ------ ------ ------ -----
> 701 (hot_cold) 501 0 601 1101
>
> Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
> Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
> ---
> include/linux/damon.h | 2 ++
> mm/damon/paddr.c | 12 ++++++++++--
> mm/damon/sysfs-schemes.c | 4 +++-
> 3 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index df8671e69a70..934c95a7c042 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -105,6 +105,7 @@ struct damon_target {
> * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
> * @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
> * @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
> + * @DAMOS_MIGRATE_HOT: Migrate for the given hot region.
As commented on the previous patch, this could be a bit re-phrased.
Also, let's use tabs consistently.
> * @DAMOS_MIGRATE_COLD: Migrate for the given cold region.
> * @DAMOS_STAT: Do nothing but count the stat.
> * @NR_DAMOS_ACTIONS: Total number of DAMOS actions
> @@ -123,6 +124,7 @@ enum damos_action {
> DAMOS_NOHUGEPAGE,
> DAMOS_LRU_PRIO,
> DAMOS_LRU_DEPRIO,
> + DAMOS_MIGRATE_HOT,
> DAMOS_MIGRATE_COLD,
> DAMOS_STAT, /* Do nothing but only record the stat */
> NR_DAMOS_ACTIONS,
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index fe217a26f788..fd9d35b5cc83 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -229,6 +229,7 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
>
> enum migration_mode {
> MIG_PAGEOUT,
> + MIG_MIGRATE_HOT,
> MIG_MIGRATE_COLD,
> };
It looks like we don't really need MIG_MIGRATE_HOT and MIG_MIGRATE_COLD, but
just one, say, MIG_MIGRATE, since the code can know which prioritization
score function to use from the DAMOS action?
Also, as I commented on the previous one, I'd prefer having DAMOS_ prefix.
>
> @@ -375,8 +376,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> if (damos_pa_filter_out(s, folio))
> goto put_folio;
>
> - folio_clear_referenced(folio);
> - folio_test_clear_young(folio);
> + if (mm != MIG_MIGRATE_HOT) {
> + folio_clear_referenced(folio);
> + folio_test_clear_young(folio);
> + }
We agreed to do this check via the 'young' page type DAMOS filter, and to
let this not care about it, right? If I'm not wrong, I think this should be
removed?
> if (!folio_isolate_lru(folio))
> goto put_folio;
> /*
> @@ -394,6 +397,7 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> case MIG_PAGEOUT:
> applied = reclaim_pages(&folio_list);
> break;
> + case MIG_MIGRATE_HOT:
> case MIG_MIGRATE_COLD:
> applied = damon_pa_migrate_pages(&folio_list, mm,
> s->target_nid);
> @@ -454,6 +458,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> return damon_pa_mark_accessed(r, scheme);
> case DAMOS_LRU_DEPRIO:
> return damon_pa_deactivate_pages(r, scheme);
> + case DAMOS_MIGRATE_HOT:
> + return damon_pa_migrate(r, scheme, MIG_MIGRATE_HOT);
> case DAMOS_MIGRATE_COLD:
> return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
> case DAMOS_STAT:
> @@ -476,6 +482,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
> return damon_hot_score(context, r, scheme);
> case DAMOS_LRU_DEPRIO:
> return damon_cold_score(context, r, scheme);
> + case DAMOS_MIGRATE_HOT:
> + return damon_hot_score(context, r, scheme);
> case DAMOS_MIGRATE_COLD:
> return damon_cold_score(context, r, scheme);
> default:
> diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
> index 18b7d054c748..1d2f62aa79ca 100644
> --- a/mm/damon/sysfs-schemes.c
> +++ b/mm/damon/sysfs-schemes.c
> @@ -1406,6 +1406,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
> "nohugepage",
> "lru_prio",
> "lru_deprio",
> + "migrate_hot",
> "migrate_cold",
> "stat",
> };
> @@ -1660,7 +1661,8 @@ static ssize_t target_nid_store(struct kobject *kobj,
> struct damon_sysfs_scheme, kobj);
> int err = 0;
>
> - if (scheme->action != DAMOS_MIGRATE_COLD)
> + if (scheme->action != DAMOS_MIGRATE_HOT &&
> + scheme->action != DAMOS_MIGRATE_COLD)
> return -EINVAL;
>
> /* TODO: error handling for target_nid range. */
> --
> 2.34.1
>
>
Thanks,
SJ
^ permalink raw reply [flat|nested] 28+ messages in thread
* [RFC PATCH v3 7/7] mm/damon: Add "damon_migrate_{hot,cold}" vmstat
2024-04-05 6:08 [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Honggyu Kim
` (5 preceding siblings ...)
2024-04-05 6:08 ` [RFC PATCH v3 6/7] mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion Honggyu Kim
@ 2024-04-05 6:08 ` Honggyu Kim
2024-04-05 19:27 ` SeongJae Park
2024-04-05 16:56 ` [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Gregory Price
2024-04-05 19:28 ` [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory SeongJae Park
8 siblings, 1 reply; 28+ messages in thread
From: Honggyu Kim @ 2024-04-05 6:08 UTC (permalink / raw)
To: sj, damon, linux-mm
Cc: akpm, apopple, baolin.wang, dave.jiang, honggyu.kim, hyeongtak.ji,
kernel_team, linmiaohe, linux-kernel, linux-trace-kernel,
mathieu.desnoyers, mhiramat, rakie.kim, rostedt, surenb, yangx.jy,
ying.huang, ziy, 42.hyeyoo, art.jeongseob
This patch adds "damon_migrate_{hot,cold}" under node specific vmstat
counters at the following location.
/sys/devices/system/node/node*/vmstat
The counted values are accumulated into the global vmstat, so it also
introduces the same counters at /proc/vmstat.
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
---
include/linux/mmzone.h | 4 ++++
mm/damon/paddr.c | 17 ++++++++++++++++-
mm/vmstat.c | 4 ++++
3 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a497f189d988..0005372c5503 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -214,6 +214,10 @@ enum node_stat_item {
PGDEMOTE_KSWAPD,
PGDEMOTE_DIRECT,
PGDEMOTE_KHUGEPAGED,
+#ifdef CONFIG_DAMON_PADDR
+ DAMON_MIGRATE_HOT,
+ DAMON_MIGRATE_COLD,
+#endif
NR_VM_NODE_STAT_ITEMS
};
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index fd9d35b5cc83..d559c242d151 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -235,10 +235,23 @@ enum migration_mode {
static unsigned int migrate_folio_list(struct list_head *migrate_folios,
struct pglist_data *pgdat,
+ enum migration_mode mm,
int target_nid)
{
unsigned int nr_succeeded;
nodemask_t allowed_mask = NODE_MASK_NONE;
+ enum node_stat_item node_stat;
+
+ switch (mm) {
+ case MIG_MIGRATE_HOT:
+ node_stat = DAMON_MIGRATE_HOT;
+ break;
+ case MIG_MIGRATE_COLD:
+ node_stat = DAMON_MIGRATE_COLD;
+ break;
+ default:
+ return 0;
+ }
struct migration_target_control mtc = {
/*
@@ -263,6 +276,8 @@ static unsigned int migrate_folio_list(struct list_head *migrate_folios,
(unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
&nr_succeeded);
+ mod_node_page_state(pgdat, node_stat, nr_succeeded);
+
return nr_succeeded;
}
@@ -302,7 +317,7 @@ static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
/* 'folio_list' is always empty here */
/* Migrate folios selected for migration */
- nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
+ nr_migrated += migrate_folio_list(&migrate_folios, pgdat, mm, target_nid);
/* Folios that could not be migrated are still in @migrate_folios */
if (!list_empty(&migrate_folios)) {
/* Folios which weren't migrated go back on @folio_list */
diff --git a/mm/vmstat.c b/mm/vmstat.c
index db79935e4a54..be9ba89fede1 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1252,6 +1252,10 @@ const char * const vmstat_text[] = {
"pgdemote_kswapd",
"pgdemote_direct",
"pgdemote_khugepaged",
+#ifdef CONFIG_DAMON_PADDR
+ "damon_migrate_hot",
+ "damon_migrate_cold",
+#endif
/* enum writeback_stat_item counters */
"nr_dirty_threshold",
--
2.34.1
^ permalink raw reply related	[flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 7/7] mm/damon: Add "damon_migrate_{hot,cold}" vmstat
2024-04-05 6:08 ` [RFC PATCH v3 7/7] mm/damon: Add "damon_migrate_{hot,cold}" vmstat Honggyu Kim
@ 2024-04-05 19:27 ` SeongJae Park
0 siblings, 0 replies; 28+ messages in thread
From: SeongJae Park @ 2024-04-05 19:27 UTC (permalink / raw)
To: Honggyu Kim
Cc: SeongJae Park, damon, linux-mm, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Fri, 5 Apr 2024 15:08:56 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> This patch adds "damon_migrate_{hot,cold}" to the node-specific vmstat
> counters at the following location.
>
> /sys/devices/system/node/node*/vmstat
>
> The counted values are accumulated into the global vmstat, so this also
> introduces the same counters in /proc/vmstat.
DAMON provides its own DAMOS stats via DAMON sysfs interface. Do we really
need this change?
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory
2024-04-05 6:08 [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Honggyu Kim
` (6 preceding siblings ...)
2024-04-05 6:08 ` [RFC PATCH v3 7/7] mm/damon: Add "damon_migrate_{hot,cold}" vmstat Honggyu Kim
@ 2024-04-05 16:56 ` Gregory Price
2024-04-08 13:41 ` [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL Honggyu Kim
2024-04-05 19:28 ` [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory SeongJae Park
8 siblings, 1 reply; 28+ messages in thread
From: Gregory Price @ 2024-04-05 16:56 UTC (permalink / raw)
To: Honggyu Kim
Cc: sj, damon, linux-mm, akpm, apopple, baolin.wang, dave.jiang,
hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Fri, Apr 05, 2024 at 03:08:49PM +0900, Honggyu Kim wrote:
> There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> posted at [1].
>
> 1. YCSB zipfian distribution read only workload
> memory pressure with cold memory on node0 with 512GB of local DRAM.
> =============+================================================+=========
> | cold memory occupied by mmap and memset |
> | 0G 440G 450G 460G 470G 480G 490G 500G |
> =============+================================================+=========
> Execution time normalized to DRAM-only values | GEOMEAN
> -------------+------------------------------------------------+---------
> DRAM-only | 1.00 - - - - - - - | 1.00
> CXL-only | 1.22 - - - - - - - | 1.22
> default | - 1.12 1.13 1.14 1.16 1.19 1.21 1.21 | 1.17
> DAMON tiered | - 1.04 1.03 1.04 1.06 1.05 1.05 1.05 | 1.05
> =============+================================================+=========
> CXL usage of redis-server in GB | AVERAGE
> -------------+------------------------------------------------+---------
> DRAM-only | 0.0 - - - - - - - | 0.0
> CXL-only | 52.6 - - - - - - - | 52.6
> default | - 20.4 27.0 33.1 39.5 45.6 50.5 50.3 | 38.1
> DAMON tiered | - 0.1 0.3 0.8 0.6 0.7 1.3 0.9 | 0.7
> =============+================================================+=========
>
> Each test result is based on the execution environment as follows.
>
> DRAM-only : redis-server uses only local DRAM memory.
> CXL-only : redis-server uses only CXL memory.
> default : default memory policy(MPOL_DEFAULT).
> numa balancing disabled.
> DAMON tiered: DAMON enabled with DAMOS_MIGRATE_COLD for DRAM nodes and
> DAMOS_MIGRATE_HOT for CXL nodes.
>
> The above result shows the "default" execution time goes up as the size
> of cold memory is increased from 440G to 500G because the more cold
> memory used, the more CXL memory is used for the target redis workload
> and this makes the execution time increase.
>
> However, "DAMON tiered" result shows less slowdown because the
> DAMOS_MIGRATE_COLD action at DRAM node proactively demotes pre-allocated
> cold memory to CXL node and this free space at DRAM increases more
> chance to allocate hot or warm pages of redis-server to fast DRAM node.
> Moreover, DAMOS_MIGRATE_HOT action at CXL node also promotes hot pages
> of redis-server to DRAM node actively.
>
> As a result, it makes more memory of redis-server stay in DRAM node
> compared to "default" memory policy and this makes the performance
> improvement.
>
> The following result of latest distribution workload shows similar data.
>
> 2. YCSB latest distribution read only workload
> memory pressure with cold memory on node0 with 512GB of local DRAM.
> =============+================================================+=========
> | cold memory occupied by mmap and memset |
> | 0G 440G 450G 460G 470G 480G 490G 500G |
> =============+================================================+=========
> Execution time normalized to DRAM-only values | GEOMEAN
> -------------+------------------------------------------------+---------
> DRAM-only | 1.00 - - - - - - - | 1.00
> CXL-only | 1.18 - - - - - - - | 1.18
> default | - 1.18 1.19 1.18 1.18 1.17 1.19 1.18 | 1.18
> DAMON tiered | - 1.04 1.04 1.04 1.05 1.04 1.05 1.05 | 1.04
> =============+================================================+=========
> CXL usage of redis-server in GB | AVERAGE
> -------------+------------------------------------------------+---------
> DRAM-only | 0.0 - - - - - - - | 0.0
> CXL-only | 52.6 - - - - - - - | 52.6
> default | - 20.5 27.1 33.2 39.5 45.5 50.4 50.5 | 38.1
> DAMON tiered | - 0.2 0.4 0.7 1.6 1.2 1.1 3.4 | 1.2
> =============+================================================+=========
>
> In summary of both results, our evaluation shows that "DAMON tiered"
> memory management reduces the performance slowdown compared to the
> "default" memory policy from 17~18% to 4~5% when the system runs with
> high memory pressure on its fast tier DRAM nodes.
>
> Having these DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD actions can make
> tiered memory systems run more efficiently under high memory pressures.
>
Hi,
It's hard to determine from your results whether the performance
mitigation is being caused primarily by MIGRATE_COLD freeing up space
for new allocations, or from some combination of HOT/COLD actions
occurring during execution but after the database has already been
warmed up.
Do you have test results which enable only DAMOS_MIGRATE_COLD actions
but not DAMOS_MIGRATE_HOT actions? (and vice versa)
The question I have is exactly how often is MIGRATE_HOT actually being
utilized, and how much data is being moved. Testing MIGRATE_COLD only
would at least give a rough approximation of that.
Additionally, do you have any data on workloads that exceed the capacity
of the DRAM tier? Here you say you have 512GB of local DRAM, but only
test a workload that caps out at 500G. Have you run a test of, say,
550GB to see the effect of DAMON HOT/COLD migration actions when DRAM
capacity is exceeded?
Can you also provide the DRAM-only results for each test? Presumably,
as workload size increases from 440G to 500G, the system probably starts
using some amount of swap/zswap/whatever. It would be good to know how
this system compares to swap small amounts of overflow.
~Gregory
^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL
2024-04-05 16:56 ` [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Gregory Price
@ 2024-04-08 13:41 ` Honggyu Kim
2024-04-09 9:59 ` Honggyu Kim
2024-04-10 0:00 ` Gregory Price
0 siblings, 2 replies; 28+ messages in thread
From: Honggyu Kim @ 2024-04-08 13:41 UTC (permalink / raw)
To: Gregory Price
Cc: sj, damon, linux-mm, akpm, apopple, baolin.wang, dave.jiang,
hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
Hi Gregory,
On Fri, 5 Apr 2024 12:56:14 -0400 Gregory Price <gregory.price@memverge.com> wrote:
> On Fri, Apr 05, 2024 at 03:08:49PM +0900, Honggyu Kim wrote:
> > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > posted at [1].
> >
> > 1. YCSB zipfian distribution read only workload
> > memory pressure with cold memory on node0 with 512GB of local DRAM.
> > =============+================================================+=========
> > | cold memory occupied by mmap and memset |
> > | 0G 440G 450G 460G 470G 480G 490G 500G |
> > =============+================================================+=========
> > Execution time normalized to DRAM-only values | GEOMEAN
> > -------------+------------------------------------------------+---------
> > DRAM-only | 1.00 - - - - - - - | 1.00
> > CXL-only | 1.22 - - - - - - - | 1.22
> > default | - 1.12 1.13 1.14 1.16 1.19 1.21 1.21 | 1.17
> > DAMON tiered | - 1.04 1.03 1.04 1.06 1.05 1.05 1.05 | 1.05
> > =============+================================================+=========
> > CXL usage of redis-server in GB | AVERAGE
> > -------------+------------------------------------------------+---------
> > DRAM-only | 0.0 - - - - - - - | 0.0
> > CXL-only | 52.6 - - - - - - - | 52.6
> > default | - 20.4 27.0 33.1 39.5 45.6 50.5 50.3 | 38.1
> > DAMON tiered | - 0.1 0.3 0.8 0.6 0.7 1.3 0.9 | 0.7
> > =============+================================================+=========
> >
> > Each test result is based on the execution environment as follows.
> >
> > DRAM-only : redis-server uses only local DRAM memory.
> > CXL-only : redis-server uses only CXL memory.
> > default : default memory policy(MPOL_DEFAULT).
> > numa balancing disabled.
> > DAMON tiered: DAMON enabled with DAMOS_MIGRATE_COLD for DRAM nodes and
> > DAMOS_MIGRATE_HOT for CXL nodes.
> >
> > The above result shows the "default" execution time goes up as the size
> > of cold memory is increased from 440G to 500G because the more cold
> > memory used, the more CXL memory is used for the target redis workload
> > and this makes the execution time increase.
> >
> > However, "DAMON tiered" result shows less slowdown because the
> > DAMOS_MIGRATE_COLD action at DRAM node proactively demotes pre-allocated
> > cold memory to CXL node and this free space at DRAM increases more
> > chance to allocate hot or warm pages of redis-server to fast DRAM node.
> > Moreover, DAMOS_MIGRATE_HOT action at CXL node also promotes hot pages
> > of redis-server to DRAM node actively.
> >
> > As a result, it makes more memory of redis-server stay in DRAM node
> > compared to "default" memory policy and this makes the performance
> > improvement.
> >
> > The following result of latest distribution workload shows similar data.
> >
> > 2. YCSB latest distribution read only workload
> > memory pressure with cold memory on node0 with 512GB of local DRAM.
> > =============+================================================+=========
> > | cold memory occupied by mmap and memset |
> > | 0G 440G 450G 460G 470G 480G 490G 500G |
> > =============+================================================+=========
> > Execution time normalized to DRAM-only values | GEOMEAN
> > -------------+------------------------------------------------+---------
> > DRAM-only | 1.00 - - - - - - - | 1.00
> > CXL-only | 1.18 - - - - - - - | 1.18
> > default | - 1.18 1.19 1.18 1.18 1.17 1.19 1.18 | 1.18
> > DAMON tiered | - 1.04 1.04 1.04 1.05 1.04 1.05 1.05 | 1.04
> > =============+================================================+=========
> > CXL usage of redis-server in GB | AVERAGE
> > -------------+------------------------------------------------+---------
> > DRAM-only | 0.0 - - - - - - - | 0.0
> > CXL-only | 52.6 - - - - - - - | 52.6
> > default | - 20.5 27.1 33.2 39.5 45.5 50.4 50.5 | 38.1
> > DAMON tiered | - 0.2 0.4 0.7 1.6 1.2 1.1 3.4 | 1.2
> > =============+================================================+=========
> >
> > In summary of both results, our evaluation shows that "DAMON tiered"
> > memory management reduces the performance slowdown compared to the
> > "default" memory policy from 17~18% to 4~5% when the system runs with
> > high memory pressure on its fast tier DRAM nodes.
> >
> > Having these DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD actions can make
> > tiered memory systems run more efficiently under high memory pressures.
> >
>
> Hi,
>
> It's hard to determine from your results whether the performance
> mitigation is being caused primarily by MIGRATE_COLD freeing up space
> for new allocations, or from some combination of HOT/COLD actions
> occurring during execution but after the database has already been
> warmed up.
Thanks for the question. I didn't include all the details of the
evaluation result, but this is a chance to share more details.
I would say the mitigation comes from both. DAMOS_MIGRATE_COLD demotes
some cold data to CXL so redis can allocate more data on the fast DRAM
during launch time, as the mmap+memset and redis launch take several
minutes. But it also promotes some redis data while running.
> Do you have test results which enable only DAMOS_MIGRATE_COLD actions
> but not DAMOS_MIGRATE_HOT actions? (and vice versa)
>
> The question I have is exactly how often is MIGRATE_HOT actually being
> utilized, and how much data is being moved. Testing MIGRATE_COLD only
> would at least give a rough approximation of that.
To explain this, I'd better share more test results. In the
"Evaluation Workload" section, the test sequence can be summarized as
follows.
*. "Turn on DAMON."
1. Allocate cold memory(mmap+memset) at DRAM node, then make the
process sleep.
2. Launch redis-server and load prebaked snapshot image, dump.rdb.
(85GB consumed: 52GB for anon and 33GB for file cache)
3. Run YCSB to make zipfian distribution of memory accesses to
redis-server, then measure execution time.
4. Repeat 4 over 50 times to measure the average execution time for
each run.
5. Increase the cold memory size, then go back to step 2.
I didn't want to make the evaluation too long in the cover letter, but
I have also evaluated another scenario, which lazily enables DAMON just
before the YCSB run at step 4. I will call this test "DAMON lazy". This
part is missing from the cover letter.
1. Allocate cold memory(mmap+memset) at DRAM node, then make the
process sleep.
2. Launch redis-server and load prebaked snapshot image, dump.rdb.
(85GB consumed: 52GB for anon and 33GB for file cache)
*. "Turn on DAMON."
4. Run YCSB to make zipfian distribution of memory accesses to
redis-server, then measure execution time.
5. Repeat 4 over 50 times to measure the average execution time for
each run.
6. Increase the cold memory size, then go back to step 2.
In the "DAMON lazy" scenario, DAMON starts monitoring late, so the
initial redis-server placement is the same as "default", but it starts
to demote cold data and promote redis data just before the YCSB run.
The full test result is as follows.
1. YCSB zipfian distribution read only workload
memory pressure with cold memory on node0 with 512GB of local DRAM.
=============+================================================+=========
| cold memory occupied by mmap and memset |
| 0G 440G 450G 460G 470G 480G 490G 500G |
=============+================================================+=========
Execution time normalized to DRAM-only values | GEOMEAN
-------------+------------------------------------------------+---------
DRAM-only | 1.00 - - - - - - - | 1.00
CXL-only | 1.22 - - - - - - - | 1.22
default | - 1.12 1.13 1.14 1.16 1.19 1.21 1.21 | 1.17
DAMON tiered | - 1.04 1.03 1.04 1.06 1.05 1.05 1.05 | 1.05
DAMON lazy | - 1.04 1.05 1.05 1.06 1.06 1.07 1.07 | 1.06
=============+================================================+=========
CXL usage of redis-server in GB | AVERAGE
-------------+------------------------------------------------+---------
DRAM-only | 0.0 - - - - - - - | 0.0
CXL-only | 52.6 - - - - - - - | 52.6
default | - 20.4 27.0 33.1 39.5 45.6 50.5 50.3 | 38.1
DAMON tiered | - 0.1 0.3 0.8 0.6 0.7 1.3 0.9 | 0.7
DAMON lazy | - 2.9 3.1 3.7 4.7 6.6 8.2 9.7 | 5.6
=============+================================================+=========
Migration size in GB by DAMOS_MIGRATE_COLD(demotion) and |
DAMOS_MIGRATE_HOT(promotion) | AVERAGE
-------------+------------------------------------------------+---------
DAMON tiered | |
- demotion | - 522 510 523 520 513 558 558 | 529
- promotion | - 0.1 1.3 6.2 8.1 7.2 22 17 | 8.8
DAMON lazy | |
- demotion | - 288 277 322 343 315 312 320 | 311
- promotion | - 33 44 41 55 73 89 101 | 62
=============+================================================+=========
I have included the "DAMON lazy" result and also the migration sizes
from the new DAMOS migrate actions. Please note that the demotion size
is way higher than the promotion size because the promotion target is
only redis data, while the demotion target includes the huge cold memory
allocated by mmap + memset.
(There could be some ping-pong issue, though.)
As you mentioned, the "DAMON tiered" case gets more benefit because new
redis allocations go to DRAM more than in "default", but it also
benefits from promotion under higher memory pressure, as shown in the
490G and 500G cases. There it promotes 22GB and 17GB of redis data from
CXL to DRAM.
In the case of "DAMON lazy", it shows a larger promotion size as
expected, and it increases as memory pressure grows from left to right.
I will share the "latest" workload result as well, and it shows a
similar tendency.
2. YCSB latest distribution read only workload
memory pressure with cold memory on node0 with 512GB of local DRAM.
=============+================================================+=========
| cold memory occupied by mmap and memset |
| 0G 440G 450G 460G 470G 480G 490G 500G |
=============+================================================+=========
Execution time normalized to DRAM-only values | GEOMEAN
-------------+------------------------------------------------+---------
DRAM-only | 1.00 - - - - - - - | 1.00
CXL-only | 1.18 - - - - - - - | 1.18
default | - 1.18 1.19 1.18 1.18 1.17 1.19 1.18 | 1.18
DAMON tiered | - 1.04 1.04 1.04 1.05 1.04 1.05 1.05 | 1.04
DAMON lazy | - 1.05 1.05 1.06 1.06 1.07 1.06 1.07 | 1.06
=============+================================================+=========
CXL usage of redis-server in GB | AVERAGE
-------------+------------------------------------------------+---------
DRAM-only | 0.0 - - - - - - - | 0.0
CXL-only | 52.6 - - - - - - - | 52.6
default | - 20.5 27.1 33.2 39.5 45.5 50.4 50.5 | 38.1
DAMON tiered | - 0.2 0.4 0.7 1.6 1.2 1.1 3.4 | 1.2
DAMON lazy | - 5.3 4.1 3.9 6.4 8.8 10.1 11.3 | 7.1
=============+================================================+=========
Migration size in GB by DAMOS_MIGRATE_COLD(demotion) and |
DAMOS_MIGRATE_HOT(promotion) | AVERAGE
-------------+------------------------------------------------+---------
DAMON tiered | |
- demotion | - 493 478 487 516 510 540 512 | 505
- promotion | - 0.1 0.2 8.2 5.6 4.0 5.9 29 | 7.5
DAMON lazy | |
- demotion | - 315 318 293 290 308 322 286 | 305
- promotion | - 36 45 38 56 74 91 99 | 63
=============+================================================+=========
> Additionally, do you have any data on workloads that exceed the capacity
> of the DRAM tier? Here you say you have 512GB of local DRAM, but only
> test a workload that caps out at 500G. Have you run a test of, say,
> 550GB to see the effect of DAMON HOT/COLD migration actions when DRAM
> capacity is exceeded?
I didn't want to remove DRAM from my server, so I kept using 512GB of
DRAM, but I couldn't make a single workload that consumes more than the
DRAM size.
I wanted to use a more realistic workload rather than micro benchmarks.
And the core concept of this test is to cover realistic scenarios with
a system-wide view. I think if the system has 512GB of local DRAM, it
wouldn't be possible to make the entire 512GB of DRAM hot, and it'd
have some amount of cold memory, which can be the target of demotion.
Then we can find some workload that is actively used and promote it as
much as possible. That's why I made the promotion policy aggressive.
> Can you also provide the DRAM-only results for each test? Presumably,
> as workload size increases from 440G to 500G, the system probably starts
> using some amount of swap/zswap/whatever. It would be good to know how
> this system compares to swap small amounts of overflow.
It looks like my explanation didn't inform you correctly. The size
from 440GB to 500GB is for preallocated cold data that gives memory
pressure on the system, so that redis-server cannot be fully allocated
on the fast DRAM and is partially allocated on CXL memory as well.
And my evaluation environment doesn't have swap space, in order to
focus on migration rather than swap.
>
> ~Gregory
I hope my explanation is helpful for you to understand. Please let me
know if you have more questions.
Thanks,
Honggyu
^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL
2024-04-08 13:41 ` [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL Honggyu Kim
@ 2024-04-09 9:59 ` Honggyu Kim
2024-04-10 0:00 ` Gregory Price
1 sibling, 0 replies; 28+ messages in thread
From: Honggyu Kim @ 2024-04-09 9:59 UTC (permalink / raw)
To: Honggyu Kim
Cc: sj, damon, linux-mm, akpm, apopple, baolin.wang, dave.jiang,
hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob, Gregory Price
On Mon, 8 Apr 2024 22:41:04 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
[...]
> To explain this, I better share more test results. In the section of
> "Evaluation Workload", the test sequence can be summarized as follows.
>
> *. "Turn on DAMON."
> 1. Allocate cold memory(mmap+memset) at DRAM node, then make the
> process sleep.
> 2. Launch redis-server and load prebaked snapshot image, dump.rdb.
> (85GB consumed: 52GB for anon and 33GB for file cache)
> 3. Run YCSB to make zipfian distribution of memory accesses to
> redis-server, then measure execution time.
> 4. Repeat 4 over 50 times to measure the average execution time for
> each run.
Sorry, "Repeat 4 over 50 times" is incorrect. This should be "Repeat 3
over 50 times".
> 5. Increase the cold memory size, then go back to step 2.
>
> I didn't want to make the evaluation too long in the cover letter, but
> I have also evaluated another scenario, which lazily enables DAMON just
> before the YCSB run at step 4. I will call this test "DAMON lazy". This
> part is missing from the cover letter.
>
> 1. Allocate cold memory(mmap+memset) at DRAM node, then make the
> process sleep.
> 2. Launch redis-server and load prebaked snapshot image, dump.rdb.
> (85GB consumed: 52GB for anon and 33GB for file cache)
> *. "Turn on DAMON."
> 4. Run YCSB to make zipfian distribution of memory accesses to
> redis-server, then measure execution time.
> 5. Repeat 4 over 50 times to measure the average execution time for
> each run.
> 6. Increase the cold memory size then repeat goes to 2.
>
> In the "DAMON lazy" scenario, DAMON started monitoring late so the
> initial redis-server placement is same as "default", but started to
> demote cold data and promote redis data just before YCSB run.
[...]
Thanks,
Honggyu
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL
2024-04-08 13:41 ` [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL Honggyu Kim
2024-04-09 9:59 ` Honggyu Kim
@ 2024-04-10 0:00 ` Gregory Price
1 sibling, 0 replies; 28+ messages in thread
From: Gregory Price @ 2024-04-10 0:00 UTC (permalink / raw)
To: Honggyu Kim
Cc: sj, damon, linux-mm, akpm, apopple, baolin.wang, dave.jiang,
hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
On Mon, Apr 08, 2024 at 10:41:04PM +0900, Honggyu Kim wrote:
> Hi Gregory,
>
> On Fri, 5 Apr 2024 12:56:14 -0400 Gregory Price <gregory.price@memverge.com> wrote:
> > Do you have test results which enable only DAMOS_MIGRATE_COLD actions
> > but not DAMOS_MIGRATE_HOT actions? (and vice versa)
> >
> > The question I have is exactly how often is MIGRATE_HOT actually being
> > utilized, and how much data is being moved. Testing MIGRATE_COLD only
> > would at least give a rough approximation of that.
>
> To explain this, I better share more test results. In the section of
> "Evaluation Workload", the test sequence can be summarized as follows.
>
> *. "Turn on DAMON."
> 1. Allocate cold memory(mmap+memset) at DRAM node, then make the
> process sleep.
> 2. Launch redis-server and load prebaked snapshot image, dump.rdb.
> (85GB consumed: 52GB for anon and 33GB for file cache)
Aha! I see now: you are allocating memory to ensure the real workload
(redis-server) pressures the DRAM tier and causes "spillage" to the CXL
tier, and then measuring the overhead in different scenarios.
I would still love to know what the result of a demote-only system would
produce, mostly because it would very clearly demonstrate the value of
the demote+promote system when the system is memory-pressured.
Given the additional results below, it seems a demote-only system would
likely trend toward CXL-only, and so this shows affirmative support for
the promotion logic.
Just another datum that is useful and paints a more complete picture.
> I didn't want to make the evaluation too long in the cover letter, but
> I have also evaluated another scenario, which lazily enables DAMON just
> before the YCSB run at step 4. I will call this test "DAMON lazy". This
> part is missing from the cover letter.
>
> 1. Allocate cold memory(mmap+memset) at DRAM node, then make the
> process sleep.
> 2. Launch redis-server and load prebaked snapshot image, dump.rdb.
> (85GB consumed: 52GB for anon and 33GB for file cache)
> *. "Turn on DAMON."
>
> In the "DAMON lazy" scenario, DAMON started monitoring late so the
> initial redis-server placement is same as "default", but started to
> demote cold data and promote redis data just before YCSB run.
>
This is excellent and definitely demonstrates part of the picture I was
alluding to, thank you for the additional data!
>
> I have included "DAMON lazy" result and also the migration size by new
> DAMOS migrate actions. Please note that demotion size is way higher
> than promotion because promotion target is only for redis data, but
> demotion target includes huge cold memory allocated by mmap + memset.
> (there could be some ping-pong issue though.)
>
> As you mentioned, "DAMON tiered" case gets more benefit because new
> redis allocations go to DRAM more than "default", but it also gets
> benefit from promotion when it is under higher memory pressure as shown
> in 490G and 500G cases. It promotes 22GB and 17GB of redis data to DRAM
> from CXL.
I think a better way of saying this is that "DAMON tiered" more
effectively mitigates the effect of memory pressure on the faster tier
before spillage occurs, while "DAMON lazy" demonstrates the expected
performance of the system after memory pressure outruns the demotion
logic, so you wind up with hot data stuck in the slow tier.
There are some out there that would simply say "just demote more
aggressively", so this is useful information for the discussion.
+/- ~2% despite greater memory migration is an excellent result
> > Can you also provide the DRAM-only results for each test? Presumably,
> > as workload size increases from 440G to 500G, the system probably starts
> > using some amount of swap/zswap/whatever. It would be good to know how
> > this system compares to swap small amounts of overflow.
>
> It looks like my explanation didn't inform you correctly. The size
> from 440GB to 500GB is for preallocated cold data that gives memory
> pressure on the system, so that redis-server cannot be fully allocated
> on the fast DRAM and is partially allocated on CXL memory as well.
>
Yes, sorry for the misunderstanding. This makes it much clearer.
>
> I hope my explanation is helpful for you to understand. Please let me
> know if you have more questions.
>
Excellent work, exciting results! Thank you for the additional answers
:]
~Gregory
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory
2024-04-05 6:08 [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Honggyu Kim
` (7 preceding siblings ...)
2024-04-05 16:56 ` [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory Gregory Price
@ 2024-04-05 19:28 ` SeongJae Park
2024-04-08 10:56 ` Honggyu Kim
8 siblings, 1 reply; 28+ messages in thread
From: SeongJae Park @ 2024-04-05 19:28 UTC (permalink / raw)
To: Honggyu Kim
Cc: SeongJae Park, damon, linux-mm, akpm, apopple, baolin.wang,
dave.jiang, hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
Hello Honggyu,
On Fri, 5 Apr 2024 15:08:49 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
> There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> posted at [1].
>
> It says there is no implementation of the demote/promote DAMOS action
> are made. This RFC is about its implementation for physical address
> space.
>
>
> Changes from RFC v2:
> 1. Rename DAMOS_{PROMOTE,DEMOTE} actions to DAMOS_MIGRATE_{HOT,COLD}.
> 2. Create 'target_nid' to set the migration target node instead of
> depending on node distance based information.
> 3. Instead of having page level access check in this patch series,
> delegate the job to a new DAMOS filter type YOUNG[2].
> 4. Introduce vmstat counters "damon_migrate_{hot,cold}".
> 5. Rebase from v6.7 to v6.8.
Thank you for patiently keeping the discussion going and making this
great version! I left comments on each patch, but found no special
concerns. The per-page access recheck for MIGRATE_HOT and the vmstat
change caught my eye, though. I doubt whether those are really needed.
It would be nice if you could answer the comments.
Once my comments on this version are addressed, I would have no reason
to object to dropping the RFC tag from this patchset.
Nonetheless, I see some warnings and errors from checkpatch.pl. I don't
really care about those for RFC patches, so no problem at all. But if
you agree with my opinion about dropping the RFC tag, and therefore will
send the next version without the RFC tag, please make sure you also run
checkpatch.pl before posting.
Thanks,
SJ
[...]
^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory
2024-04-05 19:28 ` [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory SeongJae Park
@ 2024-04-08 10:56 ` Honggyu Kim
0 siblings, 0 replies; 28+ messages in thread
From: Honggyu Kim @ 2024-04-08 10:56 UTC (permalink / raw)
To: SeongJae Park
Cc: damon, linux-mm, akpm, apopple, baolin.wang, dave.jiang,
hyeongtak.ji, kernel_team, linmiaohe, linux-kernel,
linux-trace-kernel, mathieu.desnoyers, mhiramat, rakie.kim,
rostedt, surenb, yangx.jy, ying.huang, ziy, 42.hyeyoo,
art.jeongseob
Hi SeongJae,
On Fri, 5 Apr 2024 12:28:00 -0700 SeongJae Park <sj@kernel.org> wrote:
> Hello Honggyu,
>
> On Fri, 5 Apr 2024 15:08:49 +0900 Honggyu Kim <honggyu.kim@sk.com> wrote:
>
> > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > posted at [1].
> >
> > It says there is no implementation of the demote/promote DAMOS action
> > are made. This RFC is about its implementation for physical address
> > space.
> >
> >
> > Changes from RFC v2:
> > 1. Rename DAMOS_{PROMOTE,DEMOTE} actions to DAMOS_MIGRATE_{HOT,COLD}.
> > 2. Create 'target_nid' to set the migration target node instead of
> > depending on node distance based information.
> > 3. Instead of having page level access check in this patch series,
> > delegate the job to a new DAMOS filter type YOUNG[2].
> > 4. Introduce vmstat counters "damon_migrate_{hot,cold}".
> > 5. Rebase from v6.7 to v6.8.
>
> Thank you for patiently keeping discussion and making this great version! I
> left comments on each patch, but found no special concerns. Per-page access
> recheck for MIGRATE_HOT and vmstat change are taking my eyes, though. I doubt
> if those really needed. It would be nice if you could answer to the comments.
I will answer them where you made the comments.
> Once my comments on this version are addressed, I would have no reason to
> object at dropping the RFC tag from this patchset.
Thanks. I will drop the RFC after addressing your comments.
> Nonetheless, I show some warnings and errors from checkpatch.pl. I don't
> really care about those for RFC patches, so no problem at all. But if you
> agree to my opinion about RFC tag dropping, and therefore if you will send next
> version without RFC tag, please make sure you also run checkpatch.pl before
> posting.
Sure. I will run checkpatch.pl from the next revision.
Thanks,
Honggyu
>
> Thanks,
> SJ
>
> [...]
^ permalink raw reply [flat|nested] 28+ messages in thread