public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/4] Demotion cleanup and fixes
@ 2026-03-11 11:02 Alexandre Ghiti
  2026-03-11 11:02 ` [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c Alexandre Ghiti
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2026-03-11 11:02 UTC
  To: akpm
  Cc: alexghiti, kernel-team, akinobu.mita, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel, Alexandre Ghiti

This small series follows up on the discussion at [1]. Note that the
initial issue reported there is not fixed by this series.

[1] https://lore.kernel.org/linux-mm/20260113081453.8293-1-akinobu.mita@gmail.com/

Alexandre Ghiti (4):
  mm: Move demotion related functions in memory-tiers.c
  mm: Rename node_get_allowed_targets() to make it more explicit
  mm: Fix demotion gfp by clearing GFP_RECLAIM after setting
    GFP_TRANSHUGE
  mm: Fix demotion gfp by preserving initial gfp reclaim policy

 include/linux/memory-tiers.h | 24 +++++++++--
 mm/memory-tiers.c            | 77 ++++++++++++++++++++++++++++++++-
 mm/migrate.c                 |  8 ++--
 mm/vmscan.c                  | 82 +-----------------------------------
 4 files changed, 104 insertions(+), 87 deletions(-)

-- 
2.53.0



* [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c
  2026-03-11 11:02 [PATCH 0/4] Demotion cleanup and fixes Alexandre Ghiti
@ 2026-03-11 11:02 ` Alexandre Ghiti
  2026-03-11 14:55   ` Joshua Hahn
                     ` (2 more replies)
  2026-03-11 11:02 ` [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit Alexandre Ghiti
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2026-03-11 11:02 UTC
  To: akpm
  Cc: alexghiti, kernel-team, akinobu.mita, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel, Alexandre Ghiti

Let's have all the demotion functions in this file, no functional
change intended.

Suggested-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
 include/linux/memory-tiers.h | 18 ++++++++
 mm/memory-tiers.c            | 75 +++++++++++++++++++++++++++++++++
 mm/vmscan.c                  | 80 +-----------------------------------
 3 files changed, 94 insertions(+), 79 deletions(-)

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 96987d9d95a8..0bf0d002939e 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -56,6 +56,9 @@ void mt_put_memory_types(struct list_head *memory_types);
 int next_demotion_node(int node, const nodemask_t *allowed_mask);
 void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
 bool node_is_toptier(int node);
+unsigned int mt_demote_folios(struct list_head *demote_folios,
+			      struct pglist_data *pgdat,
+			      struct mem_cgroup *memcg);
 #else
 static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
 {
@@ -71,6 +74,14 @@ static inline bool node_is_toptier(int node)
 {
 	return true;
 }
+
+static inline unsigned int mt_demote_folios(struct list_head *demote_folios,
+					    struct pglist_data *pgdat,
+					    struct mem_cgroup *memcg)
+{
+	return 0;
+}
+
 #endif
 
 #else
@@ -116,6 +127,13 @@ static inline bool node_is_toptier(int node)
 	return true;
 }
 
+static inline unsigned int mt_demote_folios(struct list_head *demote_folios,
+					    struct pglist_data *pgdat,
+					    struct mem_cgroup *memcg)
+{
+	return 0;
+}
+
 static inline int register_mt_adistance_algorithm(struct notifier_block *nb)
 {
 	return 0;
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 986f809376eb..afdf21738a54 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -7,6 +7,7 @@
 #include <linux/memory-tiers.h>
 #include <linux/notifier.h>
 #include <linux/sched/sysctl.h>
+#include <linux/migrate.h>
 
 #include "internal.h"
 
@@ -373,6 +374,80 @@ int next_demotion_node(int node, const nodemask_t *allowed_mask)
 	return find_next_best_node(node, &mask);
 }
 
+static struct folio *alloc_demote_folio(struct folio *src,
+					unsigned long private)
+{
+	struct folio *dst;
+	nodemask_t *allowed_mask;
+	struct migration_target_control *mtc;
+
+	mtc = (struct migration_target_control *)private;
+
+	allowed_mask = mtc->nmask;
+	/*
+	 * make sure we allocate from the target node first also trying to
+	 * demote or reclaim pages from the target node via kswapd if we are
+	 * low on free memory on target node. If we don't do this and if
+	 * we have free memory on the slower(lower) memtier, we would start
+	 * allocating pages from slower(lower) memory tiers without even forcing
+	 * a demotion of cold pages from the target memtier. This can result
+	 * in the kernel placing hot pages in slower(lower) memory tiers.
+	 */
+	mtc->nmask = NULL;
+	mtc->gfp_mask |= __GFP_THISNODE;
+	dst = alloc_migration_target(src, (unsigned long)mtc);
+	if (dst)
+		return dst;
+
+	mtc->gfp_mask &= ~__GFP_THISNODE;
+	mtc->nmask = allowed_mask;
+
+	return alloc_migration_target(src, (unsigned long)mtc);
+}
+
+unsigned int mt_demote_folios(struct list_head *demote_folios,
+			      struct pglist_data *pgdat,
+			      struct mem_cgroup *memcg)
+{
+	int target_nid;
+	unsigned int nr_succeeded;
+	nodemask_t allowed_mask;
+
+	struct migration_target_control mtc = {
+		/*
+		 * Allocate from 'node', or fail quickly and quietly.
+		 * When this happens, 'page' will likely just be discarded
+		 * instead of migrated.
+		 */
+		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
+			__GFP_NOMEMALLOC | GFP_NOWAIT,
+		.nmask = &allowed_mask,
+		.reason = MR_DEMOTION,
+	};
+
+	if (list_empty(demote_folios))
+		return 0;
+
+	node_get_allowed_targets(pgdat, &allowed_mask);
+	mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
+	if (nodes_empty(allowed_mask))
+		return 0;
+
+	target_nid = next_demotion_node(pgdat->node_id, &allowed_mask);
+	if (target_nid == NUMA_NO_NODE)
+		/* No lower-tier nodes or nodes were hot-unplugged. */
+		return 0;
+
+	mtc.nid = target_nid;
+
+	/* Demotion ignores all cpuset and mempolicy settings */
+	migrate_pages(demote_folios, alloc_demote_folio, NULL,
+			(unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
+			&nr_succeeded);
+
+	return nr_succeeded;
+}
+
 static void disable_all_demotion_targets(void)
 {
 	struct memory_tier *memtier;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0fc9373e8251..5e0138b94480 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -983,84 +983,6 @@ static void folio_check_dirty_writeback(struct folio *folio,
 		mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
 }
 
-static struct folio *alloc_demote_folio(struct folio *src,
-		unsigned long private)
-{
-	struct folio *dst;
-	nodemask_t *allowed_mask;
-	struct migration_target_control *mtc;
-
-	mtc = (struct migration_target_control *)private;
-
-	allowed_mask = mtc->nmask;
-	/*
-	 * make sure we allocate from the target node first also trying to
-	 * demote or reclaim pages from the target node via kswapd if we are
-	 * low on free memory on target node. If we don't do this and if
-	 * we have free memory on the slower(lower) memtier, we would start
-	 * allocating pages from slower(lower) memory tiers without even forcing
-	 * a demotion of cold pages from the target memtier. This can result
-	 * in the kernel placing hot pages in slower(lower) memory tiers.
-	 */
-	mtc->nmask = NULL;
-	mtc->gfp_mask |= __GFP_THISNODE;
-	dst = alloc_migration_target(src, (unsigned long)mtc);
-	if (dst)
-		return dst;
-
-	mtc->gfp_mask &= ~__GFP_THISNODE;
-	mtc->nmask = allowed_mask;
-
-	return alloc_migration_target(src, (unsigned long)mtc);
-}
-
-/*
- * Take folios on @demote_folios and attempt to demote them to another node.
- * Folios which are not demoted are left on @demote_folios.
- */
-static unsigned int demote_folio_list(struct list_head *demote_folios,
-				      struct pglist_data *pgdat,
-				      struct mem_cgroup *memcg)
-{
-	int target_nid;
-	unsigned int nr_succeeded;
-	nodemask_t allowed_mask;
-
-	struct migration_target_control mtc = {
-		/*
-		 * Allocate from 'node', or fail quickly and quietly.
-		 * When this happens, 'page' will likely just be discarded
-		 * instead of migrated.
-		 */
-		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
-			__GFP_NOMEMALLOC | GFP_NOWAIT,
-		.nmask = &allowed_mask,
-		.reason = MR_DEMOTION,
-	};
-
-	if (list_empty(demote_folios))
-		return 0;
-
-	node_get_allowed_targets(pgdat, &allowed_mask);
-	mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
-	if (nodes_empty(allowed_mask))
-		return 0;
-
-	target_nid = next_demotion_node(pgdat->node_id, &allowed_mask);
-	if (target_nid == NUMA_NO_NODE)
-		/* No lower-tier nodes or nodes were hot-unplugged. */
-		return 0;
-
-	mtc.nid = target_nid;
-
-	/* Demotion ignores all cpuset and mempolicy settings */
-	migrate_pages(demote_folios, alloc_demote_folio, NULL,
-		      (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
-		      &nr_succeeded);
-
-	return nr_succeeded;
-}
-
 static bool may_enter_fs(struct folio *folio, gfp_t gfp_mask)
 {
 	if (gfp_mask & __GFP_FS)
@@ -1573,7 +1495,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	/* 'folio_list' is always empty here */
 
 	/* Migrate folios selected for demotion */
-	nr_demoted = demote_folio_list(&demote_folios, pgdat, memcg);
+	nr_demoted = mt_demote_folios(&demote_folios, pgdat, memcg);
 	nr_reclaimed += nr_demoted;
 	stat->nr_demoted += nr_demoted;
 	/* Folios that could not be demoted are still in @demote_folios */
-- 
2.53.0
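The comment in alloc_demote_folio() describes a two-pass allocation: first the target node alone, then any node in the allowed mask. Below is a minimal userspace sketch of that ordering, assuming a toy model in which nodes are small integers, free memory is a per-node page counter and the allowed mask is a plain bitmask. Every toy_* name is hypothetical, and the kswapd wakeup on the target node is not modelled.

```c
#include <assert.h>

/*
 * Toy model (not kernel code) of the two-pass strategy in
 * alloc_demote_folio(). Hypothetical simplifications: nodes are small
 * integers, free memory is a per-node page counter, and the allowed
 * mask is a bitmask of node ids.
 */
#define TOY_MAX_NODES 4

static int toy_free_pages[TOY_MAX_NODES];

/* Take one page from @nid if it has any left; return @nid or -1. */
static int toy_alloc_on_node(int nid)
{
	assert(nid >= 0 && nid < TOY_MAX_NODES);
	if (toy_free_pages[nid] > 0) {
		toy_free_pages[nid]--;
		return nid;
	}
	return -1;
}

static int toy_alloc_demote(int target_nid, unsigned int allowed_mask)
{
	int nid;

	/* Pass 1: like __GFP_THISNODE with mtc->nmask == NULL. */
	if (toy_alloc_on_node(target_nid) >= 0)
		return target_nid;

	/* Pass 2: like clearing __GFP_THISNODE and restoring mtc->nmask. */
	for (nid = 0; nid < TOY_MAX_NODES; nid++) {
		if ((allowed_mask & (1u << nid)) && toy_alloc_on_node(nid) >= 0)
			return nid;
	}
	return -1;
}
```

For example, with one free page on node 2 and five on node 3, two calls to toy_alloc_demote(2, 0xC) return 2 and then 3: the request only spills to node 3 once the target node is exhausted.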



* [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit
  2026-03-11 11:02 [PATCH 0/4] Demotion cleanup and fixes Alexandre Ghiti
  2026-03-11 11:02 ` [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c Alexandre Ghiti
@ 2026-03-11 11:02 ` Alexandre Ghiti
  2026-03-11 15:02   ` Joshua Hahn
                     ` (2 more replies)
  2026-03-11 11:02 ` [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE Alexandre Ghiti
  2026-03-11 11:02 ` [PATCH 4/4] mm: Fix demotion gfp by preserving initial gfp reclaim policy Alexandre Ghiti
  3 siblings, 3 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2026-03-11 11:02 UTC
  To: akpm
  Cc: alexghiti, kernel-team, akinobu.mita, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel, Alexandre Ghiti

This function actually returns the tier nodes that are targeted during a
demotion, so rename it to be more explicit.

No functional change intended.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
 include/linux/memory-tiers.h | 6 +++---
 mm/memory-tiers.c            | 4 ++--
 mm/vmscan.c                  | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 0bf0d002939e..ec39dc3c39e6 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -54,7 +54,7 @@ struct memory_dev_type *mt_find_alloc_memory_type(int adist,
 void mt_put_memory_types(struct list_head *memory_types);
 #ifdef CONFIG_MIGRATION
 int next_demotion_node(int node, const nodemask_t *allowed_mask);
-void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
+void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets);
 bool node_is_toptier(int node);
 unsigned int mt_demote_folios(struct list_head *demote_folios,
 			      struct pglist_data *pgdat,
@@ -65,7 +65,7 @@ static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
 	return NUMA_NO_NODE;
 }
 
-static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
+static inline void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets)
 {
 	*targets = NODE_MASK_NONE;
 }
@@ -117,7 +117,7 @@ static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
 	return NUMA_NO_NODE;
 }
 
-static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
+static inline void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets)
 {
 	*targets = NODE_MASK_NONE;
 }
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index afdf21738a54..19ecc9b6bbda 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -300,7 +300,7 @@ bool node_is_toptier(int node)
 	return toptier;
 }
 
-void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
+void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets)
 {
 	struct memory_tier *memtier;
 
@@ -428,7 +428,7 @@ unsigned int mt_demote_folios(struct list_head *demote_folios,
 	if (list_empty(demote_folios))
 		return 0;
 
-	node_get_allowed_targets(pgdat, &allowed_mask);
+	node_get_allowed_demotion_targets(pgdat, &allowed_mask);
 	mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
 	if (nodes_empty(allowed_mask))
 		return 0;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e0138b94480..11a97ee8f583 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -351,7 +351,7 @@ static bool can_demote(int nid, struct scan_control *sc,
 	if (sc && sc->no_demotion)
 		return false;
 
-	node_get_allowed_targets(pgdat, &allowed_mask);
+	node_get_allowed_demotion_targets(pgdat, &allowed_mask);
 	if (nodes_empty(allowed_mask))
 		return false;
 
-- 
2.53.0



* [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE
  2026-03-11 11:02 [PATCH 0/4] Demotion cleanup and fixes Alexandre Ghiti
  2026-03-11 11:02 ` [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c Alexandre Ghiti
  2026-03-11 11:02 ` [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit Alexandre Ghiti
@ 2026-03-11 11:02 ` Alexandre Ghiti
  2026-03-11 17:06   ` Andrew Morton
  2026-03-11 17:54   ` Johannes Weiner
  2026-03-11 11:02 ` [PATCH 4/4] mm: Fix demotion gfp by preserving initial gfp reclaim policy Alexandre Ghiti
  3 siblings, 2 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2026-03-11 11:02 UTC
  To: akpm
  Cc: alexghiti, kernel-team, akinobu.mita, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel, Alexandre Ghiti, Bing Jiao, stable

GFP_TRANSHUGE sets __GFP_DIRECT_RECLAIM so we must clear GFP_RECLAIM
after, not before.

Reported-by: Bing Jiao <bingjiao@google.com>
Closes: https://lore.kernel.org/linux-mm/aXlKOxGGI9zne8sl@google.com/
Fixes: 9933a0c8a539 ("mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent with regular THP allocations")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 2c3d489ecf51..ee533a4d38db 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2190,12 +2190,12 @@ struct folio *alloc_migration_target(struct folio *src, unsigned long private)
 	}
 
 	if (folio_test_large(src)) {
+		gfp_mask |= GFP_TRANSHUGE;
 		/*
 		 * clear __GFP_RECLAIM to make the migration callback
 		 * consistent with regular THP allocations.
 		 */
 		gfp_mask &= ~__GFP_RECLAIM;
-		gfp_mask |= GFP_TRANSHUGE;
 		order = folio_order(src);
 	}
 	zidx = folio_zonenum(src);
-- 
2.53.0
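To make the ordering question in this patch concrete, here is a toy sketch using made-up bit values (the real flags live in the kernel's gfp headers). It assumes, as in the kernel, that the GFP_TRANSHUGE-like mask carries direct reclaim but not kswapd reclaim; all TOY_* and toy_* names are hypothetical.

```c
#include <assert.h>

/* Made-up bit values for illustration only. */
#define TOY_DIRECT_RECLAIM	0x1u
#define TOY_KSWAPD_RECLAIM	0x2u
#define TOY_RECLAIM		(TOY_DIRECT_RECLAIM | TOY_KSWAPD_RECLAIM)
#define TOY_COMP		0x4u
/* Like GFP_TRANSHUGE: allows direct reclaim, does not wake kswapd. */
#define TOY_TRANSHUGE		(TOY_COMP | TOY_DIRECT_RECLAIM)

static_assert((TOY_TRANSHUGE & TOY_KSWAPD_RECLAIM) == 0,
	      "the THP-like mask must not include kswapd reclaim");

/* Order used before PATCH 3/4: drop the caller's reclaim bits, then
 * apply the THP mask, whose own reclaim policy takes effect. */
static unsigned int toy_clear_then_set(unsigned int gfp)
{
	gfp &= ~TOY_RECLAIM;
	gfp |= TOY_TRANSHUGE;
	return gfp;
}

/* Order proposed by PATCH 3/4: apply the THP mask, then drop all
 * reclaim bits, including the direct reclaim the THP mask added. */
static unsigned int toy_set_then_clear(unsigned int gfp)
{
	gfp |= TOY_TRANSHUGE;
	gfp &= ~TOY_RECLAIM;
	return gfp;
}
```

With a caller mask that only allows kswapd reclaim (as GFP_NOWAIT does in the demotion mask), toy_clear_then_set() yields direct reclaim without kswapd, while toy_set_then_clear() yields no reclaim bits at all.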



* [PATCH 4/4] mm: Fix demotion gfp by preserving initial gfp reclaim policy
  2026-03-11 11:02 [PATCH 0/4] Demotion cleanup and fixes Alexandre Ghiti
                   ` (2 preceding siblings ...)
  2026-03-11 11:02 ` [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE Alexandre Ghiti
@ 2026-03-11 11:02 ` Alexandre Ghiti
  3 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2026-03-11 11:02 UTC
  To: akpm
  Cc: alexghiti, kernel-team, akinobu.mita, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel, Alexandre Ghiti, stable

When the src folio is a hugetlb folio, htlb_modify_alloc_mask()
unconditionally enables reclaim. However, the reclaim policy of the
initial gfp flags must be preserved: in the case of demotion, those
flags prevent direct reclaim.

Reported-by: Gregory Price <gourry@gourry.net>
Closes: https://lore.kernel.org/linux-mm/aXkfBF5bdnTZ7t7e@gourry-fedora-PF4VCD3F/
Fixes: 19fc7bed252c ("mm/migrate: introduce a standard migration target allocation function")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
---
 mm/migrate.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index ee533a4d38db..d44a34d37007 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2169,13 +2169,13 @@ int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
 struct folio *alloc_migration_target(struct folio *src, unsigned long private)
 {
 	struct migration_target_control *mtc;
-	gfp_t gfp_mask;
+	gfp_t gfp_mask, gfp_entry;
 	unsigned int order = 0;
 	int nid;
 	enum zone_type zidx;
 
 	mtc = (struct migration_target_control *)private;
-	gfp_mask = mtc->gfp_mask;
+	gfp_mask = gfp_entry = mtc->gfp_mask;
 	nid = mtc->nid;
 	if (nid == NUMA_NO_NODE)
 		nid = folio_nid(src);
@@ -2184,6 +2184,8 @@ struct folio *alloc_migration_target(struct folio *src, unsigned long private)
 		struct hstate *h = folio_hstate(src);
 
 		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
+		gfp_mask = (gfp_mask & ~__GFP_RECLAIM) | (gfp_entry & __GFP_RECLAIM);
+
 		return alloc_hugetlb_folio_nodemask(h, nid,
 						mtc->nmask, gfp_mask,
 						htlb_allow_alloc_fallback(mtc->reason));
-- 
2.53.0
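The masking step added to alloc_migration_target() keeps every non-reclaim bit from the hugetlb mask and takes the reclaim bits from the caller's original mask. A toy sketch with made-up bit values follows; toy_htlb_modify_alloc_mask() is a simplified, hypothetical stand-in for htlb_modify_alloc_mask() that, like the real helper, starts from a mask allowing full reclaim and carries over the caller's THISNODE bit (the real helper also carries over __GFP_NOWARN, omitted here).

```c
#include <assert.h>

/* Made-up bit values for illustration only. */
#define TOY_DIRECT_RECLAIM	0x1u
#define TOY_KSWAPD_RECLAIM	0x2u
#define TOY_RECLAIM		(TOY_DIRECT_RECLAIM | TOY_KSWAPD_RECLAIM)
#define TOY_THISNODE		0x4u
#define TOY_MOVABLE		0x8u
/* Like GFP_HIGHUSER_MOVABLE: includes both reclaim bits. */
#define TOY_HIGHUSER_MOVABLE	(TOY_MOVABLE | TOY_RECLAIM)

static_assert((TOY_HIGHUSER_MOVABLE & TOY_RECLAIM) == TOY_RECLAIM,
	      "the hugetlb base mask allows full reclaim");

/* Simplified stand-in for htlb_modify_alloc_mask(): start from a mask
 * that allows full reclaim and keep only the caller's THISNODE bit. */
static unsigned int toy_htlb_modify_alloc_mask(unsigned int gfp)
{
	return TOY_HIGHUSER_MOVABLE | (gfp & TOY_THISNODE);
}

/* The step added by PATCH 4/4: non-reclaim bits come from @gfp,
 * reclaim bits come from the caller's original @gfp_entry. */
static unsigned int toy_restore_reclaim(unsigned int gfp,
					unsigned int gfp_entry)
{
	return (gfp & ~TOY_RECLAIM) | (gfp_entry & TOY_RECLAIM);
}
```

Starting from a demotion-like entry mask (kswapd reclaim plus THISNODE), the hugetlb mask alone would allow direct reclaim; after toy_restore_reclaim(), direct reclaim is cleared again while the THISNODE bit survives.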



* Re: [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c
  2026-03-11 11:02 ` [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c Alexandre Ghiti
@ 2026-03-11 14:55   ` Joshua Hahn
  2026-03-13 13:33     ` Alexandre Ghiti
  2026-03-12  8:44   ` Donet Tom
  2026-03-12 12:56   ` David Hildenbrand (Arm)
  2 siblings, 1 reply; 22+ messages in thread
From: Joshua Hahn @ 2026-03-11 14:55 UTC
  To: Alexandre Ghiti
  Cc: akpm, alexghiti, kernel-team, akinobu.mita, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	hannes, zhengqi.arch, shakeel.butt, axelrasmussen, yuanchu,
	weixugc, gourry, apopple, byungchul, joshua.hahnjy, matthew.brost,
	rakie.kim, ying.huang, ziy, linux-mm, linux-kernel

On Wed, 11 Mar 2026 12:02:40 +0100 Alexandre Ghiti <alex@ghiti.fr> wrote:

> Let's have all the demotion functions in this file, no functional
> change intended.

Hi Alexandre,

I hope you are doing well! Thank you for the patch.

Makes sense to move the migration functions together. Just one small
nit, I think the following comment is pretty helpful in understanding
that folios that aren't demoted still remain in @demote_folios. Should
we also move this comment to memory-tiers.c?

[...snip...]

> -/*
> - * Take folios on @demote_folios and attempt to demote them to another node.
> - * Folios which are not demoted are left on @demote_folios.
> - */
> -static unsigned int demote_folio_list(struct list_head *demote_folios,
> -				      struct pglist_data *pgdat,
> -				      struct mem_cgroup *memcg)
> -{
> -	int target_nid;
> -	unsigned int nr_succeeded;
> -	nodemask_t allowed_mask;
> -

[...snip...]

Anyways, the rest looks good to me. Have a great day!
Joshua
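The contract Joshua Hahn points to above (folios that fail to demote stay on @demote_folios, and the return value counts only successes) can be sketched as follows. This is a hypothetical array-based model, not the kernel's list_head code: folios are small integer ids, and a bitmask stands in for migrate_pages() succeeding on a given folio.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the demotion list contract: try to move every folio on
 * @list, return how many moved, and leave the failures on @list in
 * their original order. Hypothetical: folios are ids 0..31 and bit N of
 * @movable_mask says whether "migration" of folio N succeeds.
 */
static unsigned int toy_demote(int *list, size_t *len,
			       int *dst, size_t *dst_len,
			       unsigned int movable_mask)
{
	size_t i, kept = 0;
	unsigned int nr_succeeded = 0;

	for (i = 0; i < *len; i++) {
		int id = list[i];

		assert(id >= 0 && id < 32);
		if (movable_mask & (1u << id)) {
			dst[(*dst_len)++] = id;		/* demoted */
			nr_succeeded++;
		} else {
			list[kept++] = id;		/* stays on @list */
		}
	}
	*len = kept;
	return nr_succeeded;
}
```

With folios {1, 2, 3, 4} and only 2 and 4 movable, toy_demote() returns 2 and leaves {1, 3} on the input list for the caller to handle, much as shrink_folio_list() does with the leftovers in @demote_folios.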


* Re: [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit
  2026-03-11 11:02 ` [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit Alexandre Ghiti
@ 2026-03-11 15:02   ` Joshua Hahn
  2026-03-12  5:28   ` Byungchul Park
  2026-03-12  8:46   ` Donet Tom
  2 siblings, 0 replies; 22+ messages in thread
From: Joshua Hahn @ 2026-03-11 15:02 UTC
  To: Alexandre Ghiti
  Cc: akpm, alexghiti, kernel-team, akinobu.mita, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	hannes, zhengqi.arch, shakeel.butt, axelrasmussen, yuanchu,
	weixugc, gourry, apopple, byungchul, joshua.hahnjy, matthew.brost,
	rakie.kim, ying.huang, ziy, linux-mm, linux-kernel

On Wed, 11 Mar 2026 12:02:41 +0100 Alexandre Ghiti <alex@ghiti.fr> wrote:

> This function actually returns the tier nodes that are targeted during a
> demotion, so rename it to be more explicit.
> 
> No functional change intended.

Agreed, node_get_allowed_targets is pretty vague ; -)

I do think that node_get_allowed_demotion_targets could be considered
a bit too long, but I don't think it's called in too many places where
there would be more than just this function call in the line so LGTM!

Please feel free to add my review tag, have a great day Alexandre!

Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>

> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
> ---
>  include/linux/memory-tiers.h | 6 +++---
>  mm/memory-tiers.c            | 4 ++--
>  mm/vmscan.c                  | 2 +-
>  3 files changed, 6 insertions(+), 6 deletions(-)


* Re: [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE
  2026-03-11 11:02 ` [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE Alexandre Ghiti
@ 2026-03-11 17:06   ` Andrew Morton
  2026-03-12 12:59     ` David Hildenbrand (Arm)
  2026-03-13 13:47     ` Alexandre Ghiti
  2026-03-11 17:54   ` Johannes Weiner
  1 sibling, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2026-03-11 17:06 UTC
  To: Alexandre Ghiti
  Cc: alexghiti, kernel-team, akinobu.mita, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel, Bing Jiao, stable

On Wed, 11 Mar 2026 12:02:42 +0100 Alexandre Ghiti <alex@ghiti.fr> wrote:

> Fixes: 9933a0c8a539 ("mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent with regular THP allocations")
> Cc: stable@vger.kernel.org

Please let's have the cc:stable fixes separated out from the cleanups,
and prepared against current -linus mainline.

Also, when proposing backportable fixes please ensure that the
changelogs carefully describe the userspace-visible runtime effects of
the bug.

Thanks.


* Re: [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE
  2026-03-11 11:02 ` [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE Alexandre Ghiti
  2026-03-11 17:06   ` Andrew Morton
@ 2026-03-11 17:54   ` Johannes Weiner
  2026-03-12 16:01     ` Gregory Price
  2026-03-13 13:49     ` Alexandre Ghiti
  1 sibling, 2 replies; 22+ messages in thread
From: Johannes Weiner @ 2026-03-11 17:54 UTC
  To: Alexandre Ghiti
  Cc: akpm, alexghiti, kernel-team, akinobu.mita, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	zhengqi.arch, shakeel.butt, axelrasmussen, yuanchu, weixugc,
	gourry, apopple, byungchul, joshua.hahnjy, matthew.brost,
	rakie.kim, ying.huang, ziy, linux-mm, linux-kernel, Bing Jiao,
	stable

On Wed, Mar 11, 2026 at 12:02:42PM +0100, Alexandre Ghiti wrote:
> GFP_TRANSHUGE sets __GFP_DIRECT_RECLAIM so we must clear GFP_RECLAIM
> after, not before.
> 
> Reported-by: Bing Jiao <bingjiao@google.com>
> Closes: https://lore.kernel.org/linux-mm/aXlKOxGGI9zne8sl@google.com/
> Fixes: 9933a0c8a539 ("mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent with regular THP allocations")
> Cc: stable@vger.kernel.org
> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
> ---
>  mm/migrate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 2c3d489ecf51..ee533a4d38db 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2190,12 +2190,12 @@ struct folio *alloc_migration_target(struct folio *src, unsigned long private)
>  	}
>  
>  	if (folio_test_large(src)) {
> +		gfp_mask |= GFP_TRANSHUGE;
>  		/*
>  		 * clear __GFP_RECLAIM to make the migration callback
>  		 * consistent with regular THP allocations.
>  		 */
>  		gfp_mask &= ~__GFP_RECLAIM;
> -		gfp_mask |= GFP_TRANSHUGE;

I don't think this is right.

The Fixes: did it this way to disable kswapd for THP allocations,
while still allowing the customary direct reclaim. Maybe a better
comment would have been: /* GFP_TRANSHUGE has its own reclaim policy */

After your fix, direct reclaim isn't allowed either, which makes the
request unnecessarily wimpy.

The Closes: refers to reclaim that should be avoided during demotion.
But if this path is taken during demotion it will already not recurse
into direct reclaim due to PF_MEMALLOC.

So I don't see a bug in the existing code. But maybe the comment could
be clearer.
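The last point in this reply rests on the page allocator refusing to recurse into direct reclaim from a task that is already reclaiming. Here is a simplified sketch of that one decision, assuming only two inputs matter (the gfp mask and the task's PF_MEMALLOC flag). The bit values and toy_* names are hypothetical, and the real slow path in mm/page_alloc.c checks many more conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up values for illustration only. */
#define TOY_DIRECT_RECLAIM	0x1u	/* gfp bit */
#define TOY_PF_MEMALLOC		0x800u	/* task flag */

/*
 * Simplified model of one check in the allocator slow path: direct
 * reclaim needs the gfp mask to allow it AND the task not to be
 * reclaiming already (PF_MEMALLOC clear); otherwise the allocator gives
 * up on direct reclaim instead of recursing into it.
 */
static bool toy_may_direct_reclaim(unsigned int gfp, unsigned int task_flags)
{
	if (!(gfp & TOY_DIRECT_RECLAIM))
		return false;
	return !(task_flags & TOY_PF_MEMALLOC);
}
```

Under this model, a demotion allocation issued from reclaim context (PF_MEMALLOC set) cannot enter direct reclaim even when its mask carries __GFP_DIRECT_RECLAIM, which is why the reply sees no bug in the existing ordering.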


* Re: [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit
  2026-03-11 11:02 ` [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit Alexandre Ghiti
  2026-03-11 15:02   ` Joshua Hahn
@ 2026-03-12  5:28   ` Byungchul Park
  2026-03-12 12:58     ` David Hildenbrand (Arm)
  2026-03-12  8:46   ` Donet Tom
  2 siblings, 1 reply; 22+ messages in thread
From: Byungchul Park @ 2026-03-12  5:28 UTC
  To: Alexandre Ghiti
  Cc: akpm, alexghiti, kernel-team, akinobu.mita, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	hannes, zhengqi.arch, shakeel.butt, axelrasmussen, yuanchu,
	weixugc, gourry, apopple, joshua.hahnjy, matthew.brost, rakie.kim,
	ying.huang, ziy, linux-mm, linux-kernel, kernel_team

On Wed, Mar 11, 2026 at 12:02:41PM +0100, Alexandre Ghiti wrote:
> This function actually returns the tier nodes that are targeted during a
> demotion, so rename it to be more explicit.
> 
> No functional change intended.
> 
> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
> ---
>  include/linux/memory-tiers.h | 6 +++---
>  mm/memory-tiers.c            | 4 ++--
>  mm/vmscan.c                  | 2 +-
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> index 0bf0d002939e..ec39dc3c39e6 100644
> --- a/include/linux/memory-tiers.h
> +++ b/include/linux/memory-tiers.h
> @@ -54,7 +54,7 @@ struct memory_dev_type *mt_find_alloc_memory_type(int adist,
>  void mt_put_memory_types(struct list_head *memory_types);
>  #ifdef CONFIG_MIGRATION
>  int next_demotion_node(int node, const nodemask_t *allowed_mask);
> -void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
> +void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets);

Looks better than before to me.

What about just node_get_demotion_targets()?

	Byungchul

>  bool node_is_toptier(int node);
>  unsigned int mt_demote_folios(struct list_head *demote_folios,
>                               struct pglist_data *pgdat,
> @@ -65,7 +65,7 @@ static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
>         return NUMA_NO_NODE;
>  }
> 
> -static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
> +static inline void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets)
>  {
>         *targets = NODE_MASK_NONE;
>  }
> @@ -117,7 +117,7 @@ static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
>         return NUMA_NO_NODE;
>  }
> 
> -static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
> +static inline void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets)
>  {
>         *targets = NODE_MASK_NONE;
>  }
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index afdf21738a54..19ecc9b6bbda 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -300,7 +300,7 @@ bool node_is_toptier(int node)
>         return toptier;
>  }
> 
> -void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
> +void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets)
>  {
>         struct memory_tier *memtier;
> 
> @@ -428,7 +428,7 @@ unsigned int mt_demote_folios(struct list_head *demote_folios,
>         if (list_empty(demote_folios))
>                 return 0;
> 
> -       node_get_allowed_targets(pgdat, &allowed_mask);
> +       node_get_allowed_demotion_targets(pgdat, &allowed_mask);
>         mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
>         if (nodes_empty(allowed_mask))
>                 return 0;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 5e0138b94480..11a97ee8f583 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -351,7 +351,7 @@ static bool can_demote(int nid, struct scan_control *sc,
>         if (sc && sc->no_demotion)
>                 return false;
> 
> -       node_get_allowed_targets(pgdat, &allowed_mask);
> +       node_get_allowed_demotion_targets(pgdat, &allowed_mask);
>         if (nodes_empty(allowed_mask))
>                 return false;
> 
> --
> 2.53.0


* Re: [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c
  2026-03-11 11:02 ` [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c Alexandre Ghiti
  2026-03-11 14:55   ` Joshua Hahn
@ 2026-03-12  8:44   ` Donet Tom
  2026-03-13 13:27     ` Alexandre Ghiti
  2026-03-12 12:56   ` David Hildenbrand (Arm)
  2 siblings, 1 reply; 22+ messages in thread
From: Donet Tom @ 2026-03-12  8:44 UTC
  To: Alexandre Ghiti, akpm
  Cc: alexghiti, kernel-team, akinobu.mita, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel


Hi Alexandre

On 3/11/26 4:32 PM, Alexandre Ghiti wrote:
> Let's have all the demotion functions in this file, no functional
> change intended.
>
> Suggested-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
> ---
>   include/linux/memory-tiers.h | 18 ++++++++
>   mm/memory-tiers.c            | 75 +++++++++++++++++++++++++++++++++
>   mm/vmscan.c                  | 80 +-----------------------------------
>   3 files changed, 94 insertions(+), 79 deletions(-)
>
> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> index 96987d9d95a8..0bf0d002939e 100644
> --- a/include/linux/memory-tiers.h
> +++ b/include/linux/memory-tiers.h
> @@ -56,6 +56,9 @@ void mt_put_memory_types(struct list_head *memory_types);
>   int next_demotion_node(int node, const nodemask_t *allowed_mask);
>   void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
>   bool node_is_toptier(int node);
> +unsigned int mt_demote_folios(struct list_head *demote_folios,
> +			      struct pglist_data *pgdat,
> +			      struct mem_cgroup *memcg);
>   #else
>   static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
>   {
> @@ -71,6 +74,14 @@ static inline bool node_is_toptier(int node)
>   {
>   	return true;
>   }
> +
> +static inline unsigned int mt_demote_folios(struct list_head *demote_folios,
> +					    struct pglist_data *pgdat,
> +					    struct mem_cgroup *memcg)
> +{
> +	return 0;
> +}
> +
>   #endif
>   
>   #else
> @@ -116,6 +127,13 @@ static inline bool node_is_toptier(int node)
>   	return true;
>   }
>   
> +static inline unsigned int mt_demote_folios(struct list_head *demote_folios,
> +					    struct pglist_data *pgdat,
> +					    struct mem_cgroup *memcg)
> +{
> +	return 0;
> +}
> +
>   static inline int register_mt_adistance_algorithm(struct notifier_block *nb)
>   {
>   	return 0;
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index 986f809376eb..afdf21738a54 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -7,6 +7,7 @@
>   #include <linux/memory-tiers.h>
>   #include <linux/notifier.h>
>   #include <linux/sched/sysctl.h>
> +#include <linux/migrate.h>
>   
>   #include "internal.h"
>   
> @@ -373,6 +374,80 @@ int next_demotion_node(int node, const nodemask_t *allowed_mask)
>   	return find_next_best_node(node, &mask);
>   }
>   
> +static struct folio *alloc_demote_folio(struct folio *src,
> +					unsigned long private)
> +{
> +	struct folio *dst;
> +	nodemask_t *allowed_mask;
> +	struct migration_target_control *mtc;
> +
> +	mtc = (struct migration_target_control *)private;
> +
> +	allowed_mask = mtc->nmask;
> +	/*
> +	 * make sure we allocate from the target node first also trying to
> +	 * demote or reclaim pages from the target node via kswapd if we are
> +	 * low on free memory on target node. If we don't do this and if
> +	 * we have free memory on the slower(lower) memtier, we would start
> +	 * allocating pages from slower(lower) memory tiers without even forcing
> +	 * a demotion of cold pages from the target memtier. This can result
> +	 * in the kernel placing hot pages in slower(lower) memory tiers.
> +	 */
> +	mtc->nmask = NULL;
> +	mtc->gfp_mask |= __GFP_THISNODE;
> +	dst = alloc_migration_target(src, (unsigned long)mtc);
> +	if (dst)
> +		return dst;
> +
> +	mtc->gfp_mask &= ~__GFP_THISNODE;
> +	mtc->nmask = allowed_mask;
> +
> +	return alloc_migration_target(src, (unsigned long)mtc);
> +}
> +
> +unsigned int mt_demote_folios(struct list_head *demote_folios,


Demotion will happen only when different memory tiers are present, 
right? Since demote_folios() already implies that the folios are being 
demoted to a lower tier, is the mt_ prefix needed in the function name? 
I’m fine with keeping it as is, but I just wanted to clarify.

Otherwise it LGTM

Reviewed-by: Donet Tom <donettom@linux.ibm.com>

> +			      struct pglist_data *pgdat,
> +			      struct mem_cgroup *memcg)
> +{
> +	int target_nid;
> +	unsigned int nr_succeeded;
> +	nodemask_t allowed_mask;
> +
> +	struct migration_target_control mtc = {
> +		/*
> +		 * Allocate from 'node', or fail quickly and quietly.
> +		 * When this happens, 'page' will likely just be discarded
> +		 * instead of migrated.
> +		 */
> +		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
> +			__GFP_NOMEMALLOC | GFP_NOWAIT,
> +		.nmask = &allowed_mask,
> +		.reason = MR_DEMOTION,
> +	};
> +
> +	if (list_empty(demote_folios))
> +		return 0;
> +
> +	node_get_allowed_targets(pgdat, &allowed_mask);
> +	mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
> +	if (nodes_empty(allowed_mask))
> +		return 0;
> +
> +	target_nid = next_demotion_node(pgdat->node_id, &allowed_mask);
> +	if (target_nid == NUMA_NO_NODE)
> +		/* No lower-tier nodes or nodes were hot-unplugged. */
> +		return 0;
> +
> +	mtc.nid = target_nid;
> +
> +	/* Demotion ignores all cpuset and mempolicy settings */
> +	migrate_pages(demote_folios, alloc_demote_folio, NULL,
> +			(unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
> +			&nr_succeeded);
> +
> +	return nr_succeeded;
> +}
> +
>   static void disable_all_demotion_targets(void)
>   {
>   	struct memory_tier *memtier;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 0fc9373e8251..5e0138b94480 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -983,84 +983,6 @@ static void folio_check_dirty_writeback(struct folio *folio,
>   		mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
>   }
>   
> -static struct folio *alloc_demote_folio(struct folio *src,
> -		unsigned long private)
> -{
> -	struct folio *dst;
> -	nodemask_t *allowed_mask;
> -	struct migration_target_control *mtc;
> -
> -	mtc = (struct migration_target_control *)private;
> -
> -	allowed_mask = mtc->nmask;
> -	/*
> -	 * make sure we allocate from the target node first also trying to
> -	 * demote or reclaim pages from the target node via kswapd if we are
> -	 * low on free memory on target node. If we don't do this and if
> -	 * we have free memory on the slower(lower) memtier, we would start
> -	 * allocating pages from slower(lower) memory tiers without even forcing
> -	 * a demotion of cold pages from the target memtier. This can result
> -	 * in the kernel placing hot pages in slower(lower) memory tiers.
> -	 */
> -	mtc->nmask = NULL;
> -	mtc->gfp_mask |= __GFP_THISNODE;
> -	dst = alloc_migration_target(src, (unsigned long)mtc);
> -	if (dst)
> -		return dst;
> -
> -	mtc->gfp_mask &= ~__GFP_THISNODE;
> -	mtc->nmask = allowed_mask;
> -
> -	return alloc_migration_target(src, (unsigned long)mtc);
> -}
> -
> -/*
> - * Take folios on @demote_folios and attempt to demote them to another node.
> - * Folios which are not demoted are left on @demote_folios.
> - */
> -static unsigned int demote_folio_list(struct list_head *demote_folios,
> -				      struct pglist_data *pgdat,
> -				      struct mem_cgroup *memcg)
> -{
> -	int target_nid;
> -	unsigned int nr_succeeded;
> -	nodemask_t allowed_mask;
> -
> -	struct migration_target_control mtc = {
> -		/*
> -		 * Allocate from 'node', or fail quickly and quietly.
> -		 * When this happens, 'page' will likely just be discarded
> -		 * instead of migrated.
> -		 */
> -		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
> -			__GFP_NOMEMALLOC | GFP_NOWAIT,
> -		.nmask = &allowed_mask,
> -		.reason = MR_DEMOTION,
> -	};
> -
> -	if (list_empty(demote_folios))
> -		return 0;
> -
> -	node_get_allowed_targets(pgdat, &allowed_mask);
> -	mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
> -	if (nodes_empty(allowed_mask))
> -		return 0;
> -
> -	target_nid = next_demotion_node(pgdat->node_id, &allowed_mask);
> -	if (target_nid == NUMA_NO_NODE)
> -		/* No lower-tier nodes or nodes were hot-unplugged. */
> -		return 0;
> -
> -	mtc.nid = target_nid;
> -
> -	/* Demotion ignores all cpuset and mempolicy settings */
> -	migrate_pages(demote_folios, alloc_demote_folio, NULL,
> -		      (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
> -		      &nr_succeeded);
> -
> -	return nr_succeeded;
> -}
> -
>   static bool may_enter_fs(struct folio *folio, gfp_t gfp_mask)
>   {
>   	if (gfp_mask & __GFP_FS)
> @@ -1573,7 +1495,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>   	/* 'folio_list' is always empty here */
>   
>   	/* Migrate folios selected for demotion */
> -	nr_demoted = demote_folio_list(&demote_folios, pgdat, memcg);
> +	nr_demoted = mt_demote_folios(&demote_folios, pgdat, memcg);
>   	nr_reclaimed += nr_demoted;
>   	stat->nr_demoted += nr_demoted;
>   	/* Folios that could not be demoted are still in @demote_folios */
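The comment in alloc_demote_folio() describes a two-pass strategy: first allocate on the target node only (__GFP_THISNODE, no nodemask), then fall back to any node in the allowed demotion mask. Below is a simplified userspace model of just that node-selection order, assuming a per-node free-page counter. free_pages, model_alloc() and model_alloc_demote() are hypothetical names; the model ignores kswapd wakeups, zones and the real gfp flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 4

/* Hypothetical per-node free page counters (not a kernel structure). */
static int free_pages[MODEL_MAX_NODES];

/*
 * Stand-in for alloc_migration_target(): try @nid first; unless
 * @thisnode is set, fall back to any node permitted by @nmask
 * (a NULL mask means "no restriction"). Returns the node used or -1.
 */
static int model_alloc(int nid, const bool *nmask, bool thisnode)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);

	if (free_pages[nid] > 0) {
		free_pages[nid]--;
		return nid;
	}
	if (thisnode)		/* models __GFP_THISNODE: no fallback */
		return -1;

	for (int n = 0; n < MODEL_MAX_NODES; n++) {
		if (nmask && !nmask[n])
			continue;	/* node outside the allowed mask */
		if (free_pages[n] > 0) {
			free_pages[n]--;
			return n;
		}
	}
	return -1;
}

/* The two passes described in the alloc_demote_folio() comment. */
static int model_alloc_demote(int target, const bool *allowed)
{
	/* Pass 1: target node only, no nodemask. */
	int nid = model_alloc(target, NULL, true);

	if (nid >= 0)
		return nid;

	/* Pass 2: fall back to the nodes in the allowed demotion mask. */
	return model_alloc(target, allowed, false);
}
```

In this model a node that has free pages but is outside the allowed mask is never used, and the target node always wins when it has room.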

* Re: [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit
  2026-03-11 11:02 ` [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit Alexandre Ghiti
  2026-03-11 15:02   ` Joshua Hahn
  2026-03-12  5:28   ` Byungchul Park
@ 2026-03-12  8:46   ` Donet Tom
  2 siblings, 0 replies; 22+ messages in thread
From: Donet Tom @ 2026-03-12  8:46 UTC (permalink / raw)
  To: Alexandre Ghiti, akpm
  Cc: alexghiti, kernel-team, akinobu.mita, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel


On 3/11/26 4:32 PM, Alexandre Ghiti wrote:
> This function actually returns the tier nodes that are targeted during a
> demotion, so rename it to be more explicit.
>
> No functional change intended.
>
> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>


This looks good to me.

Reviewed-by: Donet Tom <donettom@linux.ibm.com>

> ---
>   include/linux/memory-tiers.h | 6 +++---
>   mm/memory-tiers.c            | 4 ++--
>   mm/vmscan.c                  | 2 +-
>   3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> index 0bf0d002939e..ec39dc3c39e6 100644
> --- a/include/linux/memory-tiers.h
> +++ b/include/linux/memory-tiers.h
> @@ -54,7 +54,7 @@ struct memory_dev_type *mt_find_alloc_memory_type(int adist,
>   void mt_put_memory_types(struct list_head *memory_types);
>   #ifdef CONFIG_MIGRATION
>   int next_demotion_node(int node, const nodemask_t *allowed_mask);
> -void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
> +void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets);
>   bool node_is_toptier(int node);
>   unsigned int mt_demote_folios(struct list_head *demote_folios,
>   			      struct pglist_data *pgdat,
> @@ -65,7 +65,7 @@ static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
>   	return NUMA_NO_NODE;
>   }
>   
> -static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
> +static inline void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets)
>   {
>   	*targets = NODE_MASK_NONE;
>   }
> @@ -117,7 +117,7 @@ static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
>   	return NUMA_NO_NODE;
>   }
>   
> -static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
> +static inline void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets)
>   {
>   	*targets = NODE_MASK_NONE;
>   }
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index afdf21738a54..19ecc9b6bbda 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -300,7 +300,7 @@ bool node_is_toptier(int node)
>   	return toptier;
>   }
>   
> -void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
> +void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets)
>   {
>   	struct memory_tier *memtier;
>   
> @@ -428,7 +428,7 @@ unsigned int mt_demote_folios(struct list_head *demote_folios,
>   	if (list_empty(demote_folios))
>   		return 0;
>   
> -	node_get_allowed_targets(pgdat, &allowed_mask);
> +	node_get_allowed_demotion_targets(pgdat, &allowed_mask);
>   	mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
>   	if (nodes_empty(allowed_mask))
>   		return 0;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 5e0138b94480..11a97ee8f583 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -351,7 +351,7 @@ static bool can_demote(int nid, struct scan_control *sc,
>   	if (sc && sc->no_demotion)
>   		return false;
>   
> -	node_get_allowed_targets(pgdat, &allowed_mask);
> +	node_get_allowed_demotion_targets(pgdat, &allowed_mask);
>   	if (nodes_empty(allowed_mask))
>   		return false;
>   
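The rename does not change the call sequence in mt_demote_folios(): get the demotion targets of the source node, filter them with mem_cgroup_node_filter_allowed(), return early if the mask is empty, then call next_demotion_node(). A rough model of that sequence, assuming a 64-bit mask as a stand-in for nodemask_t; model_nodemask_t and model_pick_target() are hypothetical, and picking the lowest-numbered node is a simplification (the kernel uses next_demotion_node(), which ends in find_next_best_node()).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: bit n set means node n. */
typedef uint64_t model_nodemask_t;

#define MODEL_NUMA_NO_NODE (-1)

/*
 * Sketch of the target selection in mt_demote_folios():
 *   demotion_targets ~ result of node_get_allowed_demotion_targets()
 *   memcg_allowed    ~ nodes kept by mem_cgroup_node_filter_allowed()
 */
static int model_pick_target(model_nodemask_t demotion_targets,
		model_nodemask_t memcg_allowed)
{
	model_nodemask_t allowed = demotion_targets & memcg_allowed;

	if (allowed == 0)	/* models nodes_empty(): nowhere to demote */
		return MODEL_NUMA_NO_NODE;

	/* Simplification: lowest-numbered allowed node. */
	for (int nid = 0; nid < 64; nid++)
		if (allowed & ((model_nodemask_t)1 << nid))
			return nid;

	return MODEL_NUMA_NO_NODE;	/* not reached: allowed != 0 */
}
```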

* Re: [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c
  2026-03-11 11:02 ` [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c Alexandre Ghiti
  2026-03-11 14:55   ` Joshua Hahn
  2026-03-12  8:44   ` Donet Tom
@ 2026-03-12 12:56   ` David Hildenbrand (Arm)
  2026-03-13 13:45     ` Alexandre Ghiti
  2 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-12 12:56 UTC (permalink / raw)
  To: Alexandre Ghiti, akpm
  Cc: alexghiti, kernel-team, akinobu.mita, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel

On 3/11/26 12:02, Alexandre Ghiti wrote:
> Let's have all the demotion functions in this file, no functional
> change intended.
> 
> Suggested-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
> ---
>  include/linux/memory-tiers.h | 18 ++++++++
>  mm/memory-tiers.c            | 75 +++++++++++++++++++++++++++++++++
>  mm/vmscan.c                  | 80 +-----------------------------------
>  3 files changed, 94 insertions(+), 79 deletions(-)
> 
> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> index 96987d9d95a8..0bf0d002939e 100644
> --- a/include/linux/memory-tiers.h
> +++ b/include/linux/memory-tiers.h
> @@ -56,6 +56,9 @@ void mt_put_memory_types(struct list_head *memory_types);
>  int next_demotion_node(int node, const nodemask_t *allowed_mask);
>  void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
>  bool node_is_toptier(int node);
> +unsigned int mt_demote_folios(struct list_head *demote_folios,
> +			      struct pglist_data *pgdat,
> +			      struct mem_cgroup *memcg);
>  #else
>  static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
>  {
> @@ -71,6 +74,14 @@ static inline bool node_is_toptier(int node)
>  {
>  	return true;
>  }
> +
> +static inline unsigned int mt_demote_folios(struct list_head *demote_folios,
> +					    struct pglist_data *pgdat,
> +					    struct mem_cgroup *memcg)

Please use two-tab indentation on the second parameter line, so that the
remaining parameters fit on a single line. Same for the other functions.

Just like alloc_demote_folio() that you are moving already did.

[...]

> -static struct folio *alloc_demote_folio(struct folio *src,
> -		unsigned long private)
> -{
> -	struct folio *dst;
> -	nodemask_t *allowed_mask;
> -	struct migration_target_control *mtc;
> -
> -	mtc = (struct migration_target_control *)private;
> -
> -	allowed_mask = mtc->nmask;
> -	/*
> -	 * make sure we allocate from the target node first also trying to
> -	 * demote or reclaim pages from the target node via kswapd if we are
> -	 * low on free memory on target node. If we don't do this and if
> -	 * we have free memory on the slower(lower) memtier, we would start
> -	 * allocating pages from slower(lower) memory tiers without even forcing
> -	 * a demotion of cold pages from the target memtier. This can result
> -	 * in the kernel placing hot pages in slower(lower) memory tiers.
> -	 */
> -	mtc->nmask = NULL;
> -	mtc->gfp_mask |= __GFP_THISNODE;
> -	dst = alloc_migration_target(src, (unsigned long)mtc);
> -	if (dst)
> -		return dst;
> -
> -	mtc->gfp_mask &= ~__GFP_THISNODE;
> -	mtc->nmask = allowed_mask;
> -

I think this function changed in the meantime in mm/mm-unstable. Against
which branch is this patch?

-- 
Cheers,

David
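One reading of the two-tab indentation request above, applied to the mt_demote_folios() stub from the patch. The forward declarations exist only so the fragment compiles outside the kernel tree; this is an illustration of the layout, not the version that was eventually merged.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque types, declared only so this compiles outside the kernel. */
struct list_head;
struct pglist_data;
struct mem_cgroup;

/*
 * Continuation line indented by two tabs instead of being aligned with
 * the opening parenthesis, so the remaining parameters fit on one line.
 */
static inline unsigned int mt_demote_folios(struct list_head *demote_folios,
		struct pglist_data *pgdat, struct mem_cgroup *memcg)
{
	return 0;	/* stub: nothing is demoted */
}
```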

* Re: [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit
  2026-03-12  5:28   ` Byungchul Park
@ 2026-03-12 12:58     ` David Hildenbrand (Arm)
  2026-03-13 13:46       ` Alexandre Ghiti
  0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-12 12:58 UTC (permalink / raw)
  To: Byungchul Park, Alexandre Ghiti
  Cc: akpm, alexghiti, kernel-team, akinobu.mita, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	joshua.hahnjy, matthew.brost, rakie.kim, ying.huang, ziy,
	linux-mm, linux-kernel, kernel_team

On 3/12/26 06:28, Byungchul Park wrote:
> On Wed, Mar 11, 2026 at 12:02:41PM +0100, Alexandre Ghiti wrote:
>> This function actually returns the tier nodes that are targeted during a
>> demotion, so rename it to be more explicit.
>>
>> No functional change intended.
>>
>> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
>> ---
>>  include/linux/memory-tiers.h | 6 +++---
>>  mm/memory-tiers.c            | 4 ++--
>>  mm/vmscan.c                  | 2 +-
>>  3 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
>> index 0bf0d002939e..ec39dc3c39e6 100644
>> --- a/include/linux/memory-tiers.h
>> +++ b/include/linux/memory-tiers.h
>> @@ -54,7 +54,7 @@ struct memory_dev_type *mt_find_alloc_memory_type(int adist,
>>  void mt_put_memory_types(struct list_head *memory_types);
>>  #ifdef CONFIG_MIGRATION
>>  int next_demotion_node(int node, const nodemask_t *allowed_mask);
>> -void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
>> +void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets);
> 
> Looks better than before to me.
> 
> What about just node_get_demotion_targets()?

+1

Maybe throw in the mt_ prefix and call it

mt_get_node_demotion_targets()

-- 
Cheers,

David

* Re: [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE
  2026-03-11 17:06   ` Andrew Morton
@ 2026-03-12 12:59     ` David Hildenbrand (Arm)
  2026-03-13 13:47     ` Alexandre Ghiti
  1 sibling, 0 replies; 22+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-12 12:59 UTC (permalink / raw)
  To: Andrew Morton, Alexandre Ghiti
  Cc: alexghiti, kernel-team, akinobu.mita, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel, Bing Jiao, stable

On 3/11/26 18:06, Andrew Morton wrote:
> On Wed, 11 Mar 2026 12:02:42 +0100 Alexandre Ghiti <alex@ghiti.fr> wrote:
> 
>> Fixes: 9933a0c8a539 ("mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent with regular THP allocations")
>> Cc: stable@vger.kernel.org
> 
> Please let's have the cc:stable fixes separated out from the cleanups,
> and prepared against current -linus mainline.
> 
> Also, when proposing backportable fixes please ensure that the
> changelogs carefully describe the userspace-visible runtime effects of
> the bug.

Also, please move fixes to the very beginning of the patch set, or
better, send them independently.

-- 
Cheers,

David

* Re: [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE
  2026-03-11 17:54   ` Johannes Weiner
@ 2026-03-12 16:01     ` Gregory Price
  2026-03-13 13:49     ` Alexandre Ghiti
  1 sibling, 0 replies; 22+ messages in thread
From: Gregory Price @ 2026-03-12 16:01 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Alexandre Ghiti, akpm, alexghiti, kernel-team, akinobu.mita,
	david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, zhengqi.arch, shakeel.butt, axelrasmussen, yuanchu,
	weixugc, apopple, byungchul, joshua.hahnjy, matthew.brost,
	rakie.kim, ying.huang, ziy, linux-mm, linux-kernel, Bing Jiao,
	stable

On Wed, Mar 11, 2026 at 01:54:50PM -0400, Johannes Weiner wrote:
> On Wed, Mar 11, 2026 at 12:02:42PM +0100, Alexandre Ghiti wrote:
> > GFP_TRANSHUGE sets __GFP_DIRECT_RECLAIM so we must clear GFP_RECLAIM
> > after, not before.
> > 
> > Reported-by: Bing Jiao <bingjiao@google.com>
> > Closes: https://lore.kernel.org/linux-mm/aXlKOxGGI9zne8sl@google.com/
> > Fixes: 9933a0c8a539 ("mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent with regular THP allocations")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
> > ---
> >  mm/migrate.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 2c3d489ecf51..ee533a4d38db 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -2190,12 +2190,12 @@ struct folio *alloc_migration_target(struct folio *src, unsigned long private)
> >  	}
> >  
> >  	if (folio_test_large(src)) {
> > +		gfp_mask |= GFP_TRANSHUGE;
> >  		/*
> >  		 * clear __GFP_RECLAIM to make the migration callback
> >  		 * consistent with regular THP allocations.
> >  		 */
> >  		gfp_mask &= ~__GFP_RECLAIM;
> > -		gfp_mask |= GFP_TRANSHUGE;
> 
> I don't think this is right.
> 
> The Fixes: did it this way to disable kswapd for THP allocations,
> while still allowing the customary direct reclaim. Maybe a better
> comment would have been: /* GFP_TRANSHUGE has its own reclaim policy */
> 

The bigger issue is how many times we see this particular flag getting
masked and apparently added back in at multiple layers. We saw two
or three paths (some unreachable) that can twiddle RECLAIM flags in
the stack for demotion (which is in reclaim already, so do the flags
matter?).

It makes it difficult to reason about what the GFP flags actually
are at any given point.

But yeah, I wasn't sure what to make of this code; it could be, as you
suggested, just a bad comment.

~Gregory
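Johannes's reading relies on GFP_TRANSHUGE already carrying __GFP_DIRECT_RECLAIM but not __GFP_KSWAPD_RECLAIM: it is GFP_TRANSHUGE_LIGHT (which masks off __GFP_RECLAIM) plus __GFP_DIRECT_RECLAIM, and __GFP_RECLAIM is the union of the two reclaim bits. A toy model of the two orderings in alloc_migration_target() for large folios; the M_* bit values are made up and only the set relationships mirror the kernel definitions.

```c
#include <assert.h>

typedef unsigned int model_gfp_t;

/* Illustrative bit values only; the real ___GFP_* values differ. */
#define M_DIRECT_RECLAIM (1u << 0)	/* __GFP_DIRECT_RECLAIM */
#define M_KSWAPD_RECLAIM (1u << 1)	/* __GFP_KSWAPD_RECLAIM */
#define M_RECLAIM (M_DIRECT_RECLAIM | M_KSWAPD_RECLAIM)	/* __GFP_RECLAIM */
#define M_COMP (1u << 2)		/* __GFP_COMP */

/*
 * Shape of GFP_TRANSHUGE: a base mask carrying both reclaim bits, with
 * __GFP_RECLAIM stripped (GFP_TRANSHUGE_LIGHT), then only direct
 * reclaim added back.
 */
#define M_TRANSHUGE (((M_RECLAIM | M_COMP) & ~M_RECLAIM) | M_DIRECT_RECLAIM)

/* Order from the Fixes: commit: clear reclaim, then OR in THP flags. */
static model_gfp_t clear_then_or(model_gfp_t gfp)
{
	gfp &= ~M_RECLAIM;
	gfp |= M_TRANSHUGE;
	return gfp;
}

/* Order proposed in patch 3/4: OR in THP flags, then clear reclaim. */
static model_gfp_t or_then_clear(model_gfp_t gfp)
{
	gfp |= M_TRANSHUGE;
	gfp &= ~M_RECLAIM;
	return gfp;
}
```

With the first ordering, kswapd reclaim is off but direct reclaim survives; with the second, both reclaim bits end up cleared, which is the behavior change Johannes objects to.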

* Re: [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c
  2026-03-12  8:44   ` Donet Tom
@ 2026-03-13 13:27     ` Alexandre Ghiti
  0 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2026-03-13 13:27 UTC (permalink / raw)
  To: Donet Tom, akpm
  Cc: alexghiti, kernel-team, akinobu.mita, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel

Hi Tom,

On 3/12/26 09:44, Donet Tom wrote:
>
> Hi Alexandre
>
> On 3/11/26 4:32 PM, Alexandre Ghiti wrote:
>> Let's have all the demotion functions in this file, no functional
>> change intended.
>>
>> Suggested-by: Gregory Price <gourry@gourry.net>
>> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
>> ---
>>   include/linux/memory-tiers.h | 18 ++++++++
>>   mm/memory-tiers.c            | 75 +++++++++++++++++++++++++++++++++
>>   mm/vmscan.c                  | 80 +-----------------------------------
>>   3 files changed, 94 insertions(+), 79 deletions(-)
>>
>> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
>> index 96987d9d95a8..0bf0d002939e 100644
>> --- a/include/linux/memory-tiers.h
>> +++ b/include/linux/memory-tiers.h
>> @@ -56,6 +56,9 @@ void mt_put_memory_types(struct list_head *memory_types);
>>   int next_demotion_node(int node, const nodemask_t *allowed_mask);
>>   void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
>>   bool node_is_toptier(int node);
>> +unsigned int mt_demote_folios(struct list_head *demote_folios,
>> +                  struct pglist_data *pgdat,
>> +                  struct mem_cgroup *memcg);
>>   #else
>>   static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
>>   {
>> @@ -71,6 +74,14 @@ static inline bool node_is_toptier(int node)
>>   {
>>       return true;
>>   }
>> +
>> +static inline unsigned int mt_demote_folios(struct list_head *demote_folios,
>> +                        struct pglist_data *pgdat,
>> +                        struct mem_cgroup *memcg)
>> +{
>> +    return 0;
>> +}
>> +
>>   #endif
>>
>>   #else
>> @@ -116,6 +127,13 @@ static inline bool node_is_toptier(int node)
>>       return true;
>>   }
>>
>> +static inline unsigned int mt_demote_folios(struct list_head *demote_folios,
>> +                        struct pglist_data *pgdat,
>> +                        struct mem_cgroup *memcg)
>> +{
>> +    return 0;
>> +}
>> +
>>   static inline int register_mt_adistance_algorithm(struct notifier_block *nb)
>>   {
>>       return 0;
>> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
>> index 986f809376eb..afdf21738a54 100644
>> --- a/mm/memory-tiers.c
>> +++ b/mm/memory-tiers.c
>> @@ -7,6 +7,7 @@
>>   #include <linux/memory-tiers.h>
>>   #include <linux/notifier.h>
>>   #include <linux/sched/sysctl.h>
>> +#include <linux/migrate.h>
>>
>>   #include "internal.h"
>>
>> @@ -373,6 +374,80 @@ int next_demotion_node(int node, const nodemask_t *allowed_mask)
>>       return find_next_best_node(node, &mask);
>>   }
>>
>> +static struct folio *alloc_demote_folio(struct folio *src,
>> +                    unsigned long private)
>> +{
>> +    struct folio *dst;
>> +    nodemask_t *allowed_mask;
>> +    struct migration_target_control *mtc;
>> +
>> +    mtc = (struct migration_target_control *)private;
>> +
>> +    allowed_mask = mtc->nmask;
>> +    /*
>> +     * make sure we allocate from the target node first also trying to
>> +     * demote or reclaim pages from the target node via kswapd if we are
>> +     * low on free memory on target node. If we don't do this and if
>> +     * we have free memory on the slower(lower) memtier, we would start
>> +     * allocating pages from slower(lower) memory tiers without even forcing
>> +     * a demotion of cold pages from the target memtier. This can result
>> +     * in the kernel placing hot pages in slower(lower) memory tiers.
>> +     */
>> +    mtc->nmask = NULL;
>> +    mtc->gfp_mask |= __GFP_THISNODE;
>> +    dst = alloc_migration_target(src, (unsigned long)mtc);
>> +    if (dst)
>> +        return dst;
>> +
>> +    mtc->gfp_mask &= ~__GFP_THISNODE;
>> +    mtc->nmask = allowed_mask;
>> +
>> +    return alloc_migration_target(src, (unsigned long)mtc);
>> +}
>> +
>> +unsigned int mt_demote_folios(struct list_head *demote_folios,
>
>
> Demotion will happen only when different memory tiers are present, 
> right? Since demote_folios() already implies that the folios are being 
> demoted to a lower tier, is the mt_ prefix needed in the function 
> name? I’m fine with keeping it as is, but I just wanted to clarify.


You're right, demotion implies memory tiers. But I like the mt_
prefix: some functions in memory-tiers.c already use it, so it adds
consistency. Since you don't mind, I'll keep it :)


>
> Otherwise it LGTM
>
> Reviewed-by: Donet Tom <donettom@linux.ibm.com>


Thanks for your time!

Alex


>
>> +                  struct pglist_data *pgdat,
>> +                  struct mem_cgroup *memcg)
>> +{
>> +    int target_nid;
>> +    unsigned int nr_succeeded;
>> +    nodemask_t allowed_mask;
>> +
>> +    struct migration_target_control mtc = {
>> +        /*
>> +         * Allocate from 'node', or fail quickly and quietly.
>> +         * When this happens, 'page' will likely just be discarded
>> +         * instead of migrated.
>> +         */
>> +        .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
>> +            __GFP_NOMEMALLOC | GFP_NOWAIT,
>> +        .nmask = &allowed_mask,
>> +        .reason = MR_DEMOTION,
>> +    };
>> +
>> +    if (list_empty(demote_folios))
>> +        return 0;
>> +
>> +    node_get_allowed_targets(pgdat, &allowed_mask);
>> +    mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
>> +    if (nodes_empty(allowed_mask))
>> +        return 0;
>> +
>> +    target_nid = next_demotion_node(pgdat->node_id, &allowed_mask);
>> +    if (target_nid == NUMA_NO_NODE)
>> +        /* No lower-tier nodes or nodes were hot-unplugged. */
>> +        return 0;
>> +
>> +    mtc.nid = target_nid;
>> +
>> +    /* Demotion ignores all cpuset and mempolicy settings */
>> +    migrate_pages(demote_folios, alloc_demote_folio, NULL,
>> +            (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
>> +            &nr_succeeded);
>> +
>> +    return nr_succeeded;
>> +}
>> +
>>   static void disable_all_demotion_targets(void)
>>   {
>>       struct memory_tier *memtier;
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 0fc9373e8251..5e0138b94480 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -983,84 +983,6 @@ static void folio_check_dirty_writeback(struct folio *folio,
>>           mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
>>   }
>>
>> -static struct folio *alloc_demote_folio(struct folio *src,
>> -        unsigned long private)
>> -{
>> -    struct folio *dst;
>> -    nodemask_t *allowed_mask;
>> -    struct migration_target_control *mtc;
>> -
>> -    mtc = (struct migration_target_control *)private;
>> -
>> -    allowed_mask = mtc->nmask;
>> -    /*
>> -     * make sure we allocate from the target node first also trying to
>> -     * demote or reclaim pages from the target node via kswapd if we are
>> -     * low on free memory on target node. If we don't do this and if
>> -     * we have free memory on the slower(lower) memtier, we would start
>> -     * allocating pages from slower(lower) memory tiers without even forcing
>> -     * a demotion of cold pages from the target memtier. This can result
>> -     * in the kernel placing hot pages in slower(lower) memory tiers.
>> -     */
>> -    mtc->nmask = NULL;
>> -    mtc->gfp_mask |= __GFP_THISNODE;
>> -    dst = alloc_migration_target(src, (unsigned long)mtc);
>> -    if (dst)
>> -        return dst;
>> -
>> -    mtc->gfp_mask &= ~__GFP_THISNODE;
>> -    mtc->nmask = allowed_mask;
>> -
>> -    return alloc_migration_target(src, (unsigned long)mtc);
>> -}
>> -
>> -/*
>> - * Take folios on @demote_folios and attempt to demote them to another node.
>> - * Folios which are not demoted are left on @demote_folios.
>> - */
>> -static unsigned int demote_folio_list(struct list_head *demote_folios,
>> -                      struct pglist_data *pgdat,
>> -                      struct mem_cgroup *memcg)
>> -{
>> -    int target_nid;
>> -    unsigned int nr_succeeded;
>> -    nodemask_t allowed_mask;
>> -
>> -    struct migration_target_control mtc = {
>> -        /*
>> -         * Allocate from 'node', or fail quickly and quietly.
>> -         * When this happens, 'page' will likely just be discarded
>> -         * instead of migrated.
>> -         */
>> -        .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
>> -            __GFP_NOMEMALLOC | GFP_NOWAIT,
>> -        .nmask = &allowed_mask,
>> -        .reason = MR_DEMOTION,
>> -    };
>> -
>> -    if (list_empty(demote_folios))
>> -        return 0;
>> -
>> -    node_get_allowed_targets(pgdat, &allowed_mask);
>> -    mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
>> -    if (nodes_empty(allowed_mask))
>> -        return 0;
>> -
>> -    target_nid = next_demotion_node(pgdat->node_id, &allowed_mask);
>> -    if (target_nid == NUMA_NO_NODE)
>> -        /* No lower-tier nodes or nodes were hot-unplugged. */
>> -        return 0;
>> -
>> -    mtc.nid = target_nid;
>> -
>> -    /* Demotion ignores all cpuset and mempolicy settings */
>> -    migrate_pages(demote_folios, alloc_demote_folio, NULL,
>> -              (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
>> -              &nr_succeeded);
>> -
>> -    return nr_succeeded;
>> -}
>> -
>>   static bool may_enter_fs(struct folio *folio, gfp_t gfp_mask)
>>   {
>>       if (gfp_mask & __GFP_FS)
>> @@ -1573,7 +1495,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>       /* 'folio_list' is always empty here */
>>
>>       /* Migrate folios selected for demotion */
>> -    nr_demoted = demote_folio_list(&demote_folios, pgdat, memcg);
>> +    nr_demoted = mt_demote_folios(&demote_folios, pgdat, memcg);
>>       nr_reclaimed += nr_demoted;
>>       stat->nr_demoted += nr_demoted;
>>       /* Folios that could not be demoted are still in @demote_folios */

* Re: [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c
  2026-03-11 14:55   ` Joshua Hahn
@ 2026-03-13 13:33     ` Alexandre Ghiti
  0 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2026-03-13 13:33 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: akpm, alexghiti, kernel-team, akinobu.mita, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	hannes, zhengqi.arch, shakeel.butt, axelrasmussen, yuanchu,
	weixugc, gourry, apopple, byungchul, matthew.brost, rakie.kim,
	ying.huang, ziy, linux-mm, linux-kernel

Hi Joshua,

On 3/11/26 15:55, Joshua Hahn wrote:
> On Wed, 11 Mar 2026 12:02:40 +0100 Alexandre Ghiti <alex@ghiti.fr> wrote:
>
>> Let's have all the demotion functions in this file, no functional
>> change intended.
> Hi Alexandre,
>
> I hope you are doing well! Thank you for the patch.
>
> Makes sense to move the migration functions together. Just one small
> nit, I think the following comment is pretty helpful in understanding
> that folios that aren't demoted still remain in @demote_folios. Should
> we also move this comment to memory-tiers.c?


You're totally right, my bad! I'll add the comment in the next version.


>
> [...snip...]
>
>> -/*
>> - * Take folios on @demote_folios and attempt to demote them to another node.
>> - * Folios which are not demoted are left on @demote_folios.
>> - */
>> -static unsigned int demote_folio_list(struct list_head *demote_folios,
>> -				      struct pglist_data *pgdat,
>> -				      struct mem_cgroup *memcg)
>> -{
>> -	int target_nid;
>> -	unsigned int nr_succeeded;
>> -	nodemask_t allowed_mask;
>> -
> [...snip...]
>
> Anyways, the rest looks good to me. Have a great day!
> Joshua


Thanks!

Alex


* Re: [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c
  2026-03-12 12:56   ` David Hildenbrand (Arm)
@ 2026-03-13 13:45     ` Alexandre Ghiti
  0 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2026-03-13 13:45 UTC (permalink / raw)
  To: David Hildenbrand (Arm), akpm
  Cc: alexghiti, kernel-team, akinobu.mita, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel

Hi David,

On 3/12/26 13:56, David Hildenbrand (Arm) wrote:
> On 3/11/26 12:02, Alexandre Ghiti wrote:
>> Let's have all the demotion functions in this file, no functional
>> change intended.
>>
>> Suggested-by: Gregory Price <gourry@gourry.net>
>> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
>> ---
>>   include/linux/memory-tiers.h | 18 ++++++++
>>   mm/memory-tiers.c            | 75 +++++++++++++++++++++++++++++++++
>>   mm/vmscan.c                  | 80 +-----------------------------------
>>   3 files changed, 94 insertions(+), 79 deletions(-)
>>
>> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
>> index 96987d9d95a8..0bf0d002939e 100644
>> --- a/include/linux/memory-tiers.h
>> +++ b/include/linux/memory-tiers.h
>> @@ -56,6 +56,9 @@ void mt_put_memory_types(struct list_head *memory_types);
>>   int next_demotion_node(int node, const nodemask_t *allowed_mask);
>>   void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
>>   bool node_is_toptier(int node);
>> +unsigned int mt_demote_folios(struct list_head *demote_folios,
>> +			      struct pglist_data *pgdat,
>> +			      struct mem_cgroup *memcg);
>>   #else
>>   static inline int next_demotion_node(int node, const nodemask_t *allowed_mask)
>>   {
>> @@ -71,6 +74,14 @@ static inline bool node_is_toptier(int node)
>>   {
>>   	return true;
>>   }
>> +
>> +static inline unsigned int mt_demote_folios(struct list_head *demote_folios,
>> +					    struct pglist_data *pgdat,
>> +					    struct mem_cgroup *memcg)
> use two-tab indentation on second parameter line please. So this fits
> into a single line. Same for the other functions.
>
> Just like alloc_demote_folio() that you are moving already did.


Will do.


>
> [...]
>
>> -static struct folio *alloc_demote_folio(struct folio *src,
>> -		unsigned long private)
>> -{
>> -	struct folio *dst;
>> -	nodemask_t *allowed_mask;
>> -	struct migration_target_control *mtc;
>> -
>> -	mtc = (struct migration_target_control *)private;
>> -
>> -	allowed_mask = mtc->nmask;
>> -	/*
>> -	 * make sure we allocate from the target node first also trying to
>> -	 * demote or reclaim pages from the target node via kswapd if we are
>> -	 * low on free memory on target node. If we don't do this and if
>> -	 * we have free memory on the slower(lower) memtier, we would start
>> -	 * allocating pages from slower(lower) memory tiers without even forcing
>> -	 * a demotion of cold pages from the target memtier. This can result
>> -	 * in the kernel placing hot pages in slower(lower) memory tiers.
>> -	 */
>> -	mtc->nmask = NULL;
>> -	mtc->gfp_mask |= __GFP_THISNODE;
>> -	dst = alloc_migration_target(src, (unsigned long)mtc);
>> -	if (dst)
>> -		return dst;
>> -
>> -	mtc->gfp_mask &= ~__GFP_THISNODE;
>> -	mtc->nmask = allowed_mask;
>> -
> I think this function changed in the meantime in mm/mm-unstable. Against
> which branch is this patch?


Against Linus' v7.0-rc3. I have just checked and you're right: I missed
this modification. I'll rebase against mm-unstable.

Thanks,

Alex

>

* Re: [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit
  2026-03-12 12:58     ` David Hildenbrand (Arm)
@ 2026-03-13 13:46       ` Alexandre Ghiti
  0 siblings, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2026-03-13 13:46 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Byungchul Park
  Cc: akpm, alexghiti, kernel-team, akinobu.mita, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	joshua.hahnjy, matthew.brost, rakie.kim, ying.huang, ziy,
	linux-mm, linux-kernel, kernel_team

Hi David, Byungchul,

On 3/12/26 13:58, David Hildenbrand (Arm) wrote:
> On 3/12/26 06:28, Byungchul Park wrote:
>> On Wed, Mar 11, 2026 at 12:02:41PM +0100, Alexandre Ghiti wrote:
>>> This function actually returns the tier nodes that are targeted during a
>>> demotion, so rename it to be more explicit.
>>>
>>> No functional change intended.
>>>
>>> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
>>> ---
>>>   include/linux/memory-tiers.h | 6 +++---
>>>   mm/memory-tiers.c            | 4 ++--
>>>   mm/vmscan.c                  | 2 +-
>>>   3 files changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
>>> index 0bf0d002939e..ec39dc3c39e6 100644
>>> --- a/include/linux/memory-tiers.h
>>> +++ b/include/linux/memory-tiers.h
>>> @@ -54,7 +54,7 @@ struct memory_dev_type *mt_find_alloc_memory_type(int adist,
>>>   void mt_put_memory_types(struct list_head *memory_types);
>>>   #ifdef CONFIG_MIGRATION
>>>   int next_demotion_node(int node, const nodemask_t *allowed_mask);
>>> -void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
>>> +void node_get_allowed_demotion_targets(pg_data_t *pgdat, nodemask_t *targets);
>> Look better than before to me.
>>
>> What about just node_get_demotion_targets()?
> +1
>
> Maybe throw in the mt_ prefix and call it
>
> mt_get_node_demotion_targets()
>

Will do, thanks.

Alex


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE
  2026-03-11 17:06   ` Andrew Morton
  2026-03-12 12:59     ` David Hildenbrand (Arm)
@ 2026-03-13 13:47     ` Alexandre Ghiti
  1 sibling, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2026-03-13 13:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: alexghiti, kernel-team, akinobu.mita, david, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, hannes, zhengqi.arch,
	shakeel.butt, axelrasmussen, yuanchu, weixugc, gourry, apopple,
	byungchul, joshua.hahnjy, matthew.brost, rakie.kim, ying.huang,
	ziy, linux-mm, linux-kernel, Bing Jiao, stable

Hi Andrew,

On 3/11/26 18:06, Andrew Morton wrote:
> On Wed, 11 Mar 2026 12:02:42 +0100 Alexandre Ghiti <alex@ghiti.fr> wrote:
>
>> Fixes: 9933a0c8a539 ("mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent with regular THP allocations")
>> Cc: stable@vger.kernel.org
> Please let's have the cc:stable fixes separated out from the cleanups,
> and prepared against current -linus mainline.


I'll split the series in the next version.


>
> Also, when proposing backportable fixes please ensure that the
> changelogs carefully describe the userspace-visible runtime effects of
> the bug.


I was unaware of that requirement; I'll do that.

Thanks,

Alex


>
> Thanks.

* Re: [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE
  2026-03-11 17:54   ` Johannes Weiner
  2026-03-12 16:01     ` Gregory Price
@ 2026-03-13 13:49     ` Alexandre Ghiti
  1 sibling, 0 replies; 22+ messages in thread
From: Alexandre Ghiti @ 2026-03-13 13:49 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: akpm, alexghiti, kernel-team, akinobu.mita, david,
	lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
	zhengqi.arch, shakeel.butt, axelrasmussen, yuanchu, weixugc,
	gourry, apopple, byungchul, joshua.hahnjy, matthew.brost,
	rakie.kim, ying.huang, ziy, linux-mm, linux-kernel, Bing Jiao,
	stable

Hi Johannes,

On 3/11/26 18:54, Johannes Weiner wrote:
> On Wed, Mar 11, 2026 at 12:02:42PM +0100, Alexandre Ghiti wrote:
>> GFP_TRANSHUGE sets __GFP_DIRECT_RECLAIM so we must clear GFP_RECLAIM
>> after, not before.
>>
>> Reported-by: Bing Jiao <bingjiao@google.com>
>> Closes: https://lore.kernel.org/linux-mm/aXlKOxGGI9zne8sl@google.com/
>> Fixes: 9933a0c8a539 ("mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent with regular THP allocations")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
>> ---
>>   mm/migrate.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 2c3d489ecf51..ee533a4d38db 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -2190,12 +2190,12 @@ struct folio *alloc_migration_target(struct folio *src, unsigned long private)
>>   	}
>>   
>>   	if (folio_test_large(src)) {
>> +		gfp_mask |= GFP_TRANSHUGE;
>>   		/*
>>   		 * clear __GFP_RECLAIM to make the migration callback
>>   		 * consistent with regular THP allocations.
>>   		 */
>>   		gfp_mask &= ~__GFP_RECLAIM;
>> -		gfp_mask |= GFP_TRANSHUGE;
> I don't think this is right.
>
> The Fixes: did it this way to disable kswapd for THP allocations,
> while still allowing the customary direct reclaim. Maybe a better
> comment would have been: /* GFP_TRANSHUGE has its own reclaim policy */
>
> After your fix, direct reclaim isn't allowed either, which makes the
> request unnecessarily wimpy.
>
> The Closes: refers to reclaim that should be avoided during demotion.
> But if this path is taken during demotion it will already not recurse
> into direct reclaim due to PF_MEMALLOC.
>
> So I don't see a bug in the existing code. But maybe the comment could
> be clearer.


Makes sense, I had indeed misunderstood the comment. I'll drop this fix
in the next version then.

Thanks,

Alex


end of thread

Thread overview: 22+ messages
2026-03-11 11:02 [PATCH 0/4] Demotion cleanup and fixes Alexandre Ghiti
2026-03-11 11:02 ` [PATCH 1/4] mm: Move demotion related functions in memory-tiers.c Alexandre Ghiti
2026-03-11 14:55   ` Joshua Hahn
2026-03-13 13:33     ` Alexandre Ghiti
2026-03-12  8:44   ` Donet Tom
2026-03-13 13:27     ` Alexandre Ghiti
2026-03-12 12:56   ` David Hildenbrand (Arm)
2026-03-13 13:45     ` Alexandre Ghiti
2026-03-11 11:02 ` [PATCH 2/4] mm: Rename node_get_allowed_targets() to make it more explicit Alexandre Ghiti
2026-03-11 15:02   ` Joshua Hahn
2026-03-12  5:28   ` Byungchul Park
2026-03-12 12:58     ` David Hildenbrand (Arm)
2026-03-13 13:46       ` Alexandre Ghiti
2026-03-12  8:46   ` Donet Tom
2026-03-11 11:02 ` [PATCH 3/4] mm: Fix demotion gfp by clearing GFP_RECLAIM after setting GFP_TRANSHUGE Alexandre Ghiti
2026-03-11 17:06   ` Andrew Morton
2026-03-12 12:59     ` David Hildenbrand (Arm)
2026-03-13 13:47     ` Alexandre Ghiti
2026-03-11 17:54   ` Johannes Weiner
2026-03-12 16:01     ` Gregory Price
2026-03-13 13:49     ` Alexandre Ghiti
2026-03-11 11:02 ` [PATCH 4/4] mm: Fix demotion gfp by preserving initial gfp reclaim policy Alexandre Ghiti
