cgroups.vger.kernel.org archive mirror
* [PATCH 0/4] reparent the THP split queue
@ 2025-09-19  3:46 Qi Zheng
  2025-09-19  3:46 ` [PATCH 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged() Qi Zheng
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Qi Zheng @ 2025-09-19  3:46 UTC (permalink / raw)
  To: hannes, hughd, mhocko, roman.gushchin, shakeel.butt, muchun.song,
	david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm
  Cc: linux-mm, linux-kernel, cgroups, Qi Zheng

Hi all,

In the future, we will reparent LRU folios during memcg offline to eliminate
dying memory cgroups, which requires reparenting the THP split queue to its
parent memcg.

Similar to list_lru, the split queue is relatively independent and does not need
to be reparented along with the objcg and LRU folios (which would require
holding the objcg lock and lru lock). Therefore, we can apply the same
mechanism as list_lru and reparent the split queue first when a memcg is
offlined.
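
For reference, the core of that mechanism (implemented by the last patch of
this series) is a locked splice of the child's queue into the parent's,
roughly:

	/* Condensed from patch 4; not a standalone implementation. */
	spin_lock_irq(&ds_queue->split_queue_lock);
	spin_lock_nested(&parent_ds_queue->split_queue_lock, SINGLE_DEPTH_NESTING);
	list_splice_tail_init(&ds_queue->split_queue, &parent_ds_queue->split_queue);
	parent_ds_queue->split_queue_len += ds_queue->split_queue_len;
	ds_queue->split_queue_len = 0;
	ds_queue->is_dying = true;	/* later lockers walk up to the parent */
	spin_unlock(&parent_ds_queue->split_queue_lock);
	spin_unlock_irq(&ds_queue->split_queue_lock);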

The first three patches in this series are split out from the series
"Eliminate Dying Memory Cgroup" [1], mainly to do some cleanup and preparatory
work. The changes to them are as follows:
 - fix bad unlock balance in [PATCH RFC 06/28]
 - fix the missing cleanup of partially_mapped state and counter in
   [PATCH RFC 07/28]
 - collect Acked-bys

The last patch reparents the THP split queue to its parent memcg during memcg
offline.

This series is based on next-20250917.

Comments and suggestions are welcome!

Thanks,
Qi

[1]. https://lore.kernel.org/all/20250415024532.26632-1-songmuchun@bytedance.com/

Muchun Song (3):
  mm: thp: replace folio_memcg() with folio_memcg_charged()
  mm: thp: introduce folio_split_queue_lock and its variants
  mm: thp: use folio_batch to handle THP splitting in
    deferred_split_scan()

Qi Zheng (1):
  mm: thp: reparent the split queue during memcg offline

 include/linux/huge_mm.h    |   1 +
 include/linux/memcontrol.h |  10 ++
 include/linux/mmzone.h     |   1 +
 mm/huge_memory.c           | 218 ++++++++++++++++++++++++-------------
 mm/memcontrol.c            |   1 +
 mm/mm_init.c               |   1 +
 6 files changed, 157 insertions(+), 75 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged()
  2025-09-19  3:46 [PATCH 0/4] reparent the THP split queue Qi Zheng
@ 2025-09-19  3:46 ` Qi Zheng
  2025-09-19 21:30   ` Shakeel Butt
  2025-09-22  8:17   ` David Hildenbrand
  2025-09-19  3:46 ` [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants Qi Zheng
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Qi Zheng @ 2025-09-19  3:46 UTC (permalink / raw)
  To: hannes, hughd, mhocko, roman.gushchin, shakeel.butt, muchun.song,
	david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm
  Cc: linux-mm, linux-kernel, cgroups, Muchun Song, Qi Zheng

From: Muchun Song <songmuchun@bytedance.com>

folio_memcg_charged() is intended for use when the user is unconcerned
about the returned memcg pointer. It is more efficient than folio_memcg().
Therefore, replace folio_memcg() with folio_memcg_charged().
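
For context, a rough paraphrase of the two helpers (simplified from
include/linux/memcontrol.h; the exact in-tree definitions may differ):

	/* Paraphrased sketch, not the literal kernel source. */
	static inline bool folio_memcg_charged(struct folio *folio)
	{
		/* Only has to test that something is bound to the folio. */
		return folio->memcg_data != 0;
	}

	static inline struct mem_cgroup *folio_memcg(struct folio *folio)
	{
		/* Must also decode the kmem/objcg encoding to return a pointer. */
		if (folio_memcg_kmem(folio))
			return obj_cgroup_memcg(__folio_objcg(folio));
		return __folio_memcg(folio);
	}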

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5acca24bbabbe..582628ddf3f33 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4014,7 +4014,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 	bool unqueued = false;
 
 	WARN_ON_ONCE(folio_ref_count(folio));
-	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio));
+	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));
 
 	ds_queue = get_deferred_split_queue(folio);
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants
  2025-09-19  3:46 [PATCH 0/4] reparent the THP split queue Qi Zheng
  2025-09-19  3:46 ` [PATCH 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged() Qi Zheng
@ 2025-09-19  3:46 ` Qi Zheng
  2025-09-19 15:39   ` Zi Yan
                     ` (3 more replies)
  2025-09-19  3:46 ` [PATCH 3/4] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan() Qi Zheng
                   ` (2 subsequent siblings)
  4 siblings, 4 replies; 17+ messages in thread
From: Qi Zheng @ 2025-09-19  3:46 UTC (permalink / raw)
  To: hannes, hughd, mhocko, roman.gushchin, shakeel.butt, muchun.song,
	david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm
  Cc: linux-mm, linux-kernel, cgroups, Muchun Song, Qi Zheng

From: Muchun Song <songmuchun@bytedance.com>

In future memcg removal, the binding between a folio and a memcg may
change, making the split lock within the memcg unstable when held.

A new approach is required to reparent the split queue to its parent. This
patch starts introducing a unified way to acquire the split lock for
future work.

It's a code-only refactoring with no functional changes.
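
The resulting caller-side pattern (taken from the conversions below) is:

	struct deferred_split *ds_queue;
	unsigned long flags;

	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
	/* ... touch folio->_deferred_list and ds_queue->split_queue ... */
	split_queue_unlock_irqrestore(ds_queue, flags);

so callers no longer look up the queue themselves before taking the lock.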

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 include/linux/memcontrol.h | 10 +++++
 mm/huge_memory.c           | 89 ++++++++++++++++++++++++++------------
 2 files changed, 71 insertions(+), 28 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 16fe0306e50ea..99876af13c315 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1662,6 +1662,11 @@ int alloc_shrinker_info(struct mem_cgroup *memcg);
 void free_shrinker_info(struct mem_cgroup *memcg);
 void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id);
 void reparent_shrinker_deferred(struct mem_cgroup *memcg);
+
+static inline int shrinker_id(struct shrinker *shrinker)
+{
+	return shrinker->id;
+}
 #else
 #define mem_cgroup_sockets_enabled 0
 
@@ -1693,6 +1698,11 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
 				    int nid, int shrinker_id)
 {
 }
+
+static inline int shrinker_id(struct shrinker *shrinker)
+{
+	return -1;
+}
 #endif
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 582628ddf3f33..d34516a22f5bb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1078,26 +1078,62 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 
 #ifdef CONFIG_MEMCG
 static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
+struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
+					   struct deferred_split *queue)
 {
-	struct mem_cgroup *memcg = folio_memcg(folio);
-	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
-
-	if (memcg)
-		return &memcg->deferred_split_queue;
-	else
-		return &pgdat->deferred_split_queue;
+	if (mem_cgroup_disabled())
+		return NULL;
+	if (&NODE_DATA(folio_nid(folio))->deferred_split_queue == queue)
+		return NULL;
+	return container_of(queue, struct mem_cgroup, deferred_split_queue);
 }
 #else
 static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
+struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
+					   struct deferred_split *queue)
 {
-	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
-
-	return &pgdat->deferred_split_queue;
+	return NULL;
 }
 #endif
 
+static struct deferred_split *folio_split_queue_lock(struct folio *folio)
+{
+	struct mem_cgroup *memcg;
+	struct deferred_split *queue;
+
+	memcg = folio_memcg(folio);
+	queue = memcg ? &memcg->deferred_split_queue :
+			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
+	spin_lock(&queue->split_queue_lock);
+
+	return queue;
+}
+
+static struct deferred_split *
+folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
+{
+	struct mem_cgroup *memcg;
+	struct deferred_split *queue;
+
+	memcg = folio_memcg(folio);
+	queue = memcg ? &memcg->deferred_split_queue :
+			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
+	spin_lock_irqsave(&queue->split_queue_lock, *flags);
+
+	return queue;
+}
+
+static inline void split_queue_unlock(struct deferred_split *queue)
+{
+	spin_unlock(&queue->split_queue_lock);
+}
+
+static inline void split_queue_unlock_irqrestore(struct deferred_split *queue,
+						 unsigned long flags)
+{
+	spin_unlock_irqrestore(&queue->split_queue_lock, flags);
+}
+
 static inline bool is_transparent_hugepage(const struct folio *folio)
 {
 	if (!folio_test_large(folio))
@@ -3579,7 +3615,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		struct page *split_at, struct page *lock_at,
 		struct list_head *list, bool uniform_split)
 {
-	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
+	struct deferred_split *ds_queue;
 	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
 	struct folio *end_folio = folio_next(folio);
 	bool is_anon = folio_test_anon(folio);
@@ -3718,7 +3754,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 	}
 
 	/* Prevent deferred_split_scan() touching ->_refcount */
-	spin_lock(&ds_queue->split_queue_lock);
+	ds_queue = folio_split_queue_lock(folio);
 	if (folio_ref_freeze(folio, 1 + extra_pins)) {
 		struct swap_cluster_info *ci = NULL;
 		struct lruvec *lruvec;
@@ -3740,7 +3776,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 			 */
 			list_del_init(&folio->_deferred_list);
 		}
-		spin_unlock(&ds_queue->split_queue_lock);
+		split_queue_unlock(ds_queue);
 		if (mapping) {
 			int nr = folio_nr_pages(folio);
 
@@ -3835,7 +3871,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		if (ci)
 			swap_cluster_unlock(ci);
 	} else {
-		spin_unlock(&ds_queue->split_queue_lock);
+		split_queue_unlock(ds_queue);
 		ret = -EAGAIN;
 	}
 fail:
@@ -4016,8 +4052,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 	WARN_ON_ONCE(folio_ref_count(folio));
 	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));
 
-	ds_queue = get_deferred_split_queue(folio);
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
 	if (!list_empty(&folio->_deferred_list)) {
 		ds_queue->split_queue_len--;
 		if (folio_test_partially_mapped(folio)) {
@@ -4028,7 +4063,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 		list_del_init(&folio->_deferred_list);
 		unqueued = true;
 	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+	split_queue_unlock_irqrestore(ds_queue, flags);
 
 	return unqueued;	/* useful for debug warnings */
 }
@@ -4036,10 +4071,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 /* partially_mapped=false won't clear PG_partially_mapped folio flag */
 void deferred_split_folio(struct folio *folio, bool partially_mapped)
 {
-	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
-#ifdef CONFIG_MEMCG
-	struct mem_cgroup *memcg = folio_memcg(folio);
-#endif
+	struct deferred_split *ds_queue;
 	unsigned long flags;
 
 	/*
@@ -4062,7 +4094,7 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 	if (folio_test_swapcache(folio))
 		return;
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
 	if (partially_mapped) {
 		if (!folio_test_partially_mapped(folio)) {
 			folio_set_partially_mapped(folio);
@@ -4077,15 +4109,16 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 		VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
 	}
 	if (list_empty(&folio->_deferred_list)) {
+		struct mem_cgroup *memcg;
+
+		memcg = folio_split_queue_memcg(folio, ds_queue);
 		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
 		ds_queue->split_queue_len++;
-#ifdef CONFIG_MEMCG
 		if (memcg)
 			set_shrinker_bit(memcg, folio_nid(folio),
-					 deferred_split_shrinker->id);
-#endif
+					 shrinker_id(deferred_split_shrinker));
 	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
+	split_queue_unlock_irqrestore(ds_queue, flags);
 }
 
 static unsigned long deferred_split_count(struct shrinker *shrink,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan()
  2025-09-19  3:46 [PATCH 0/4] reparent the THP split queue Qi Zheng
  2025-09-19  3:46 ` [PATCH 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged() Qi Zheng
  2025-09-19  3:46 ` [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants Qi Zheng
@ 2025-09-19  3:46 ` Qi Zheng
  2025-09-22  8:43   ` David Hildenbrand
  2025-09-19  3:46 ` [PATCH 4/4] mm: thp: reparent the split queue during memcg offline Qi Zheng
  2025-09-19 21:33 ` [PATCH 0/4] reparent the THP split queue Shakeel Butt
  4 siblings, 1 reply; 17+ messages in thread
From: Qi Zheng @ 2025-09-19  3:46 UTC (permalink / raw)
  To: hannes, hughd, mhocko, roman.gushchin, shakeel.butt, muchun.song,
	david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm
  Cc: linux-mm, linux-kernel, cgroups, Muchun Song, Qi Zheng

From: Muchun Song <songmuchun@bytedance.com>

The maintenance of the folio->_deferred_list is intricate because it's
reused in a local list.

Here are some peculiarities:

   1) When a folio is removed from its split queue and added to a local
      on-stack list in deferred_split_scan(), the ->split_queue_len isn't
      updated, leading to an inconsistency between it and the actual
      number of folios in the split queue.

   2) When the folio is split via split_folio() later, it's removed from
      the local list while holding the split queue lock. At this time,
      this lock protects the local list, not the split queue.

   3) To handle the race condition with a third-party freeing or migrating
      the preceding folio, we must ensure there's always one safe (with
      raised refcount) folio before by delaying its folio_put(). More
      details can be found in commit e66f3185fa04 ("mm/thp: fix deferred
      split queue not partially_mapped"). It's rather tricky.

We can use the folio_batch infrastructure to handle this clearly. In this
case, ->split_queue_len will be consistent with the real number of folios
in the split queue. If list_empty(&folio->_deferred_list) returns false,
it's clear the folio must be in its split queue (not in a local list
anymore).
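
Condensed, the new scan loop becomes (see the full diff below):

	folio_batch_init(&fbatch);
retry:
	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
							_deferred_list) {
		if (folio_try_get(folio))
			folio_batch_add(&fbatch, folio); /* pinned for splitting */
		list_del_init(&folio->_deferred_list);
		ds_queue->split_queue_len--;	/* stays accurate now */
		if (!--sc->nr_to_scan)
			break;
		if (!folio_batch_space(&fbatch))
			break;			/* batch full: retry afterwards */
	}
	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);

	/* try to split each pinned folio, re-queue if still partially mapped */

	folios_put(&fbatch);			/* drop the pins, reset the batch */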

In the future, we will reparent LRU folios during memcg offline to
eliminate dying memory cgroups, which requires reparenting the split queue
to its parent first. So this patch prepares for using
folio_split_queue_lock_irqsave() as the memcg may change then.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 mm/huge_memory.c | 88 +++++++++++++++++++++++-------------------------
 1 file changed, 42 insertions(+), 46 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d34516a22f5bb..ab16da21c94e0 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3760,21 +3760,22 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		struct lruvec *lruvec;
 		int expected_refs;
 
-		if (folio_order(folio) > 1 &&
-		    !list_empty(&folio->_deferred_list)) {
-			ds_queue->split_queue_len--;
+		if (folio_order(folio) > 1) {
+			if (!list_empty(&folio->_deferred_list)) {
+				ds_queue->split_queue_len--;
+				/*
+				 * Reinitialize page_deferred_list after removing the
+				 * page from the split_queue, otherwise a subsequent
+				 * split will see list corruption when checking the
+				 * page_deferred_list.
+				 */
+				list_del_init(&folio->_deferred_list);
+			}
 			if (folio_test_partially_mapped(folio)) {
 				folio_clear_partially_mapped(folio);
 				mod_mthp_stat(folio_order(folio),
 					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 			}
-			/*
-			 * Reinitialize page_deferred_list after removing the
-			 * page from the split_queue, otherwise a subsequent
-			 * split will see list corruption when checking the
-			 * page_deferred_list.
-			 */
-			list_del_init(&folio->_deferred_list);
 		}
 		split_queue_unlock(ds_queue);
 		if (mapping) {
@@ -4173,40 +4174,48 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	struct pglist_data *pgdata = NODE_DATA(sc->nid);
 	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
 	unsigned long flags;
-	LIST_HEAD(list);
-	struct folio *folio, *next, *prev = NULL;
-	int split = 0, removed = 0;
+	struct folio *folio, *next;
+	int split = 0, i;
+	struct folio_batch fbatch;
+	bool done;
 
 #ifdef CONFIG_MEMCG
 	if (sc->memcg)
 		ds_queue = &sc->memcg->deferred_split_queue;
 #endif
 
+	folio_batch_init(&fbatch);
+retry:
+	done = true;
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	/* Take pin on all head pages to avoid freeing them under us */
 	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
 							_deferred_list) {
 		if (folio_try_get(folio)) {
-			list_move(&folio->_deferred_list, &list);
-		} else {
+			folio_batch_add(&fbatch, folio);
+		} else if (folio_test_partially_mapped(folio)) {
 			/* We lost race with folio_put() */
-			if (folio_test_partially_mapped(folio)) {
-				folio_clear_partially_mapped(folio);
-				mod_mthp_stat(folio_order(folio),
-					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
-			}
-			list_del_init(&folio->_deferred_list);
-			ds_queue->split_queue_len--;
+			folio_clear_partially_mapped(folio);
+			mod_mthp_stat(folio_order(folio),
+				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 		}
+		list_del_init(&folio->_deferred_list);
+		ds_queue->split_queue_len--;
 		if (!--sc->nr_to_scan)
 			break;
+		if (folio_batch_space(&fbatch) == 0) {
+			done = false;
+			break;
+		}
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
-	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+	for (i = 0; i < folio_batch_count(&fbatch); i++) {
 		bool did_split = false;
 		bool underused = false;
+		struct deferred_split *fqueue;
 
+		folio = fbatch.folios[i];
 		if (!folio_test_partially_mapped(folio)) {
 			/*
 			 * See try_to_map_unused_to_zeropage(): we cannot
@@ -4229,38 +4238,25 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		}
 		folio_unlock(folio);
 next:
+		if (did_split || !folio_test_partially_mapped(folio))
+			continue;
 		/*
-		 * split_folio() removes folio from list on success.
 		 * Only add back to the queue if folio is partially mapped.
 		 * If thp_underused returns false, or if split_folio fails
 		 * in the case it was underused, then consider it used and
 		 * don't add it back to split_queue.
 		 */
-		if (did_split) {
-			; /* folio already removed from list */
-		} else if (!folio_test_partially_mapped(folio)) {
-			list_del_init(&folio->_deferred_list);
-			removed++;
-		} else {
-			/*
-			 * That unlocked list_del_init() above would be unsafe,
-			 * unless its folio is separated from any earlier folios
-			 * left on the list (which may be concurrently unqueued)
-			 * by one safe folio with refcount still raised.
-			 */
-			swap(folio, prev);
+		fqueue = folio_split_queue_lock_irqsave(folio, &flags);
+		if (list_empty(&folio->_deferred_list)) {
+			list_add_tail(&folio->_deferred_list, &fqueue->split_queue);
+			fqueue->split_queue_len++;
 		}
-		if (folio)
-			folio_put(folio);
+		split_queue_unlock_irqrestore(fqueue, flags);
 	}
+	folios_put(&fbatch);
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	list_splice_tail(&list, &ds_queue->split_queue);
-	ds_queue->split_queue_len -= removed;
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
-
-	if (prev)
-		folio_put(prev);
+	if (!done)
+		goto retry;
 
 	/*
 	 * Stop shrinker if we didn't split any page, but the queue is empty.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] mm: thp: reparent the split queue during memcg offline
  2025-09-19  3:46 [PATCH 0/4] reparent the THP split queue Qi Zheng
                   ` (2 preceding siblings ...)
  2025-09-19  3:46 ` [PATCH 3/4] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan() Qi Zheng
@ 2025-09-19  3:46 ` Qi Zheng
  2025-09-20  7:43   ` kernel test robot
  2025-09-19 21:33 ` [PATCH 0/4] reparent the THP split queue Shakeel Butt
  4 siblings, 1 reply; 17+ messages in thread
From: Qi Zheng @ 2025-09-19  3:46 UTC (permalink / raw)
  To: hannes, hughd, mhocko, roman.gushchin, shakeel.butt, muchun.song,
	david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm
  Cc: linux-mm, linux-kernel, cgroups, Qi Zheng

In the future, we will reparent LRU folios during memcg offline to
eliminate dying memory cgroups, which requires reparenting the split queue
to its parent.

Similar to list_lru, the split queue is relatively independent and does
not need to be reparented along with objcg and LRU folios (which would
require holding the objcg lock and lru lock). So let's apply the same
mechanism as list_lru to reparent the split queue separately when a memcg
is offlined.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 include/linux/huge_mm.h |  1 +
 include/linux/mmzone.h  |  1 +
 mm/huge_memory.c        | 39 +++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c         |  1 +
 mm/mm_init.c            |  1 +
 5 files changed, 43 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f327d62fc9852..3215a35a20411 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -417,6 +417,7 @@ static inline int split_huge_page(struct page *page)
 	return split_huge_page_to_list_to_order(page, NULL, ret);
 }
 void deferred_split_folio(struct folio *folio, bool partially_mapped);
+void reparent_deferred_split_queue(struct mem_cgroup *memcg);
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address, bool freeze);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7fb7331c57250..f3eb81fee056a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1346,6 +1346,7 @@ struct deferred_split {
 	spinlock_t split_queue_lock;
 	struct list_head split_queue;
 	unsigned long split_queue_len;
+	bool is_dying;
 };
 #endif
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ab16da21c94e0..72e78d22ec4b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1102,9 +1102,15 @@ static struct deferred_split *folio_split_queue_lock(struct folio *folio)
 	struct deferred_split *queue;
 
 	memcg = folio_memcg(folio);
+retry:
 	queue = memcg ? &memcg->deferred_split_queue :
 			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
 	spin_lock(&queue->split_queue_lock);
+	if (unlikely(queue->is_dying == true)) {
+		spin_unlock(&queue->split_queue_lock);
+		memcg = parent_mem_cgroup(memcg);
+		goto retry;
+	}
 
 	return queue;
 }
@@ -1116,9 +1122,15 @@ folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
 	struct deferred_split *queue;
 
 	memcg = folio_memcg(folio);
+retry:
 	queue = memcg ? &memcg->deferred_split_queue :
 			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
 	spin_lock_irqsave(&queue->split_queue_lock, *flags);
+	if (unlikely(queue->is_dying == true)) {
+		spin_unlock_irqrestore(&queue->split_queue_lock, *flags);
+		memcg = parent_mem_cgroup(memcg);
+		goto retry;
+	}
 
 	return queue;
 }
@@ -4267,6 +4279,33 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	return split;
 }
 
+void reparent_deferred_split_queue(struct mem_cgroup *memcg)
+{
+	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+	struct deferred_split *ds_queue = &memcg->deferred_split_queue;
+	struct deferred_split *parent_ds_queue = &parent->deferred_split_queue;
+	int nid;
+
+	spin_lock_irq(&ds_queue->split_queue_lock);
+	spin_lock_nested(&parent_ds_queue->split_queue_lock, SINGLE_DEPTH_NESTING);
+
+	if (!ds_queue->split_queue_len)
+		goto unlock;
+
+	list_splice_tail_init(&ds_queue->split_queue, &parent_ds_queue->split_queue);
+	parent_ds_queue->split_queue_len += ds_queue->split_queue_len;
+	ds_queue->split_queue_len = 0;
+	/* Mark the ds_queue dead */
+	ds_queue->is_dying = true;
+
+	for_each_node(nid)
+		set_shrinker_bit(parent, nid, shrinker_id(deferred_split_shrinker));
+
+unlock:
+	spin_unlock(&parent_ds_queue->split_queue_lock);
+	spin_unlock_irq(&ds_queue->split_queue_lock);
+}
+
 #ifdef CONFIG_DEBUG_FS
 static void split_huge_pages_all(void)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e090f29eb03bd..d03da72e7585d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3887,6 +3887,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	zswap_memcg_offline_cleanup(memcg);
 
 	memcg_offline_kmem(memcg);
+	reparent_deferred_split_queue(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
 	lru_gen_offline_memcg(memcg);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 3db2dea7db4c5..cbda5c2ee3241 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1387,6 +1387,7 @@ static void pgdat_init_split_queue(struct pglist_data *pgdat)
 	spin_lock_init(&ds_queue->split_queue_lock);
 	INIT_LIST_HEAD(&ds_queue->split_queue);
 	ds_queue->split_queue_len = 0;
+	ds_queue->is_dying = false;
 }
 #else
 static void pgdat_init_split_queue(struct pglist_data *pgdat) {}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants
  2025-09-19  3:46 ` [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants Qi Zheng
@ 2025-09-19 15:39   ` Zi Yan
  2025-09-22  7:56     ` Qi Zheng
  2025-09-20  0:49   ` Shakeel Butt
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Zi Yan @ 2025-09-19 15:39 UTC (permalink / raw)
  To: Qi Zheng
  Cc: hannes, hughd, mhocko, roman.gushchin, shakeel.butt, muchun.song,
	david, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm, linux-mm,
	linux-kernel, cgroups, Muchun Song

On 18 Sep 2025, at 23:46, Qi Zheng wrote:

> From: Muchun Song <songmuchun@bytedance.com>
>
> In future memcg removal, the binding between a folio and a memcg may
> change, making the split lock within the memcg unstable when held.
>
> A new approach is required to reparent the split queue to its parent. This
> patch starts introducing a unified way to acquire the split lock for
> future work.
>
> It's a code-only refactoring with no functional changes.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>  include/linux/memcontrol.h | 10 +++++
>  mm/huge_memory.c           | 89 ++++++++++++++++++++++++++------------
>  2 files changed, 71 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 16fe0306e50ea..99876af13c315 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -1662,6 +1662,11 @@ int alloc_shrinker_info(struct mem_cgroup *memcg);
>  void free_shrinker_info(struct mem_cgroup *memcg);
>  void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id);
>  void reparent_shrinker_deferred(struct mem_cgroup *memcg);
> +
> +static inline int shrinker_id(struct shrinker *shrinker)
> +{
> +	return shrinker->id;
> +}
>  #else
>  #define mem_cgroup_sockets_enabled 0
>
> @@ -1693,6 +1698,11 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
>  				    int nid, int shrinker_id)
>  {
>  }
> +
> +static inline int shrinker_id(struct shrinker *shrinker)
> +{
> +	return -1;
> +}
>  #endif
>
>  #ifdef CONFIG_MEMCG
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 582628ddf3f33..d34516a22f5bb 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1078,26 +1078,62 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
>
>  #ifdef CONFIG_MEMCG
>  static inline
> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
> +struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
> +					   struct deferred_split *queue)
>  {
> -	struct mem_cgroup *memcg = folio_memcg(folio);
> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
> -
> -	if (memcg)
> -		return &memcg->deferred_split_queue;
> -	else
> -		return &pgdat->deferred_split_queue;
> +	if (mem_cgroup_disabled())
> +		return NULL;
> +	if (&NODE_DATA(folio_nid(folio))->deferred_split_queue == queue)
> +		return NULL;
> +	return container_of(queue, struct mem_cgroup, deferred_split_queue);
>  }
>  #else
>  static inline
> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
> +struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
> +					   struct deferred_split *queue)
>  {
> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
> -
> -	return &pgdat->deferred_split_queue;
> +	return NULL;
>  }
>  #endif
>
> +static struct deferred_split *folio_split_queue_lock(struct folio *folio)
> +{
> +	struct mem_cgroup *memcg;
> +	struct deferred_split *queue;
> +
> +	memcg = folio_memcg(folio);
> +	queue = memcg ? &memcg->deferred_split_queue :
> +			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
> +	spin_lock(&queue->split_queue_lock);
> +
> +	return queue;
> +}
> +
> +static struct deferred_split *
> +folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
> +{
> +	struct mem_cgroup *memcg;
> +	struct deferred_split *queue;
> +
> +	memcg = folio_memcg(folio);
> +	queue = memcg ? &memcg->deferred_split_queue :
> +			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
> +	spin_lock_irqsave(&queue->split_queue_lock, *flags);
> +
> +	return queue;
> +}

A helper function to get queue from a folio would get rid of duplicated
code in the two functions above. Hmm, that is the deleted
get_deferred_split_queue(). So probably retain it.
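
Something like this, as a hypothetical sketch of the suggestion:

	static struct deferred_split *folio_split_queue(struct folio *folio)
	{
		struct mem_cgroup *memcg = folio_memcg(folio);

		return memcg ? &memcg->deferred_split_queue :
			       &NODE_DATA(folio_nid(folio))->deferred_split_queue;
	}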

Otherwise, LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>

Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged()
  2025-09-19  3:46 ` [PATCH 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged() Qi Zheng
@ 2025-09-19 21:30   ` Shakeel Butt
  2025-09-22  8:17   ` David Hildenbrand
  1 sibling, 0 replies; 17+ messages in thread
From: Shakeel Butt @ 2025-09-19 21:30 UTC (permalink / raw)
  To: Qi Zheng
  Cc: hannes, hughd, mhocko, roman.gushchin, muchun.song, david,
	lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm, linux-mm,
	linux-kernel, cgroups, Muchun Song

On Fri, Sep 19, 2025 at 11:46:32AM +0800, Qi Zheng wrote:
> From: Muchun Song <songmuchun@bytedance.com>
> 
> folio_memcg_charged() is intended for use when the user is unconcerned
> about the returned memcg pointer. It is more efficient than folio_memcg().
> Therefore, replace folio_memcg() with folio_memcg_charged().
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] reparent the THP split queue
  2025-09-19  3:46 [PATCH 0/4] reparent the THP split queue Qi Zheng
                   ` (3 preceding siblings ...)
  2025-09-19  3:46 ` [PATCH 4/4] mm: thp: reparent the split queue during memcg offline Qi Zheng
@ 2025-09-19 21:33 ` Shakeel Butt
  2025-09-22  7:51   ` Qi Zheng
  4 siblings, 1 reply; 17+ messages in thread
From: Shakeel Butt @ 2025-09-19 21:33 UTC (permalink / raw)
  To: Qi Zheng
  Cc: hannes, hughd, mhocko, roman.gushchin, muchun.song, david,
	lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm, linux-mm,
	linux-kernel, cgroups

Hi Qi,

On Fri, Sep 19, 2025 at 11:46:31AM +0800, Qi Zheng wrote:
> Hi all,
> 
> In the future, we will reparent LRU folios during memcg offline to eliminate
> dying memory cgroups,

Will you be driving this reparent LRU effort or will Muchun be driving
it? I think it is really important work and I would really like to get
this upstreamed sooner than later.

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants
  2025-09-19  3:46 ` [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants Qi Zheng
  2025-09-19 15:39   ` Zi Yan
@ 2025-09-20  0:49   ` Shakeel Butt
  2025-09-20  8:27   ` kernel test robot
  2025-09-22  8:20   ` David Hildenbrand
  3 siblings, 0 replies; 17+ messages in thread
From: Shakeel Butt @ 2025-09-20  0:49 UTC (permalink / raw)
  To: Qi Zheng
  Cc: hannes, hughd, mhocko, roman.gushchin, muchun.song, david,
	lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm, linux-mm,
	linux-kernel, cgroups, Muchun Song

On Fri, Sep 19, 2025 at 11:46:33AM +0800, Qi Zheng wrote:
> From: Muchun Song <songmuchun@bytedance.com>
> 
> In future memcg removal, the binding between a folio and a memcg may
> change, making the split lock within the memcg unstable when held.
> 
> A new approach is required to reparent the split queue to its parent. This
> patch starts introducing a unified way to acquire the split lock for
> future work.
> 
> It's a code-only refactoring with no functional changes.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] mm: thp: reparent the split queue during memcg offline
  2025-09-19  3:46 ` [PATCH 4/4] mm: thp: reparent the split queue during memcg offline Qi Zheng
@ 2025-09-20  7:43   ` kernel test robot
  0 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2025-09-20  7:43 UTC (permalink / raw)
  To: Qi Zheng, hannes, hughd, mhocko, roman.gushchin, shakeel.butt,
	muchun.song, david, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	akpm
  Cc: llvm, oe-kbuild-all, linux-mm, linux-kernel, cgroups, Qi Zheng

Hi Qi,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20250918]
[also build test ERROR on v6.17-rc6]
[cannot apply to akpm-mm/mm-everything rppt-memblock/for-next rppt-memblock/fixes linus/master v6.17-rc6 v6.17-rc5 v6.17-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Qi-Zheng/mm-thp-replace-folio_memcg-with-folio_memcg_charged/20250919-115219
base:   next-20250918
patch link:    https://lore.kernel.org/r/bbe3bf8bfce081fdf0815481b2a0c83b89b095b8.1758253018.git.zhengqi.arch%40bytedance.com
patch subject: [PATCH 4/4] mm: thp: reparent the split queue during memcg offline
config: x86_64-randconfig-004-20250920 (https://download.01.org/0day-ci/archive/20250920/202509201556.QAvYX4v3-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250920/202509201556.QAvYX4v3-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509201556.QAvYX4v3-lkp@intel.com/

All errors (new ones prefixed by >>):

>> mm/memcontrol.c:3890:2: error: call to undeclared function 'reparent_deferred_split_queue'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    3890 |         reparent_deferred_split_queue(memcg);
         |         ^
   1 error generated.


vim +/reparent_deferred_split_queue +3890 mm/memcontrol.c

  3877	
  3878	static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
  3879	{
  3880		struct mem_cgroup *memcg = mem_cgroup_from_css(css);
  3881	
  3882		memcg1_css_offline(memcg);
  3883	
  3884		page_counter_set_min(&memcg->memory, 0);
  3885		page_counter_set_low(&memcg->memory, 0);
  3886	
  3887		zswap_memcg_offline_cleanup(memcg);
  3888	
  3889		memcg_offline_kmem(memcg);
> 3890		reparent_deferred_split_queue(memcg);
  3891		reparent_shrinker_deferred(memcg);
  3892		wb_memcg_offline(memcg);
  3893		lru_gen_offline_memcg(memcg);
  3894	
  3895		drain_all_stock(memcg);
  3896	
  3897		mem_cgroup_id_put(memcg);
  3898	}
  3899	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants
  2025-09-19  3:46 ` [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants Qi Zheng
  2025-09-19 15:39   ` Zi Yan
  2025-09-20  0:49   ` Shakeel Butt
@ 2025-09-20  8:27   ` kernel test robot
  2025-09-22  8:20   ` David Hildenbrand
  3 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2025-09-20  8:27 UTC (permalink / raw)
  To: Qi Zheng, hannes, hughd, mhocko, roman.gushchin, shakeel.butt,
	muchun.song, david, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	akpm
  Cc: llvm, oe-kbuild-all, linux-mm, linux-kernel, cgroups, Muchun Song,
	Qi Zheng

Hi Qi,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20250918]
[also build test ERROR on v6.17-rc6]
[cannot apply to akpm-mm/mm-everything rppt-memblock/for-next rppt-memblock/fixes linus/master v6.17-rc6 v6.17-rc5 v6.17-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Qi-Zheng/mm-thp-replace-folio_memcg-with-folio_memcg_charged/20250919-115219
base:   next-20250918
patch link:    https://lore.kernel.org/r/eb072e71cc39a0ea915347f39f2af29d2e82897f.1758253018.git.zhengqi.arch%40bytedance.com
patch subject: [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants
config: x86_64-buildonly-randconfig-001-20250920 (https://download.01.org/0day-ci/archive/20250920/202509201640.ikJun7XS-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250920/202509201640.ikJun7XS-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509201640.ikJun7XS-lkp@intel.com/

All errors (new ones prefixed by >>):

>> mm/huge_memory.c:1105:24: error: incomplete definition of type 'struct mem_cgroup'
    1105 |         queue = memcg ? &memcg->deferred_split_queue :
         |                          ~~~~~^
   include/linux/mm_types.h:35:8: note: forward declaration of 'struct mem_cgroup'
      35 | struct mem_cgroup;
         |        ^
   mm/huge_memory.c:1119:24: error: incomplete definition of type 'struct mem_cgroup'
    1119 |         queue = memcg ? &memcg->deferred_split_queue :
         |                          ~~~~~^
   include/linux/mm_types.h:35:8: note: forward declaration of 'struct mem_cgroup'
      35 | struct mem_cgroup;
         |        ^
   2 errors generated.


vim +1105 mm/huge_memory.c

  1098	
  1099	static struct deferred_split *folio_split_queue_lock(struct folio *folio)
  1100	{
  1101		struct mem_cgroup *memcg;
  1102		struct deferred_split *queue;
  1103	
  1104		memcg = folio_memcg(folio);
> 1105		queue = memcg ? &memcg->deferred_split_queue :
  1106				&NODE_DATA(folio_nid(folio))->deferred_split_queue;
  1107		spin_lock(&queue->split_queue_lock);
  1108	
  1109		return queue;
  1110	}
  1111	
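
The error appears to come from a !CONFIG_MEMCG build, where struct mem_cgroup
is only forward-declared yet the &memcg->deferred_split_queue expression is
still compiled. One plausible shape for a fix, sketched here rather than taken
from any posted patch, is to hide the memcg dereference behind CONFIG_MEMCG:

	#ifdef CONFIG_MEMCG
	static struct deferred_split *folio_memcg_split_queue(struct folio *folio)
	{
		struct mem_cgroup *memcg = folio_memcg(folio);

		return memcg ? &memcg->deferred_split_queue : NULL;
	}
	#else
	static struct deferred_split *folio_memcg_split_queue(struct folio *folio)
	{
		return NULL;	/* always fall back to the per-node queue */
	}
	#endif

with the lock helpers doing:

	queue = folio_memcg_split_queue(folio) ?:
		&NODE_DATA(folio_nid(folio))->deferred_split_queue;

(folio_memcg_split_queue() is a hypothetical name used only for this sketch.)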

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] reparent the THP split queue
  2025-09-19 21:33 ` [PATCH 0/4] reparent the THP split queue Shakeel Butt
@ 2025-09-22  7:51   ` Qi Zheng
  0 siblings, 0 replies; 17+ messages in thread
From: Qi Zheng @ 2025-09-22  7:51 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: hannes, hughd, mhocko, roman.gushchin, muchun.song, david,
	lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm, linux-mm,
	linux-kernel, cgroups, Harry Yoo

Hi Shakeel,

On 9/20/25 5:33 AM, Shakeel Butt wrote:
> Hi Qi,
> 
> On Fri, Sep 19, 2025 at 11:46:31AM +0800, Qi Zheng wrote:
>> Hi all,
>>
>> In the future, we will reparent LRU folios during memcg offline to eliminate
>> dying memory cgroups,
> 
> Will you be driving this reparent LRU effort or will Muchun be driving
> it? I think it is really important work and I would really like to get
> this upstreamed sooner than later.

I will work with Muchun to drive it. And we are also discussing some
solutions for adapting MGLRU with Harry Yoo (private email).

Oh, I forgot to cc Harry in this series.

+cc Harry Yoo.

Thanks,
Qi

> 
> thanks,
> Shakeel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants
  2025-09-19 15:39   ` Zi Yan
@ 2025-09-22  7:56     ` Qi Zheng
  0 siblings, 0 replies; 17+ messages in thread
From: Qi Zheng @ 2025-09-22  7:56 UTC (permalink / raw)
  To: Zi Yan
  Cc: hannes, hughd, mhocko, roman.gushchin, shakeel.butt, muchun.song,
	david, lorenzo.stoakes, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, lance.yang, akpm, linux-mm,
	linux-kernel, cgroups, Muchun Song

Hi Zi,

On 9/19/25 11:39 PM, Zi Yan wrote:
> On 18 Sep 2025, at 23:46, Qi Zheng wrote:
> 
>> From: Muchun Song <songmuchun@bytedance.com>
>>
>> In future memcg removal, the binding between a folio and a memcg may
>> change, making the split lock within the memcg unstable when held.
>>
>> A new approach is required to reparent the split queue to its parent. This
>> patch starts introducing a unified way to acquire the split lock for
>> future work.
>>
>> It's a code-only refactoring with no functional changes.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>>   include/linux/memcontrol.h | 10 +++++
>>   mm/huge_memory.c           | 89 ++++++++++++++++++++++++++------------
>>   2 files changed, 71 insertions(+), 28 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 16fe0306e50ea..99876af13c315 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -1662,6 +1662,11 @@ int alloc_shrinker_info(struct mem_cgroup *memcg);
>>   void free_shrinker_info(struct mem_cgroup *memcg);
>>   void set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id);
>>   void reparent_shrinker_deferred(struct mem_cgroup *memcg);
>> +
>> +static inline int shrinker_id(struct shrinker *shrinker)
>> +{
>> +	return shrinker->id;
>> +}
>>   #else
>>   #define mem_cgroup_sockets_enabled 0
>>
>> @@ -1693,6 +1698,11 @@ static inline void set_shrinker_bit(struct mem_cgroup *memcg,
>>   				    int nid, int shrinker_id)
>>   {
>>   }
>> +
>> +static inline int shrinker_id(struct shrinker *shrinker)
>> +{
>> +	return -1;
>> +}
>>   #endif
>>
>>   #ifdef CONFIG_MEMCG
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 582628ddf3f33..d34516a22f5bb 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1078,26 +1078,62 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
>>
>>   #ifdef CONFIG_MEMCG
>>   static inline
>> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
>> +struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
>> +					   struct deferred_split *queue)
>>   {
>> -	struct mem_cgroup *memcg = folio_memcg(folio);
>> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>> -
>> -	if (memcg)
>> -		return &memcg->deferred_split_queue;
>> -	else
>> -		return &pgdat->deferred_split_queue;
>> +	if (mem_cgroup_disabled())
>> +		return NULL;
>> +	if (&NODE_DATA(folio_nid(folio))->deferred_split_queue == queue)
>> +		return NULL;
>> +	return container_of(queue, struct mem_cgroup, deferred_split_queue);
>>   }
>>   #else
>>   static inline
>> -struct deferred_split *get_deferred_split_queue(struct folio *folio)
>> +struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
>> +					   struct deferred_split *queue)
>>   {
>> -	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
>> -
>> -	return &pgdat->deferred_split_queue;
>> +	return NULL;
>>   }
>>   #endif
>>
>> +static struct deferred_split *folio_split_queue_lock(struct folio *folio)
>> +{
>> +	struct mem_cgroup *memcg;
>> +	struct deferred_split *queue;
>> +
>> +	memcg = folio_memcg(folio);
>> +	queue = memcg ? &memcg->deferred_split_queue :
>> +			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
>> +	spin_lock(&queue->split_queue_lock);
>> +
>> +	return queue;
>> +}
>> +
>> +static struct deferred_split *
>> +folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
>> +{
>> +	struct mem_cgroup *memcg;
>> +	struct deferred_split *queue;
>> +
>> +	memcg = folio_memcg(folio);
>> +	queue = memcg ? &memcg->deferred_split_queue :
>> +			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
>> +	spin_lock_irqsave(&queue->split_queue_lock, *flags);
>> +
>> +	return queue;
>> +}
> 
> A helper function to get queue from a folio would get rid of duplicated
> code in the two functions above. Hmm, that is the deleted
> get_deferred_split_queue(). So probably retain it.

After PATCH #4, we may retry after getting the parent memcg in the above
functions, so we may not need to bring back get_deferred_split_queue().
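
i.e., condensed from the patch 4 hunk:

	memcg = folio_memcg(folio);
retry:
	queue = memcg ? &memcg->deferred_split_queue :
			&NODE_DATA(folio_nid(folio))->deferred_split_queue;
	spin_lock(&queue->split_queue_lock);
	if (unlikely(queue->is_dying)) {
		spin_unlock(&queue->split_queue_lock);
		memcg = parent_mem_cgroup(memcg);
		goto retry;
	}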

> 
> Otherwise, LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>

Thanks!

> 
> Best Regards,
> Yan, Zi


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged()
  2025-09-19  3:46 ` [PATCH 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged() Qi Zheng
  2025-09-19 21:30   ` Shakeel Butt
@ 2025-09-22  8:17   ` David Hildenbrand
  1 sibling, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2025-09-22  8:17 UTC (permalink / raw)
  To: Qi Zheng, hannes, hughd, mhocko, roman.gushchin, shakeel.butt,
	muchun.song, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, akpm
  Cc: linux-mm, linux-kernel, cgroups, Muchun Song

On 19.09.25 05:46, Qi Zheng wrote:
> From: Muchun Song <songmuchun@bytedance.com>
> 
> folio_memcg_charged() is intended for use when the user is unconcerned
> about the returned memcg pointer. It is more efficient than folio_memcg().
> Therefore, replace folio_memcg() with folio_memcg_charged().
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants
  2025-09-19  3:46 ` [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants Qi Zheng
                     ` (2 preceding siblings ...)
  2025-09-20  8:27   ` kernel test robot
@ 2025-09-22  8:20   ` David Hildenbrand
  3 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2025-09-22  8:20 UTC (permalink / raw)
  To: Qi Zheng, hannes, hughd, mhocko, roman.gushchin, shakeel.butt,
	muchun.song, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, akpm
  Cc: linux-mm, linux-kernel, cgroups, Muchun Song

On 19.09.25 05:46, Qi Zheng wrote:
> From: Muchun Song <songmuchun@bytedance.com>
> 
> In future memcg removal, the binding between a folio and a memcg may
> change, making the split lock within the memcg unstable when held.
> 
> A new approach is required to reparent the split queue to its parent. This
> patch starts introducing a unified way to acquire the split lock for
> future work.
> 
> It's a code-only refactoring with no functional changes.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---

Looks sane; I assume the build issue is easily fixed.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan()
  2025-09-19  3:46 ` [PATCH 3/4] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan() Qi Zheng
@ 2025-09-22  8:43   ` David Hildenbrand
  2025-09-22 11:36     ` Qi Zheng
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2025-09-22  8:43 UTC (permalink / raw)
  To: Qi Zheng, hannes, hughd, mhocko, roman.gushchin, shakeel.butt,
	muchun.song, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, lance.yang, akpm
  Cc: linux-mm, linux-kernel, cgroups, Muchun Song

On 19.09.25 05:46, Qi Zheng wrote:
> From: Muchun Song <songmuchun@bytedance.com>
> 
> The maintenance of the folio->_deferred_list is intricate because it's
> reused in a local list.
> 
> Here are some peculiarities:
> 
>     1) When a folio is removed from its split queue and added to a local
>        on-stack list in deferred_split_scan(), the ->split_queue_len isn't
>        updated, leading to an inconsistency between it and the actual
>        number of folios in the split queue.

deferred_split_count() will now return "0" even though there might be 
concurrent scanning going on. I assume that's okay because we are not 
returning SHRINK_EMPTY (which is a difference).
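
For context, deferred_split_count() just reports the cached queue length,
roughly:

	struct deferred_split *ds_queue = &NODE_DATA(sc->nid)->deferred_split_queue;

	if (sc->memcg)
		ds_queue = &sc->memcg->deferred_split_queue;
	return READ_ONCE(ds_queue->split_queue_len);

so a concurrent scan that has moved everything into its folio_batch makes the
count read 0.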

> 
>     2) When the folio is split via split_folio() later, it's removed from
>        the local list while holding the split queue lock. At this time,
>        this lock protects the local list, not the split queue.
> 
>     3) To handle the race condition with a third-party freeing or migrating
>        the preceding folio, we must ensure there's always one safe (with
>        raised refcount) folio before by delaying its folio_put(). More
>        details can be found in commit e66f3185fa04 ("mm/thp: fix deferred
>        split queue not partially_mapped"). It's rather tricky.
> 
> We can use the folio_batch infrastructure to handle this clearly. In this
> case, ->split_queue_len will be consistent with the real number of folios
> in the split queue. If list_empty(&folio->_deferred_list) returns false,
> it's clear the folio must be in its split queue (not in a local list
> anymore).
> 
> In the future, we will reparent LRU folios during memcg offline to
> eliminate dying memory cgroups, which requires reparenting the split queue
> to its parent first. So this patch prepares for using
> folio_split_queue_lock_irqsave() as the memcg may change then.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
> ---
>   mm/huge_memory.c | 88 +++++++++++++++++++++++-------------------------
>   1 file changed, 42 insertions(+), 46 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d34516a22f5bb..ab16da21c94e0 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3760,21 +3760,22 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>   		struct lruvec *lruvec;
>   		int expected_refs;
>   
> -		if (folio_order(folio) > 1 &&
> -		    !list_empty(&folio->_deferred_list)) {
> -			ds_queue->split_queue_len--;
> +		if (folio_order(folio) > 1) {
> +			if (!list_empty(&folio->_deferred_list)) {
> +				ds_queue->split_queue_len--;
> +				/*
> +				 * Reinitialize page_deferred_list after removing the
> +				 * page from the split_queue, otherwise a subsequent
> +				 * split will see list corruption when checking the
> +				 * page_deferred_list.
> +				 */
> +				list_del_init(&folio->_deferred_list);
> +			}
>   			if (folio_test_partially_mapped(folio)) {
>   				folio_clear_partially_mapped(folio);
>   				mod_mthp_stat(folio_order(folio),
>   					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
>   			}
> -			/*
> -			 * Reinitialize page_deferred_list after removing the
> -			 * page from the split_queue, otherwise a subsequent
> -			 * split will see list corruption when checking the
> -			 * page_deferred_list.
> -			 */
> -			list_del_init(&folio->_deferred_list);
>   		}

BTW I am not sure about holding the split_queue_lock before freezing the 
refcount (comment above the freeze):

freezing should properly sync against the folio_try_get(): one of them 
would fail.

So not sure if that is still required. But I recall something nasty 
regarding that :)


>   		split_queue_unlock(ds_queue);
>   		if (mapping) {
> @@ -4173,40 +4174,48 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>   	struct pglist_data *pgdata = NODE_DATA(sc->nid);
>   	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
>   	unsigned long flags;
> -	LIST_HEAD(list);
> -	struct folio *folio, *next, *prev = NULL;
> -	int split = 0, removed = 0;
> +	struct folio *folio, *next;
> +	int split = 0, i;
> +	struct folio_batch fbatch;
> +	bool done;

Is "done" really required? Can't we just use sc->nr_to_scan tos ee if 
there is work remaining to be done so we retry?

>   
>   #ifdef CONFIG_MEMCG
>   	if (sc->memcg)
>   		ds_queue = &sc->memcg->deferred_split_queue;
>   #endif
>   
> +	folio_batch_init(&fbatch);
> +retry:
> +	done = true;
>   	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>   	/* Take pin on all head pages to avoid freeing them under us */
>   	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
>   							_deferred_list) {
>   		if (folio_try_get(folio)) {
> -			list_move(&folio->_deferred_list, &list);
> -		} else {
> +			folio_batch_add(&fbatch, folio);
> +		} else if (folio_test_partially_mapped(folio)) {
>   			/* We lost race with folio_put() */
> -			if (folio_test_partially_mapped(folio)) {
> -				folio_clear_partially_mapped(folio);
> -				mod_mthp_stat(folio_order(folio),
> -					      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
> -			}
> -			list_del_init(&folio->_deferred_list);
> -			ds_queue->split_queue_len--;
> +			folio_clear_partially_mapped(folio);
> +			mod_mthp_stat(folio_order(folio),
> +				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
>   		}
> +		list_del_init(&folio->_deferred_list);
> +		ds_queue->split_queue_len--;
>   		if (!--sc->nr_to_scan)
>   			break;
> +		if (folio_batch_space(&fbatch) == 0) {

Nit: if (!folio_batch_space(&fbatch)) {


-- 
Cheers

David / dhildenb


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan()
  2025-09-22  8:43   ` David Hildenbrand
@ 2025-09-22 11:36     ` Qi Zheng
  0 siblings, 0 replies; 17+ messages in thread
From: Qi Zheng @ 2025-09-22 11:36 UTC (permalink / raw)
  To: David Hildenbrand, hannes, hughd, mhocko, roman.gushchin,
	shakeel.butt, muchun.song, lorenzo.stoakes, ziy, baolin.wang,
	Liam.Howlett, npache, ryan.roberts, dev.jain, baohua, lance.yang,
	akpm
  Cc: linux-mm, linux-kernel, cgroups, Muchun Song

Hi David,

On 9/22/25 4:43 PM, David Hildenbrand wrote:
> On 19.09.25 05:46, Qi Zheng wrote:
>> From: Muchun Song <songmuchun@bytedance.com>
>>
>> The maintenance of the folio->_deferred_list is intricate because it's
>> reused in a local list.
>>
>> Here are some peculiarities:
>>
>>     1) When a folio is removed from its split queue and added to a local
>>        on-stack list in deferred_split_scan(), the ->split_queue_len isn't
>>        updated, leading to an inconsistency between it and the actual
>>        number of folios in the split queue.
> 
> deferred_split_count() will now return "0" even though there might be 
> concurrent scanning going on. I assume that's okay because we are not 
> returning SHRINK_EMPTY (which would be a behavioral difference).
> 
>>
>>     2) When the folio is split via split_folio() later, it's removed from
>>        the local list while holding the split queue lock. At this time,
>>        this lock protects the local list, not the split queue.
>>
>>     3) To handle the race condition with a third party freeing or
>>        migrating the preceding folio, we must ensure there's always one
>>        safe folio (with a raised refcount) in front of it, by delaying
>>        its folio_put(). More
>>        details can be found in commit e66f3185fa04 ("mm/thp: fix deferred
>>        split queue not partially_mapped"). It's rather tricky.
>>
>> We can use the folio_batch infrastructure to handle this clearly. In this
>> case, ->split_queue_len will be consistent with the real number of folios
>> in the split queue. If list_empty(&folio->_deferred_list) returns false,
>> it's clear the folio must be in its split queue (not in a local list
>> anymore).
>>
>> In the future, we will reparent LRU folios during memcg offline to
>> eliminate dying memory cgroups, which requires reparenting the split queue
>> to its parent first. So this patch prepares for using
>> folio_split_queue_lock_irqsave() as the memcg may change then.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> ---
>>   mm/huge_memory.c | 88 +++++++++++++++++++++++-------------------------
>>   1 file changed, 42 insertions(+), 46 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index d34516a22f5bb..ab16da21c94e0 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3760,21 +3760,22 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>           struct lruvec *lruvec;
>>           int expected_refs;
>> -        if (folio_order(folio) > 1 &&
>> -            !list_empty(&folio->_deferred_list)) {
>> -            ds_queue->split_queue_len--;
>> +        if (folio_order(folio) > 1) {
>> +            if (!list_empty(&folio->_deferred_list)) {
>> +                ds_queue->split_queue_len--;
>> +                /*
>> +                 * Reinitialize page_deferred_list after removing the
>> +                 * page from the split_queue, otherwise a subsequent
>> +                 * split will see list corruption when checking the
>> +                 * page_deferred_list.
>> +                 */
>> +                list_del_init(&folio->_deferred_list);
>> +            }
>>               if (folio_test_partially_mapped(folio)) {
>>                   folio_clear_partially_mapped(folio);
>>                   mod_mthp_stat(folio_order(folio),
>>                             MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
>>               }
>> -            /*
>> -             * Reinitialize page_deferred_list after removing the
>> -             * page from the split_queue, otherwise a subsequent
>> -             * split will see list corruption when checking the
>> -             * page_deferred_list.
>> -             */
>> -            list_del_init(&folio->_deferred_list);
>>           }
> 
> BTW, I am not sure we still need to hold the split_queue_lock before 
> freezing the refcount (see the comment above the freeze):
> 
> freezing should properly synchronize against folio_try_get(): one of 
> them has to fail.
> 
> So the lock may no longer be required there. But I recall something 
> nasty regarding that :)

I'm not sure either; this needs some investigation.

> 
> 
>>           split_queue_unlock(ds_queue);
>>           if (mapping) {
>> @@ -4173,40 +4174,48 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>>       struct pglist_data *pgdata = NODE_DATA(sc->nid);
>>       struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
>>       unsigned long flags;
>> -    LIST_HEAD(list);
>> -    struct folio *folio, *next, *prev = NULL;
>> -    int split = 0, removed = 0;
>> +    struct folio *folio, *next;
>> +    int split = 0, i;
>> +    struct folio_batch fbatch;
>> +    bool done;
> 
> Is "done" really required? Can't we just use sc->nr_to_scan tos ee if 
> there is work remaining to be done so we retry?

I think you are right, will do in the next version.

> 
>>   #ifdef CONFIG_MEMCG
>>       if (sc->memcg)
>>           ds_queue = &sc->memcg->deferred_split_queue;
>>   #endif
>> +    folio_batch_init(&fbatch);
>> +retry:
>> +    done = true;
>>       spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>       /* Take pin on all head pages to avoid freeing them under us */
>>       list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
>>                               _deferred_list) {
>>           if (folio_try_get(folio)) {
>> -            list_move(&folio->_deferred_list, &list);
>> -        } else {
>> +            folio_batch_add(&fbatch, folio);
>> +        } else if (folio_test_partially_mapped(folio)) {
>>               /* We lost race with folio_put() */
>> -            if (folio_test_partially_mapped(folio)) {
>> -                folio_clear_partially_mapped(folio);
>> -                mod_mthp_stat(folio_order(folio),
>> -                          MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
>> -            }
>> -            list_del_init(&folio->_deferred_list);
>> -            ds_queue->split_queue_len--;
>> +            folio_clear_partially_mapped(folio);
>> +            mod_mthp_stat(folio_order(folio),
>> +                      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
>>           }
>> +        list_del_init(&folio->_deferred_list);
>> +        ds_queue->split_queue_len--;
>>           if (!--sc->nr_to_scan)
>>               break;
>> +        if (folio_batch_space(&fbatch) == 0) {
> 
> Nit: if (!folio_batch_space(&fbatch)) {

OK, will do.

Thanks,
Qi

> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-09-22 11:36 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-19  3:46 [PATCH 0/4] reparent the THP split queue Qi Zheng
2025-09-19  3:46 ` [PATCH 1/4] mm: thp: replace folio_memcg() with folio_memcg_charged() Qi Zheng
2025-09-19 21:30   ` Shakeel Butt
2025-09-22  8:17   ` David Hildenbrand
2025-09-19  3:46 ` [PATCH 2/4] mm: thp: introduce folio_split_queue_lock and its variants Qi Zheng
2025-09-19 15:39   ` Zi Yan
2025-09-22  7:56     ` Qi Zheng
2025-09-20  0:49   ` Shakeel Butt
2025-09-20  8:27   ` kernel test robot
2025-09-22  8:20   ` David Hildenbrand
2025-09-19  3:46 ` [PATCH 3/4] mm: thp: use folio_batch to handle THP splitting in deferred_split_scan() Qi Zheng
2025-09-22  8:43   ` David Hildenbrand
2025-09-22 11:36     ` Qi Zheng
2025-09-19  3:46 ` [PATCH 4/4] mm: thp: reparent the split queue during memcg offline Qi Zheng
2025-09-20  7:43   ` kernel test robot
2025-09-19 21:33 ` [PATCH 0/4] reparent the THP split queue Shakeel Butt
2025-09-22  7:51   ` Qi Zheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).