public inbox for linux-mm@kvack.org
* [PATCH v3 0/7] mm: switch THP shrinker to list_lru
@ 2026-03-18 19:53 Johannes Weiner
  2026-03-18 19:53 ` [PATCH v3 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: Johannes Weiner @ 2026-03-18 19:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

This is version 3 of switching the THP shrinker to list_lru.

Changes in v3:
- dedicated lockdep_key for irqsafe deferred_split_lru.lock (syzbot)
- conditional list_lru ops in __folio_freeze_and_split_unmapped() (syzbot)
- annotate runs of inscrutable false, NULL, false function arguments (David)
- rename to folio_memcg_list_lru_alloc() (David)

Changes in v2:
- explicit rcu_read_lock() in __folio_freeze_and_split_unmapped() (Usama)
- split out list_lru prep bits (Dave)

The open-coded deferred split queue has issues. It's not NUMA-aware
(when cgroups are enabled), and it adds complexity at the callsites
interacting with it. Switching to list_lru fixes the NUMA problem and
streamlines things. It also simplifies planned shrinker work.

Patches 1-4 are cleanups and small refactors in list_lru code. They're
basically independent, but make the THP shrinker conversion easier.

Patch 5 extends the list_lru API to allow the caller to control the
locking scope. The THP shrinker has private state it needs to keep
synchronized with the LRU state.

Patch 6 extends the list_lru API with a convenience helper to do
list_lru head allocation (memcg_list_lru_alloc) when coming from a
folio. Anon THPs are instantiated in several places, and with the
folio reparenting patches pending, folio_memcg() access is now a more
delicate dance. This avoids having to replicate that dance everywhere.

Patch 7 finally switches the deferred_split_queue to list_lru.

Based on mm-unstable.

 include/linux/huge_mm.h    |   6 +-
 include/linux/list_lru.h   |  46 ++++++
 include/linux/memcontrol.h |   4 -
 include/linux/mmzone.h     |  12 --
 mm/huge_memory.c           | 342 ++++++++++++++-----------------------------
 mm/internal.h              |   2 +-
 mm/khugepaged.c            |   7 +
 mm/list_lru.c              | 196 ++++++++++++++++---------
 mm/memcontrol.c            |  12 +-
 mm/memory.c                |  52 ++++---
 mm/mm_init.c               |  15 --
 11 files changed, 323 insertions(+), 371 deletions(-)




* [PATCH v3 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty
  2026-03-18 19:53 [PATCH v3 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
@ 2026-03-18 19:53 ` Johannes Weiner
  2026-03-18 20:12   ` Shakeel Butt
  2026-03-24 11:30   ` Lorenzo Stoakes (Oracle)
  2026-03-18 19:53 ` [PATCH v3 2/7] mm: list_lru: deduplicate unlock_list_lru() Johannes Weiner
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 29+ messages in thread
From: Johannes Weiner @ 2026-03-18 19:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

skip_empty is only for the shrinker to abort and skip a list that's
empty or whose cgroup is being deleted.

For list additions and deletions, the cgroup hierarchy is walked
upwards until a valid list_lru head is found; failing that, the lookup
falls back to the node list. Acquiring the lock therefore cannot
fail. Remove the NULL checks
in those callers.
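
The fallback behavior described above can be sketched with a standalone
toy model (the toy_* names are made up for illustration and are not the
kernel API): a lookup walks parent links until it finds an allocated
per-memcg head, and the root always has the node list to fall back on,
so the result is never NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not kernel code: a cgroup may or may not have its own
 * per-memcg list head; lookups walk towards the root and finally fall
 * back to the always-present node list, so they never return NULL. */
struct toy_memcg {
	struct toy_memcg *parent;
	int *lru_head;			/* NULL if not allocated yet */
};

static int node_list;			/* always-present fallback */

static int *toy_lock_list_lru_of_memcg(struct toy_memcg *memcg)
{
	while (memcg) {
		if (memcg->lru_head)
			return memcg->lru_head;	/* found a valid head */
		memcg = memcg->parent;		/* walk up the hierarchy */
	}
	return &node_list;			/* root fallback: never NULL */
}
```

A NULL memcg (the !MEMCG case) goes straight to the node list, which is
why the non-shrinker callers never need a NULL check.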

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/list_lru.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 26463ae29c64..d96fd50fc9af 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -165,8 +165,6 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
 	struct list_lru_one *l;
 
 	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
-	if (!l)
-		return false;
 	if (list_empty(item)) {
 		list_add_tail(item, &l->list);
 		/* Set shrinker bit if the first element was added */
@@ -203,9 +201,8 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid,
 {
 	struct list_lru_node *nlru = &lru->node[nid];
 	struct list_lru_one *l;
+
 	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
-	if (!l)
-		return false;
 	if (!list_empty(item)) {
 		list_del_init(item);
 		l->nr_items--;
-- 
2.53.0




* [PATCH v3 2/7] mm: list_lru: deduplicate unlock_list_lru()
  2026-03-18 19:53 [PATCH v3 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
  2026-03-18 19:53 ` [PATCH v3 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
@ 2026-03-18 19:53 ` Johannes Weiner
  2026-03-24 11:32   ` Lorenzo Stoakes (Oracle)
  2026-03-18 19:53 ` [PATCH v3 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg() Johannes Weiner
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Johannes Weiner @ 2026-03-18 19:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

The MEMCG and !MEMCG variants of unlock_list_lru() are identical, and
lock_list_lru() repeats the same unlock pattern when bailing out.
Consolidate them into a common implementation.

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/list_lru.c | 29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index d96fd50fc9af..e873bc26a7ef 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -15,6 +15,14 @@
 #include "slab.h"
 #include "internal.h"
 
+static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
+{
+	if (irq_off)
+		spin_unlock_irq(&l->lock);
+	else
+		spin_unlock(&l->lock);
+}
+
 #ifdef CONFIG_MEMCG
 static LIST_HEAD(memcg_list_lrus);
 static DEFINE_MUTEX(list_lrus_mutex);
@@ -67,10 +75,7 @@ static inline bool lock_list_lru(struct list_lru_one *l, bool irq)
 	else
 		spin_lock(&l->lock);
 	if (unlikely(READ_ONCE(l->nr_items) == LONG_MIN)) {
-		if (irq)
-			spin_unlock_irq(&l->lock);
-		else
-			spin_unlock(&l->lock);
+		unlock_list_lru(l, irq);
 		return false;
 	}
 	return true;
@@ -101,14 +106,6 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 	memcg = parent_mem_cgroup(memcg);
 	goto again;
 }
-
-static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
-{
-	if (irq_off)
-		spin_unlock_irq(&l->lock);
-	else
-		spin_unlock(&l->lock);
-}
 #else
 static void list_lru_register(struct list_lru *lru)
 {
@@ -147,14 +144,6 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 
 	return l;
 }
-
-static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
-{
-	if (irq_off)
-		spin_unlock_irq(&l->lock);
-	else
-		spin_unlock(&l->lock);
-}
 #endif /* CONFIG_MEMCG */
 
 /* The caller must ensure the memcg lifetime. */
-- 
2.53.0




* [PATCH v3 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg()
  2026-03-18 19:53 [PATCH v3 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
  2026-03-18 19:53 ` [PATCH v3 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
  2026-03-18 19:53 ` [PATCH v3 2/7] mm: list_lru: deduplicate unlock_list_lru() Johannes Weiner
@ 2026-03-18 19:53 ` Johannes Weiner
  2026-03-18 20:20   ` Shakeel Butt
  2026-03-24 11:34   ` Lorenzo Stoakes (Oracle)
  2026-03-18 19:53 ` [PATCH v3 4/7] mm: list_lru: deduplicate lock_list_lru() Johannes Weiner
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 29+ messages in thread
From: Johannes Weiner @ 2026-03-18 19:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

Only the MEMCG variant of lock_list_lru() needs to check if there is a
race with cgroup deletion and list reparenting. Move the check to the
caller, so that the next patch can unify the lock_list_lru() variants.

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/list_lru.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index e873bc26a7ef..1a39ff490643 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -68,17 +68,12 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 	return &lru->node[nid].lru;
 }
 
-static inline bool lock_list_lru(struct list_lru_one *l, bool irq)
+static inline void lock_list_lru(struct list_lru_one *l, bool irq)
 {
 	if (irq)
 		spin_lock_irq(&l->lock);
 	else
 		spin_lock(&l->lock);
-	if (unlikely(READ_ONCE(l->nr_items) == LONG_MIN)) {
-		unlock_list_lru(l, irq);
-		return false;
-	}
-	return true;
 }
 
 static inline struct list_lru_one *
@@ -90,9 +85,13 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 	rcu_read_lock();
 again:
 	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
-	if (likely(l) && lock_list_lru(l, irq)) {
-		rcu_read_unlock();
-		return l;
+	if (likely(l)) {
+		lock_list_lru(l, irq);
+		if (likely(READ_ONCE(l->nr_items) != LONG_MIN)) {
+			rcu_read_unlock();
+			return l;
+		}
+		unlock_list_lru(l, irq);
 	}
 	/*
 	 * Caller may simply bail out if raced with reparenting or
-- 
2.53.0




* [PATCH v3 4/7] mm: list_lru: deduplicate lock_list_lru()
  2026-03-18 19:53 [PATCH v3 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
                   ` (2 preceding siblings ...)
  2026-03-18 19:53 ` [PATCH v3 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg() Johannes Weiner
@ 2026-03-18 19:53 ` Johannes Weiner
  2026-03-18 20:22   ` Shakeel Butt
  2026-03-24 11:36   ` Lorenzo Stoakes (Oracle)
  2026-03-18 19:53 ` [PATCH v3 5/7] mm: list_lru: introduce caller locking for additions and deletions Johannes Weiner
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 29+ messages in thread
From: Johannes Weiner @ 2026-03-18 19:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

The MEMCG and !MEMCG paths have the same pattern. Share the code.

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/list_lru.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 1a39ff490643..4d74c2e9c2a5 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -15,6 +15,14 @@
 #include "slab.h"
 #include "internal.h"
 
+static inline void lock_list_lru(struct list_lru_one *l, bool irq)
+{
+	if (irq)
+		spin_lock_irq(&l->lock);
+	else
+		spin_lock(&l->lock);
+}
+
 static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
 {
 	if (irq_off)
@@ -68,14 +76,6 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 	return &lru->node[nid].lru;
 }
 
-static inline void lock_list_lru(struct list_lru_one *l, bool irq)
-{
-	if (irq)
-		spin_lock_irq(&l->lock);
-	else
-		spin_lock(&l->lock);
-}
-
 static inline struct list_lru_one *
 lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 		       bool irq, bool skip_empty)
@@ -136,10 +136,7 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 {
 	struct list_lru_one *l = &lru->node[nid].lru;
 
-	if (irq)
-		spin_lock_irq(&l->lock);
-	else
-		spin_lock(&l->lock);
+	lock_list_lru(l, irq);
 
 	return l;
 }
-- 
2.53.0




* [PATCH v3 5/7] mm: list_lru: introduce caller locking for additions and deletions
  2026-03-18 19:53 [PATCH v3 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
                   ` (3 preceding siblings ...)
  2026-03-18 19:53 ` [PATCH v3 4/7] mm: list_lru: deduplicate lock_list_lru() Johannes Weiner
@ 2026-03-18 19:53 ` Johannes Weiner
  2026-03-18 20:51   ` Shakeel Butt
  2026-03-24 11:55   ` Lorenzo Stoakes (Oracle)
  2026-03-18 19:53 ` [PATCH v3 6/7] mm: list_lru: introduce folio_memcg_list_lru_alloc() Johannes Weiner
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 29+ messages in thread
From: Johannes Weiner @ 2026-03-18 19:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

Locking is currently internal to the list_lru API. However, a caller
might want to keep auxiliary state synchronized with the LRU state.

For example, the THP shrinker uses the lock of its custom LRU to keep
PG_partially_mapped and vmstats consistent.

To allow the THP shrinker to switch to list_lru, provide normal and
irqsafe locking primitives as well as caller-locked variants of the
addition and deletion functions.
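
The caller-locked contract can be sketched with a standalone toy model
(toy_* names are invented for illustration; a plain flag stands in for
the spinlock): the caller takes the lock, then updates both the list
and its private auxiliary state inside the same critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not the kernel API: caller locking lets auxiliary state
 * (here a stat counter standing in for PG_partially_mapped / vmstat
 * updates) change under the same lock as the list itself. */
struct toy_lru {
	int locked;		/* stand-in for the sublist spinlock */
	long nr_items;
};

static long toy_partially_mapped_stat;

static struct toy_lru *toy_list_lru_lock(struct toy_lru *lru)
{
	assert(!lru->locked);
	lru->locked = 1;
	return lru;
}

static void toy_list_lru_unlock(struct toy_lru *l)
{
	assert(l->locked);
	l->locked = 0;
}

static bool __toy_list_lru_add(struct toy_lru *l)
{
	assert(l->locked);	/* caller-locked variant: lock must be held */
	l->nr_items++;
	return true;
}

/* Caller keeps list state and private stat in sync under one lock */
static void toy_deferred_split_folio(struct toy_lru *lru)
{
	struct toy_lru *l = toy_list_lru_lock(lru);

	__toy_list_lru_add(l);
	toy_partially_mapped_stat++;	/* same critical section */
	toy_list_lru_unlock(l);
}
```

The uncombined list_lru_add()/list_lru_del() then become thin wrappers:
lock, call the double-underscore variant, unlock.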

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/list_lru.h |  34 +++++++++++++
 mm/list_lru.c            | 107 +++++++++++++++++++++++++++------------
 2 files changed, 110 insertions(+), 31 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index fe739d35a864..4afc02deb44d 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -83,6 +83,40 @@ int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
 			 gfp_t gfp);
 void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
 
+/**
+ * list_lru_lock: lock the sublist for the given node and memcg
+ * @lru: the lru pointer
+ * @nid: the node id of the sublist to lock.
+ * @memcg: the cgroup of the sublist to lock.
+ *
+ * Returns the locked list_lru_one sublist. The caller must call
+ * list_lru_unlock() when done.
+ *
+ * You must ensure that the memcg is not freed during this call (e.g., with
+ * rcu or by taking a css refcnt).
+ *
+ * Return: the locked list_lru_one, or NULL on failure
+ */
+struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid,
+		struct mem_cgroup *memcg);
+
+/**
+ * list_lru_unlock: unlock a sublist locked by list_lru_lock()
+ * @l: the list_lru_one to unlock
+ */
+void list_lru_unlock(struct list_lru_one *l);
+
+struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
+		struct mem_cgroup *memcg, unsigned long *irq_flags);
+void list_lru_unlock_irqrestore(struct list_lru_one *l,
+		unsigned long *irq_flags);
+
+/* Caller-locked variants, see list_lru_add() etc for documentation */
+bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l,
+		struct list_head *item, int nid, struct mem_cgroup *memcg);
+bool __list_lru_del(struct list_lru *lru, struct list_lru_one *l,
+		struct list_head *item, int nid);
+
 /**
  * list_lru_add: add an element to the lru list's tail
  * @lru: the lru pointer
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 4d74c2e9c2a5..b817c0f48f73 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -15,17 +15,23 @@
 #include "slab.h"
 #include "internal.h"
 
-static inline void lock_list_lru(struct list_lru_one *l, bool irq)
+static inline void lock_list_lru(struct list_lru_one *l, bool irq,
+				 unsigned long *irq_flags)
 {
-	if (irq)
+	if (irq_flags)
+		spin_lock_irqsave(&l->lock, *irq_flags);
+	else if (irq)
 		spin_lock_irq(&l->lock);
 	else
 		spin_lock(&l->lock);
 }
 
-static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
+static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off,
+				   unsigned long *irq_flags)
 {
-	if (irq_off)
+	if (irq_flags)
+		spin_unlock_irqrestore(&l->lock, *irq_flags);
+	else if (irq_off)
 		spin_unlock_irq(&l->lock);
 	else
 		spin_unlock(&l->lock);
@@ -78,7 +84,7 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 
 static inline struct list_lru_one *
 lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
-		       bool irq, bool skip_empty)
+		       bool irq, unsigned long *irq_flags, bool skip_empty)
 {
 	struct list_lru_one *l;
 
@@ -86,12 +92,12 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 again:
 	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
 	if (likely(l)) {
-		lock_list_lru(l, irq);
+		lock_list_lru(l, irq, irq_flags);
 		if (likely(READ_ONCE(l->nr_items) != LONG_MIN)) {
 			rcu_read_unlock();
 			return l;
 		}
-		unlock_list_lru(l, irq);
+		unlock_list_lru(l, irq, irq_flags);
 	}
 	/*
 	 * Caller may simply bail out if raced with reparenting or
@@ -132,37 +138,81 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 
 static inline struct list_lru_one *
 lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
-		       bool irq, bool skip_empty)
+		       bool irq, unsigned long *irq_flags, bool skip_empty)
 {
 	struct list_lru_one *l = &lru->node[nid].lru;
 
-	lock_list_lru(l, irq);
+	lock_list_lru(l, irq, irq_flags);
 
 	return l;
 }
 #endif /* CONFIG_MEMCG */
 
-/* The caller must ensure the memcg lifetime. */
-bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
-		  struct mem_cgroup *memcg)
+struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid,
+				   struct mem_cgroup *memcg)
 {
-	struct list_lru_node *nlru = &lru->node[nid];
-	struct list_lru_one *l;
+	return lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/false,
+				      /*irq_flags=*/NULL, /*skip_empty=*/false);
+}
+
+void list_lru_unlock(struct list_lru_one *l)
+{
+	unlock_list_lru(l, /*irq_off=*/false, /*irq_flags=*/NULL);
+}
+
+struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
+					   struct mem_cgroup *memcg,
+					   unsigned long *flags)
+{
+	return lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/true,
+				      /*irq_flags=*/flags, /*skip_empty=*/false);
+}
+
+void list_lru_unlock_irqrestore(struct list_lru_one *l, unsigned long *flags)
+{
+	unlock_list_lru(l, /*irq_off=*/true, /*irq_flags=*/flags);
+}
 
-	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
+bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l,
+		    struct list_head *item, int nid,
+		    struct mem_cgroup *memcg)
+{
 	if (list_empty(item)) {
 		list_add_tail(item, &l->list);
 		/* Set shrinker bit if the first element was added */
 		if (!l->nr_items++)
 			set_shrinker_bit(memcg, nid, lru_shrinker_id(lru));
-		unlock_list_lru(l, false);
-		atomic_long_inc(&nlru->nr_items);
+		atomic_long_inc(&lru->node[nid].nr_items);
+		return true;
+	}
+	return false;
+}
+
+bool __list_lru_del(struct list_lru *lru, struct list_lru_one *l,
+		    struct list_head *item, int nid)
+{
+	if (!list_empty(item)) {
+		list_del_init(item);
+		l->nr_items--;
+		atomic_long_dec(&lru->node[nid].nr_items);
 		return true;
 	}
-	unlock_list_lru(l, false);
 	return false;
 }
 
+/* The caller must ensure the memcg lifetime. */
+bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
+		  struct mem_cgroup *memcg)
+{
+	struct list_lru_one *l;
+	bool ret;
+
+	l = list_lru_lock(lru, nid, memcg);
+	ret = __list_lru_add(lru, l, item, nid, memcg);
+	list_lru_unlock(l);
+	return ret;
+}
+
 bool list_lru_add_obj(struct list_lru *lru, struct list_head *item)
 {
 	bool ret;
@@ -184,19 +234,13 @@ EXPORT_SYMBOL_GPL(list_lru_add_obj);
 bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid,
 		  struct mem_cgroup *memcg)
 {
-	struct list_lru_node *nlru = &lru->node[nid];
 	struct list_lru_one *l;
+	bool ret;
 
-	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
-	if (!list_empty(item)) {
-		list_del_init(item);
-		l->nr_items--;
-		unlock_list_lru(l, false);
-		atomic_long_dec(&nlru->nr_items);
-		return true;
-	}
-	unlock_list_lru(l, false);
-	return false;
+	l = list_lru_lock(lru, nid, memcg);
+	ret = __list_lru_del(lru, l, item, nid);
+	list_lru_unlock(l);
+	return ret;
 }
 
 bool list_lru_del_obj(struct list_lru *lru, struct list_head *item)
@@ -269,7 +313,8 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 	unsigned long isolated = 0;
 
 restart:
-	l = lock_list_lru_of_memcg(lru, nid, memcg, irq_off, true);
+	l = lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/irq_off,
+				   /*irq_flags=*/NULL, /*skip_empty=*/true);
 	if (!l)
 		return isolated;
 	list_for_each_safe(item, n, &l->list) {
@@ -310,7 +355,7 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 			BUG();
 		}
 	}
-	unlock_list_lru(l, irq_off);
+	unlock_list_lru(l, irq_off, NULL);
 out:
 	return isolated;
 }
-- 
2.53.0




* [PATCH v3 6/7] mm: list_lru: introduce folio_memcg_list_lru_alloc()
  2026-03-18 19:53 [PATCH v3 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
                   ` (4 preceding siblings ...)
  2026-03-18 19:53 ` [PATCH v3 5/7] mm: list_lru: introduce caller locking for additions and deletions Johannes Weiner
@ 2026-03-18 19:53 ` Johannes Weiner
  2026-03-18 20:52   ` Shakeel Butt
                     ` (2 more replies)
  2026-03-18 19:53 ` [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru Johannes Weiner
  2026-03-18 21:00 ` [PATCH v3 0/7] mm: switch THP " Lorenzo Stoakes (Oracle)
  7 siblings, 3 replies; 29+ messages in thread
From: Johannes Weiner @ 2026-03-18 19:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

memcg_list_lru_alloc() is called every time an object that may end up
on the list_lru is created. It needs to quickly check if the list_lru
heads for the memcg already exist, and allocate them when they don't.

Doing this with folio objects is tricky: folio_memcg() is not stable
and requires either RCU protection or pinning the cgroup. But it's
desirable to make the existence check lightweight under RCU, and only
pin the memcg when we need to allocate list_lru heads and may block.

In preparation for switching the THP shrinker to list_lru, add a
helper function for allocating list_lru heads coming from a folio.
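
The fast-path/slow-path split can be sketched with a standalone toy
model (toy_* names and the bare flags are invented for illustration,
not the kernel implementation): check cheaply whether the heads exist,
and only take a reference when the potentially blocking allocation is
actually needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: the common case finds the per-memcg
 * heads already allocated and never pins anything. */
static bool allocated;
static int total_pins;		/* counts slow-path pin operations */

static bool toy_lru_heads_allocated(void)
{
	return allocated;	/* cheap RCU-style existence check */
}

static int toy_folio_memcg_list_lru_alloc(void)
{
	if (toy_lru_heads_allocated())	/* fast path: no pinning */
		return 0;

	total_pins++;			/* slow path: pin the memcg ... */
	allocated = true;		/* ... then the blocking allocation */
	return 0;			/* (unpin on the way out) */
}
```

After the first call does the allocation, every subsequent call returns
from the lightweight check without touching the cgroup refcount.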

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/list_lru.h | 12 ++++++++++++
 mm/list_lru.c            | 39 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 4afc02deb44d..4bd29b61c59a 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -81,6 +81,18 @@ static inline int list_lru_init_memcg_key(struct list_lru *lru, struct shrinker
 
 int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
 			 gfp_t gfp);
+
+#ifdef CONFIG_MEMCG
+int folio_memcg_list_lru_alloc(struct folio *folio, struct list_lru *lru,
+			       gfp_t gfp);
+#else
+static inline int folio_memcg_list_lru_alloc(struct folio *folio,
+					     struct list_lru *lru, gfp_t gfp)
+{
+	return 0;
+}
+#endif
+
 void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
 
 /**
diff --git a/mm/list_lru.c b/mm/list_lru.c
index b817c0f48f73..1ccdd45b1d14 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -537,17 +537,14 @@ static inline bool memcg_list_lru_allocated(struct mem_cgroup *memcg,
 	return idx < 0 || xa_load(&lru->xa, idx);
 }
 
-int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
-			 gfp_t gfp)
+static int __memcg_list_lru_alloc(struct mem_cgroup *memcg,
+				  struct list_lru *lru, gfp_t gfp)
 {
 	unsigned long flags;
 	struct list_lru_memcg *mlru = NULL;
 	struct mem_cgroup *pos, *parent;
 	XA_STATE(xas, &lru->xa, 0);
 
-	if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
-		return 0;
-
 	gfp &= GFP_RECLAIM_MASK;
 	/*
 	 * Because the list_lru can be reparented to the parent cgroup's
@@ -588,6 +585,38 @@ int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
 
 	return xas_error(&xas);
 }
+
+int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
+			 gfp_t gfp)
+{
+	if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
+		return 0;
+	return __memcg_list_lru_alloc(memcg, lru, gfp);
+}
+
+int folio_memcg_list_lru_alloc(struct folio *folio, struct list_lru *lru,
+			       gfp_t gfp)
+{
+	struct mem_cgroup *memcg;
+	int res;
+
+	if (!list_lru_memcg_aware(lru))
+		return 0;
+
+	/* Fast path when list_lru heads already exist */
+	rcu_read_lock();
+	memcg = folio_memcg(folio);
+	res = memcg_list_lru_allocated(memcg, lru);
+	rcu_read_unlock();
+	if (likely(res))
+		return 0;
+
+	/* Allocation may block, pin the memcg */
+	memcg = get_mem_cgroup_from_folio(folio);
+	res = __memcg_list_lru_alloc(memcg, lru, gfp);
+	mem_cgroup_put(memcg);
+	return res;
+}
 #else
 static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
 {
-- 
2.53.0




* [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-18 19:53 [PATCH v3 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
                   ` (5 preceding siblings ...)
  2026-03-18 19:53 ` [PATCH v3 6/7] mm: list_lru: introduce folio_memcg_list_lru_alloc() Johannes Weiner
@ 2026-03-18 19:53 ` Johannes Weiner
  2026-03-18 20:26   ` David Hildenbrand (Arm)
                     ` (2 more replies)
  2026-03-18 21:00 ` [PATCH v3 0/7] mm: switch THP " Lorenzo Stoakes (Oracle)
  7 siblings, 3 replies; 29+ messages in thread
From: Johannes Weiner @ 2026-03-18 19:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

The deferred split queue handles cgroups in a suboptimal fashion. The
queue is per-NUMA node or per-cgroup, not the intersection. That means
on a cgrouped system, a node-restricted allocation entering reclaim
can end up splitting large pages on other nodes:

	alloc/unmap
	  deferred_split_folio()
	    list_add_tail(memcg->split_queue)
	    set_shrinker_bit(memcg, node, deferred_shrinker_id)

	for_each_zone_zonelist_nodemask(restricted_nodes)
	  mem_cgroup_iter()
	    shrink_slab(node, memcg)
	      shrink_slab_memcg(node, memcg)
	        if test_shrinker_bit(memcg, node, deferred_shrinker_id)
	          deferred_split_scan()
	            walks memcg->split_queue

The shrinker bit adds an imperfect guard rail. As soon as the cgroup
has a single large page on the node of interest, all large pages owned
by that memcg, including those on other nodes, will be split.

list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
streamlines a lot of the list operations and reclaim walks. It's used
widely by other major shrinkers already. Convert the deferred split
queue as well.

The list_lru per-memcg heads are instantiated on demand when the first
object of interest is allocated for a cgroup, by calling
folio_memcg_list_lru_alloc(). Add calls at the sites where splittable
pages are created: anon faults, swapin faults, khugepaged collapse.

These calls create all possible node heads for the cgroup at once, so
the migration code (between nodes) doesn't need any special care.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/huge_mm.h    |   6 +-
 include/linux/memcontrol.h |   4 -
 include/linux/mmzone.h     |  12 --
 mm/huge_memory.c           | 342 ++++++++++++-------------------------
 mm/internal.h              |   2 +-
 mm/khugepaged.c            |   7 +
 mm/memcontrol.c            |  12 +-
 mm/memory.c                |  52 +++---
 mm/mm_init.c               |  15 --
 9 files changed, 151 insertions(+), 301 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index bd7f0e1d8094..8d801ed378db 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -414,10 +414,9 @@ static inline int split_huge_page(struct page *page)
 {
 	return split_huge_page_to_list_to_order(page, NULL, 0);
 }
+
+extern struct list_lru deferred_split_lru;
 void deferred_split_folio(struct folio *folio, bool partially_mapped);
-#ifdef CONFIG_MEMCG
-void reparent_deferred_split_queue(struct mem_cgroup *memcg);
-#endif
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address, bool freeze);
@@ -650,7 +649,6 @@ static inline int try_folio_split_to_order(struct folio *folio,
 }
 
 static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
-static inline void reparent_deferred_split_queue(struct mem_cgroup *memcg) {}
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 086158969529..0782c72a1997 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -277,10 +277,6 @@ struct mem_cgroup {
 	struct memcg_cgwb_frn cgwb_frn[MEMCG_CGWB_FRN_CNT];
 #endif
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	struct deferred_split deferred_split_queue;
-#endif
-
 #ifdef CONFIG_LRU_GEN_WALKS_MMU
 	/* per-memcg mm_struct list */
 	struct lru_gen_mm_list mm_list;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7bd0134c241c..232b7a71fd69 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1429,14 +1429,6 @@ struct zonelist {
  */
 extern struct page *mem_map;
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-struct deferred_split {
-	spinlock_t split_queue_lock;
-	struct list_head split_queue;
-	unsigned long split_queue_len;
-};
-#endif
-
 #ifdef CONFIG_MEMORY_FAILURE
 /*
  * Per NUMA node memory failure handling statistics.
@@ -1562,10 +1554,6 @@ typedef struct pglist_data {
 	unsigned long first_deferred_pfn;
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	struct deferred_split deferred_split_queue;
-#endif
-
 #ifdef CONFIG_NUMA_BALANCING
 	/* start time in ms of current promote rate limit period */
 	unsigned int nbp_rl_start;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3fc02913b63e..e90d08db219d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -14,6 +14,7 @@
 #include <linux/mmu_notifier.h>
 #include <linux/rmap.h>
 #include <linux/swap.h>
+#include <linux/list_lru.h>
 #include <linux/shrinker.h>
 #include <linux/mm_inline.h>
 #include <linux/swapops.h>
@@ -67,6 +68,8 @@ unsigned long transparent_hugepage_flags __read_mostly =
 	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
 	(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
 
+static struct lock_class_key deferred_split_key;
+struct list_lru deferred_split_lru;
 static struct shrinker *deferred_split_shrinker;
 static unsigned long deferred_split_count(struct shrinker *shrink,
 					  struct shrink_control *sc);
@@ -919,6 +922,13 @@ static int __init thp_shrinker_init(void)
 	if (!deferred_split_shrinker)
 		return -ENOMEM;
 
+	if (list_lru_init_memcg_key(&deferred_split_lru,
+				    deferred_split_shrinker,
+				    &deferred_split_key)) {
+		shrinker_free(deferred_split_shrinker);
+		return -ENOMEM;
+	}
+
 	deferred_split_shrinker->count_objects = deferred_split_count;
 	deferred_split_shrinker->scan_objects = deferred_split_scan;
 	shrinker_register(deferred_split_shrinker);
@@ -939,6 +949,7 @@ static int __init thp_shrinker_init(void)
 
 	huge_zero_folio_shrinker = shrinker_alloc(0, "thp-zero");
 	if (!huge_zero_folio_shrinker) {
+		list_lru_destroy(&deferred_split_lru);
 		shrinker_free(deferred_split_shrinker);
 		return -ENOMEM;
 	}
@@ -953,6 +964,7 @@ static int __init thp_shrinker_init(void)
 static void __init thp_shrinker_exit(void)
 {
 	shrinker_free(huge_zero_folio_shrinker);
+	list_lru_destroy(&deferred_split_lru);
 	shrinker_free(deferred_split_shrinker);
 }
 
@@ -1133,119 +1145,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 	return pmd;
 }
 
-static struct deferred_split *split_queue_node(int nid)
-{
-	struct pglist_data *pgdata = NODE_DATA(nid);
-
-	return &pgdata->deferred_split_queue;
-}
-
-#ifdef CONFIG_MEMCG
-static inline
-struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
-					   struct deferred_split *queue)
-{
-	if (mem_cgroup_disabled())
-		return NULL;
-	if (split_queue_node(folio_nid(folio)) == queue)
-		return NULL;
-	return container_of(queue, struct mem_cgroup, deferred_split_queue);
-}
-
-static struct deferred_split *memcg_split_queue(int nid, struct mem_cgroup *memcg)
-{
-	return memcg ? &memcg->deferred_split_queue : split_queue_node(nid);
-}
-#else
-static inline
-struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
-					   struct deferred_split *queue)
-{
-	return NULL;
-}
-
-static struct deferred_split *memcg_split_queue(int nid, struct mem_cgroup *memcg)
-{
-	return split_queue_node(nid);
-}
-#endif
-
-static struct deferred_split *split_queue_lock(int nid, struct mem_cgroup *memcg)
-{
-	struct deferred_split *queue;
-
-retry:
-	queue = memcg_split_queue(nid, memcg);
-	spin_lock(&queue->split_queue_lock);
-	/*
-	 * There is a period between setting memcg to dying and reparenting
-	 * deferred split queue, and during this period the THPs in the deferred
-	 * split queue will be hidden from the shrinker side.
-	 */
-	if (unlikely(memcg_is_dying(memcg))) {
-		spin_unlock(&queue->split_queue_lock);
-		memcg = parent_mem_cgroup(memcg);
-		goto retry;
-	}
-
-	return queue;
-}
-
-static struct deferred_split *
-split_queue_lock_irqsave(int nid, struct mem_cgroup *memcg, unsigned long *flags)
-{
-	struct deferred_split *queue;
-
-retry:
-	queue = memcg_split_queue(nid, memcg);
-	spin_lock_irqsave(&queue->split_queue_lock, *flags);
-	if (unlikely(memcg_is_dying(memcg))) {
-		spin_unlock_irqrestore(&queue->split_queue_lock, *flags);
-		memcg = parent_mem_cgroup(memcg);
-		goto retry;
-	}
-
-	return queue;
-}
-
-static struct deferred_split *folio_split_queue_lock(struct folio *folio)
-{
-	struct deferred_split *queue;
-
-	rcu_read_lock();
-	queue = split_queue_lock(folio_nid(folio), folio_memcg(folio));
-	/*
-	 * The memcg destruction path is acquiring the split queue lock for
-	 * reparenting. Once you have it locked, it's safe to drop the rcu lock.
-	 */
-	rcu_read_unlock();
-
-	return queue;
-}
-
-static struct deferred_split *
-folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
-{
-	struct deferred_split *queue;
-
-	rcu_read_lock();
-	queue = split_queue_lock_irqsave(folio_nid(folio), folio_memcg(folio), flags);
-	rcu_read_unlock();
-
-	return queue;
-}
-
-static inline void split_queue_unlock(struct deferred_split *queue)
-{
-	spin_unlock(&queue->split_queue_lock);
-}
-
-static inline void split_queue_unlock_irqrestore(struct deferred_split *queue,
-						 unsigned long flags)
-{
-	spin_unlock_irqrestore(&queue->split_queue_lock, flags);
-}
-
 static inline bool is_transparent_hugepage(const struct folio *folio)
 {
 	if (!folio_test_large(folio))
@@ -1346,6 +1245,14 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
 		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
 		return NULL;
 	}
+
+	if (folio_memcg_list_lru_alloc(folio, &deferred_split_lru, gfp)) {
+		folio_put(folio);
+		count_vm_event(THP_FAULT_FALLBACK);
+		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
+		return NULL;
+	}
+
 	folio_throttle_swaprate(folio, gfp);
 
        /*
@@ -3854,34 +3761,34 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 	struct folio *end_folio = folio_next(folio);
 	struct folio *new_folio, *next;
 	int old_order = folio_order(folio);
+	struct list_lru_one *l;
+	bool dequeue_deferred;
 	int ret = 0;
-	struct deferred_split *ds_queue;
 
 	VM_WARN_ON_ONCE(!mapping && end);
 	/* Prevent deferred_split_scan() touching ->_refcount */
-	ds_queue = folio_split_queue_lock(folio);
+	dequeue_deferred = folio_test_anon(folio) && old_order > 1;
+	if (dequeue_deferred) {
+		rcu_read_lock();
+		l = list_lru_lock(&deferred_split_lru,
+				  folio_nid(folio), folio_memcg(folio));
+	}
 	if (folio_ref_freeze(folio, folio_cache_ref_count(folio) + 1)) {
 		struct swap_cluster_info *ci = NULL;
 		struct lruvec *lruvec;
 
-		if (old_order > 1) {
-			if (!list_empty(&folio->_deferred_list)) {
-				ds_queue->split_queue_len--;
-				/*
-				 * Reinitialize page_deferred_list after removing the
-				 * page from the split_queue, otherwise a subsequent
-				 * split will see list corruption when checking the
-				 * page_deferred_list.
-				 */
-				list_del_init(&folio->_deferred_list);
-			}
+		if (dequeue_deferred) {
+			__list_lru_del(&deferred_split_lru, l,
+				       &folio->_deferred_list, folio_nid(folio));
 			if (folio_test_partially_mapped(folio)) {
 				folio_clear_partially_mapped(folio);
 				mod_mthp_stat(old_order,
 					MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 			}
+			list_lru_unlock(l);
+			rcu_read_unlock();
 		}
-		split_queue_unlock(ds_queue);
+
 		if (mapping) {
 			int nr = folio_nr_pages(folio);
 
@@ -3982,7 +3889,10 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 		if (ci)
 			swap_cluster_unlock(ci);
 	} else {
-		split_queue_unlock(ds_queue);
+		if (dequeue_deferred) {
+			list_lru_unlock(l);
+			rcu_read_unlock();
+		}
 		return -EAGAIN;
 	}
 
@@ -4349,33 +4259,35 @@ int split_folio_to_list(struct folio *folio, struct list_head *list)
  * queueing THP splits, and that list is (racily observed to be) non-empty.
  *
  * It is unsafe to call folio_unqueue_deferred_split() until folio refcount is
- * zero: because even when split_queue_lock is held, a non-empty _deferred_list
- * might be in use on deferred_split_scan()'s unlocked on-stack list.
+ * zero: because even when the list_lru lock is held, a non-empty
+ * _deferred_list might be in use on deferred_split_scan()'s unlocked
+ * on-stack list.
  *
- * If memory cgroups are enabled, split_queue_lock is in the mem_cgroup: it is
- * therefore important to unqueue deferred split before changing folio memcg.
+ * The list_lru sublist is determined by folio's memcg: it is therefore
+ * important to unqueue deferred split before changing folio memcg.
  */
 bool __folio_unqueue_deferred_split(struct folio *folio)
 {
-	struct deferred_split *ds_queue;
+	struct list_lru_one *l;
+	int nid = folio_nid(folio);
 	unsigned long flags;
 	bool unqueued = false;
 
 	WARN_ON_ONCE(folio_ref_count(folio));
 	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));
 
-	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
-	if (!list_empty(&folio->_deferred_list)) {
-		ds_queue->split_queue_len--;
+	rcu_read_lock();
+	l = list_lru_lock_irqsave(&deferred_split_lru, nid, folio_memcg(folio), &flags);
+	if (__list_lru_del(&deferred_split_lru, l, &folio->_deferred_list, nid)) {
 		if (folio_test_partially_mapped(folio)) {
 			folio_clear_partially_mapped(folio);
 			mod_mthp_stat(folio_order(folio),
 				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 		}
-		list_del_init(&folio->_deferred_list);
 		unqueued = true;
 	}
-	split_queue_unlock_irqrestore(ds_queue, flags);
+	list_lru_unlock_irqrestore(l, &flags);
+	rcu_read_unlock();
 
 	return unqueued;	/* useful for debug warnings */
 }
@@ -4383,7 +4295,9 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 /* partially_mapped=false won't clear PG_partially_mapped folio flag */
 void deferred_split_folio(struct folio *folio, bool partially_mapped)
 {
-	struct deferred_split *ds_queue;
+	struct list_lru_one *l;
+	int nid;
+	struct mem_cgroup *memcg;
 	unsigned long flags;
 
 	/*
@@ -4406,7 +4320,11 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 	if (folio_test_swapcache(folio))
 		return;
 
-	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
+	nid = folio_nid(folio);
+
+	rcu_read_lock();
+	memcg = folio_memcg(folio);
+	l = list_lru_lock_irqsave(&deferred_split_lru, nid, memcg, &flags);
 	if (partially_mapped) {
 		if (!folio_test_partially_mapped(folio)) {
 			folio_set_partially_mapped(folio);
@@ -4414,36 +4332,20 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 				count_vm_event(THP_DEFERRED_SPLIT_PAGE);
 			count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
 			mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, 1);
-
 		}
 	} else {
 		/* partially mapped folios cannot become non-partially mapped */
 		VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
 	}
-	if (list_empty(&folio->_deferred_list)) {
-		struct mem_cgroup *memcg;
-
-		memcg = folio_split_queue_memcg(folio, ds_queue);
-		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
-		ds_queue->split_queue_len++;
-		if (memcg)
-			set_shrinker_bit(memcg, folio_nid(folio),
-					 shrinker_id(deferred_split_shrinker));
-	}
-	split_queue_unlock_irqrestore(ds_queue, flags);
+	__list_lru_add(&deferred_split_lru, l, &folio->_deferred_list, nid, memcg);
+	list_lru_unlock_irqrestore(l, &flags);
+	rcu_read_unlock();
 }
 
 static unsigned long deferred_split_count(struct shrinker *shrink,
 		struct shrink_control *sc)
 {
-	struct pglist_data *pgdata = NODE_DATA(sc->nid);
-	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
-
-#ifdef CONFIG_MEMCG
-	if (sc->memcg)
-		ds_queue = &sc->memcg->deferred_split_queue;
-#endif
-	return READ_ONCE(ds_queue->split_queue_len);
+	return list_lru_shrink_count(&deferred_split_lru, sc);
 }
 
 static bool thp_underused(struct folio *folio)
@@ -4473,45 +4375,47 @@ static bool thp_underused(struct folio *folio)
 	return false;
 }
 
+static enum lru_status deferred_split_isolate(struct list_head *item,
+					      struct list_lru_one *lru,
+					      void *cb_arg)
+{
+	struct folio *folio = container_of(item, struct folio, _deferred_list);
+	struct list_head *freeable = cb_arg;
+
+	if (folio_try_get(folio)) {
+		list_lru_isolate_move(lru, item, freeable);
+		return LRU_REMOVED;
+	}
+
+	/* We lost race with folio_put() */
+	list_lru_isolate(lru, item);
+	if (folio_test_partially_mapped(folio)) {
+		folio_clear_partially_mapped(folio);
+		mod_mthp_stat(folio_order(folio),
+			      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
+	}
+	return LRU_REMOVED;
+}
+
 static unsigned long deferred_split_scan(struct shrinker *shrink,
 		struct shrink_control *sc)
 {
-	struct deferred_split *ds_queue;
-	unsigned long flags;
+	LIST_HEAD(dispose);
 	struct folio *folio, *next;
-	int split = 0, i;
-	struct folio_batch fbatch;
+	int split = 0;
+	unsigned long isolated;
 
-	folio_batch_init(&fbatch);
+	isolated = list_lru_shrink_walk_irq(&deferred_split_lru, sc,
+					    deferred_split_isolate, &dispose);
 
-retry:
-	ds_queue = split_queue_lock_irqsave(sc->nid, sc->memcg, &flags);
-	/* Take pin on all head pages to avoid freeing them under us */
-	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
-							_deferred_list) {
-		if (folio_try_get(folio)) {
-			folio_batch_add(&fbatch, folio);
-		} else if (folio_test_partially_mapped(folio)) {
-			/* We lost race with folio_put() */
-			folio_clear_partially_mapped(folio);
-			mod_mthp_stat(folio_order(folio),
-				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
-		}
-		list_del_init(&folio->_deferred_list);
-		ds_queue->split_queue_len--;
-		if (!--sc->nr_to_scan)
-			break;
-		if (!folio_batch_space(&fbatch))
-			break;
-	}
-	split_queue_unlock_irqrestore(ds_queue, flags);
-
-	for (i = 0; i < folio_batch_count(&fbatch); i++) {
+	list_for_each_entry_safe(folio, next, &dispose, _deferred_list) {
 		bool did_split = false;
 		bool underused = false;
-		struct deferred_split *fqueue;
+		struct list_lru_one *l;
+		unsigned long flags;
+
+		list_del_init(&folio->_deferred_list);
 
-		folio = fbatch.folios[i];
 		if (!folio_test_partially_mapped(folio)) {
 			/*
 			 * See try_to_map_unused_to_zeropage(): we cannot
@@ -4534,64 +4438,32 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		}
 		folio_unlock(folio);
 next:
-		if (did_split || !folio_test_partially_mapped(folio))
-			continue;
 		/*
 		 * Only add back to the queue if folio is partially mapped.
 		 * If thp_underused returns false, or if split_folio fails
 		 * in the case it was underused, then consider it used and
 		 * don't add it back to split_queue.
 		 */
-		fqueue = folio_split_queue_lock_irqsave(folio, &flags);
-		if (list_empty(&folio->_deferred_list)) {
-			list_add_tail(&folio->_deferred_list, &fqueue->split_queue);
-			fqueue->split_queue_len++;
+		if (!did_split && folio_test_partially_mapped(folio)) {
+			rcu_read_lock();
+			l = list_lru_lock_irqsave(&deferred_split_lru,
+						  folio_nid(folio),
+						  folio_memcg(folio),
+						  &flags);
+			__list_lru_add(&deferred_split_lru, l,
+				       &folio->_deferred_list,
+				       folio_nid(folio), folio_memcg(folio));
+			list_lru_unlock_irqrestore(l, &flags);
+			rcu_read_unlock();
 		}
-		split_queue_unlock_irqrestore(fqueue, flags);
-	}
-	folios_put(&fbatch);
-
-	if (sc->nr_to_scan && !list_empty(&ds_queue->split_queue)) {
-		cond_resched();
-		goto retry;
+		folio_put(folio);
 	}
 
-	/*
-	 * Stop shrinker if we didn't split any page, but the queue is empty.
-	 * This can happen if pages were freed under us.
-	 */
-	if (!split && list_empty(&ds_queue->split_queue))
+	if (!split && !isolated)
 		return SHRINK_STOP;
 	return split;
 }
 
-#ifdef CONFIG_MEMCG
-void reparent_deferred_split_queue(struct mem_cgroup *memcg)
-{
-	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
-	struct deferred_split *ds_queue = &memcg->deferred_split_queue;
-	struct deferred_split *parent_ds_queue = &parent->deferred_split_queue;
-	int nid;
-
-	spin_lock_irq(&ds_queue->split_queue_lock);
-	spin_lock_nested(&parent_ds_queue->split_queue_lock, SINGLE_DEPTH_NESTING);
-
-	if (!ds_queue->split_queue_len)
-		goto unlock;
-
-	list_splice_tail_init(&ds_queue->split_queue, &parent_ds_queue->split_queue);
-	parent_ds_queue->split_queue_len += ds_queue->split_queue_len;
-	ds_queue->split_queue_len = 0;
-
-	for_each_node(nid)
-		set_shrinker_bit(parent, nid, shrinker_id(deferred_split_shrinker));
-
-unlock:
-	spin_unlock(&parent_ds_queue->split_queue_lock);
-	spin_unlock_irq(&ds_queue->split_queue_lock);
-}
-#endif
-
 #ifdef CONFIG_DEBUG_FS
 static void split_huge_pages_all(void)
 {
diff --git a/mm/internal.h b/mm/internal.h
index f98f4746ac41..d8c737338df5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -863,7 +863,7 @@ static inline bool folio_unqueue_deferred_split(struct folio *folio)
 	/*
 	 * At this point, there is no one trying to add the folio to
 	 * deferred_list. If folio is not in deferred_list, it's safe
-	 * to check without acquiring the split_queue_lock.
+	 * to check without acquiring the list_lru lock.
 	 */
 	if (data_race(list_empty(&folio->_deferred_list)))
 		return false;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 4b0e59c7c0e6..b2ac28ddd480 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1081,6 +1081,7 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
 	}
 
 	count_vm_event(THP_COLLAPSE_ALLOC);
+
 	if (unlikely(mem_cgroup_charge(folio, mm, gfp))) {
 		folio_put(folio);
 		*foliop = NULL;
@@ -1089,6 +1090,12 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
 
 	count_memcg_folio_events(folio, THP_COLLAPSE_ALLOC, 1);
 
+	if (folio_memcg_list_lru_alloc(folio, &deferred_split_lru, gfp)) {
+		folio_put(folio);
+		*foliop = NULL;
+		return SCAN_CGROUP_CHARGE_FAIL;
+	}
+
 	*foliop = folio;
 	return SCAN_SUCCEED;
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a47fb68dd65f..f381cb6bdff1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4015,11 +4015,6 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
 	for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++)
 		memcg->cgwb_frn[i].done =
 			__WB_COMPLETION_INIT(&memcg_cgwb_frn_waitq);
-#endif
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	spin_lock_init(&memcg->deferred_split_queue.split_queue_lock);
-	INIT_LIST_HEAD(&memcg->deferred_split_queue.split_queue);
-	memcg->deferred_split_queue.split_queue_len = 0;
 #endif
 	lru_gen_init_memcg(memcg);
 	return memcg;
@@ -4167,11 +4162,10 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	zswap_memcg_offline_cleanup(memcg);
 
 	memcg_offline_kmem(memcg);
-	reparent_deferred_split_queue(memcg);
 	/*
-	 * The reparenting of objcg must be after the reparenting of the
-	 * list_lru and deferred_split_queue above, which ensures that they will
-	 * not mistakenly get the parent list_lru and deferred_split_queue.
+	 * The reparenting of objcg must be after the reparenting of
+	 * the list_lru in memcg_offline_kmem(), which ensures that
+	 * they will not mistakenly get the parent list_lru.
 	 */
 	memcg_reparent_objcgs(memcg);
 	reparent_shrinker_deferred(memcg);
diff --git a/mm/memory.c b/mm/memory.c
index 219b9bf6cae0..e68ceb4aa624 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4651,13 +4651,19 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 	while (orders) {
 		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
 		folio = vma_alloc_folio(gfp, order, vma, addr);
-		if (folio) {
-			if (!mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
-							    gfp, entry))
-				return folio;
+		if (!folio)
+			goto next;
+		if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm, gfp, entry)) {
 			count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK_CHARGE);
 			folio_put(folio);
+			goto next;
 		}
+		if (folio_memcg_list_lru_alloc(folio, &deferred_split_lru, gfp)) {
+			folio_put(folio);
+			goto fallback;
+		}
+		return folio;
+next:
 		count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
 		order = next_order(&orders, order);
 	}
@@ -5169,24 +5175,28 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	while (orders) {
 		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
 		folio = vma_alloc_folio(gfp, order, vma, addr);
-		if (folio) {
-			if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
-				count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
-				folio_put(folio);
-				goto next;
-			}
-			folio_throttle_swaprate(folio, gfp);
-			/*
-			 * When a folio is not zeroed during allocation
-			 * (__GFP_ZERO not used) or user folios require special
-			 * handling, folio_zero_user() is used to make sure
-			 * that the page corresponding to the faulting address
-			 * will be hot in the cache after zeroing.
-			 */
-			if (user_alloc_needs_zeroing())
-				folio_zero_user(folio, vmf->address);
-			return folio;
+		if (!folio)
+			goto next;
+		if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
+			count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
+			folio_put(folio);
+			goto next;
 		}
+		if (folio_memcg_list_lru_alloc(folio, &deferred_split_lru, gfp)) {
+			folio_put(folio);
+			goto fallback;
+		}
+		folio_throttle_swaprate(folio, gfp);
+		/*
+		 * When a folio is not zeroed during allocation
+		 * (__GFP_ZERO not used) or user folios require special
+		 * handling, folio_zero_user() is used to make sure
+		 * that the page corresponding to the faulting address
+		 * will be hot in the cache after zeroing.
+		 */
+		if (user_alloc_needs_zeroing())
+			folio_zero_user(folio, vmf->address);
+		return folio;
 next:
 		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
 		order = next_order(&orders, order);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index cec7bb758bdd..f293a62e652a 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1388,19 +1388,6 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
 	pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static void pgdat_init_split_queue(struct pglist_data *pgdat)
-{
-	struct deferred_split *ds_queue = &pgdat->deferred_split_queue;
-
-	spin_lock_init(&ds_queue->split_queue_lock);
-	INIT_LIST_HEAD(&ds_queue->split_queue);
-	ds_queue->split_queue_len = 0;
-}
-#else
-static void pgdat_init_split_queue(struct pglist_data *pgdat) {}
-#endif
-
 #ifdef CONFIG_COMPACTION
 static void pgdat_init_kcompactd(struct pglist_data *pgdat)
 {
@@ -1416,8 +1403,6 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 
 	pgdat_resize_init(pgdat);
 	pgdat_kswapd_lock_init(pgdat);
-
-	pgdat_init_split_queue(pgdat);
 	pgdat_init_kcompactd(pgdat);
 
 	init_waitqueue_head(&pgdat->kswapd_wait);
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty
  2026-03-18 19:53 ` [PATCH v3 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
@ 2026-03-18 20:12   ` Shakeel Butt
  2026-03-24 11:30   ` Lorenzo Stoakes (Oracle)
  1 sibling, 0 replies; 29+ messages in thread
From: Shakeel Butt @ 2026-03-18 20:12 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:19PM -0400, Johannes Weiner wrote:
> skip_empty is only for the shrinker to abort and skip a list that's
> empty or whose cgroup is being deleted.
> 
> For list additions and deletions, the cgroup hierarchy is walked
> upwards until a valid list_lru head is found, or it will fall back to
> the node list. Acquiring the lock won't fail. Remove the NULL checks
> in those callers.
> 
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg()
  2026-03-18 19:53 ` [PATCH v3 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg() Johannes Weiner
@ 2026-03-18 20:20   ` Shakeel Butt
  2026-03-24 11:34   ` Lorenzo Stoakes (Oracle)
  1 sibling, 0 replies; 29+ messages in thread
From: Shakeel Butt @ 2026-03-18 20:20 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:21PM -0400, Johannes Weiner wrote:
> Only the MEMCG variant of lock_list_lru() needs to check if there is a
> race with cgroup deletion and list reparenting. Move the check to the
> caller, so that the next patch can unify the lock_list_lru() variants.
> 
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 4/7] mm: list_lru: deduplicate lock_list_lru()
  2026-03-18 19:53 ` [PATCH v3 4/7] mm: list_lru: deduplicate lock_list_lru() Johannes Weiner
@ 2026-03-18 20:22   ` Shakeel Butt
  2026-03-24 11:36   ` Lorenzo Stoakes (Oracle)
  1 sibling, 0 replies; 29+ messages in thread
From: Shakeel Butt @ 2026-03-18 20:22 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:22PM -0400, Johannes Weiner wrote:
> The MEMCG and !MEMCG paths have the same pattern. Share the code.
> 
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-18 19:53 ` [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru Johannes Weiner
@ 2026-03-18 20:26   ` David Hildenbrand (Arm)
  2026-03-18 23:18   ` Shakeel Butt
  2026-03-24 13:48   ` Lorenzo Stoakes (Oracle)
  2 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-18 20:26 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett, Usama Arif,
	Kiryl Shutsemau, Dave Chinner, Roman Gushchin, linux-mm,
	linux-kernel

On 3/18/26 20:53, Johannes Weiner wrote:
> The deferred split queue handles cgroups in a suboptimal fashion. The
> queue is per-NUMA node or per-cgroup, not the intersection. That means
> on a cgrouped system, a node-restricted allocation entering reclaim
> can end up splitting large pages on other nodes:
> 
> 	alloc/unmap
> 	  deferred_split_folio()
> 	    list_add_tail(memcg->split_queue)
> 	    set_shrinker_bit(memcg, node, deferred_shrinker_id)
> 
> 	for_each_zone_zonelist_nodemask(restricted_nodes)
> 	  mem_cgroup_iter()
> 	    shrink_slab(node, memcg)
> 	      shrink_slab_memcg(node, memcg)
> 	        if test_shrinker_bit(memcg, node, deferred_shrinker_id)
> 	          deferred_split_scan()
> 	            walks memcg->split_queue
> 
> The shrinker bit adds an imperfect guard rail. As soon as the cgroup
> has a single large page on the node of interest, all large pages owned
> by that memcg, including those on other nodes, will be split.
> 
> list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
> streamlines a lot of the list operations and reclaim walks. It's used
> widely by other major shrinkers already. Convert the deferred split
> queue as well.
> 
> The list_lru per-memcg heads are instantiated on demand when the first
> object of interest is allocated for a cgroup, by calling
> folio_memcg_list_lru_alloc(). Add calls to where splittable pages are
> created: anon faults, swapin faults, khugepaged collapse.
> 
> These calls create all possible node heads for the cgroup at once, so
> the migration code (between nodes) doesn't need any special care.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---

Was just hitting sent on a reply on v2 when I spotted this in my inbox.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/7] mm: list_lru: introduce caller locking for additions and deletions
  2026-03-18 19:53 ` [PATCH v3 5/7] mm: list_lru: introduce caller locking for additions and deletions Johannes Weiner
@ 2026-03-18 20:51   ` Shakeel Butt
  2026-03-20 16:18     ` Johannes Weiner
  2026-03-24 11:55   ` Lorenzo Stoakes (Oracle)
  1 sibling, 1 reply; 29+ messages in thread
From: Shakeel Butt @ 2026-03-18 20:51 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:23PM -0400, Johannes Weiner wrote:
> Locking is currently internal to the list_lru API. However, a caller
> might want to keep auxiliary state synchronized with the LRU state.
> 
> For example, the THP shrinker uses the lock of its custom LRU to keep
> PG_partially_mapped and vmstats consistent.
> 
> To allow the THP shrinker to switch to list_lru, provide normal and
> irqsafe locking primitives as well as caller-locked variants of the
> addition and deletion functions.
> 
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

One nit below, other than that:

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>

>  
> -static inline void lock_list_lru(struct list_lru_one *l, bool irq)
> +static inline void lock_list_lru(struct list_lru_one *l, bool irq,
> +				 unsigned long *irq_flags)
>  {
> -	if (irq)
> +	if (irq_flags)
> +		spin_lock_irqsave(&l->lock, *irq_flags);
> +	else if (irq)

If we move __list_lru_walk_one to use irq_flags then we can remove the irq
param. It is a reclaim code path and I don't think the additional cost of
irqsave would matter here.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 6/7] mm: list_lru: introduce folio_memcg_list_lru_alloc()
  2026-03-18 19:53 ` [PATCH v3 6/7] mm: list_lru: introduce folio_memcg_list_lru_alloc() Johannes Weiner
@ 2026-03-18 20:52   ` Shakeel Butt
  2026-03-18 21:01   ` Shakeel Butt
  2026-03-24 12:01   ` Lorenzo Stoakes (Oracle)
  2 siblings, 0 replies; 29+ messages in thread
From: Shakeel Butt @ 2026-03-18 20:52 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:24PM -0400, Johannes Weiner wrote:
> memcg_list_lru_alloc() is called every time an object that may end up
> on the list_lru is created. It needs to quickly check if the list_lru
> heads for the memcg already exist, and allocate them when they don't.
> 
> Doing this with folio objects is tricky: folio_memcg() is not stable
> and requires either RCU protection or pinning the cgroup. But it's
> desirable to make the existence check lightweight under RCU, and only
> pin the memcg when we need to allocate list_lru heads and may block.
> 
> In preparation for switching the THP shrinker to list_lru, add a
> helper function for allocating list_lru heads coming from a folio.
> 
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 0/7] mm: switch THP shrinker to list_lru
  2026-03-18 19:53 [PATCH v3 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
                   ` (6 preceding siblings ...)
  2026-03-18 19:53 ` [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru Johannes Weiner
@ 2026-03-18 21:00 ` Lorenzo Stoakes (Oracle)
  2026-03-18 22:31   ` Johannes Weiner
  7 siblings, 1 reply; 29+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-18 21:00 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Shakeel Butt, Yosry Ahmed,
	Zi Yan, Liam R. Howlett, Usama Arif, Kiryl Shutsemau,
	Dave Chinner, Roman Gushchin, linux-mm, linux-kernel

Hi Johannes,

This is at v3 and you've not cc'd the majority of THP people:

MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE)
M:	Andrew Morton <akpm@linux-foundation.org>
M:	David Hildenbrand <david@kernel.org>
M:	Lorenzo Stoakes <ljs@kernel.org>            <--- (!)
R:	Zi Yan <ziy@nvidia.com>
R:	Baolin Wang <baolin.wang@linux.alibaba.com> <---
R:	Liam R. Howlett <Liam.Howlett@oracle.com>
R:	Nico Pache <npache@redhat.com>              <---
R:	Ryan Roberts <ryan.roberts@arm.com>         <---
R:	Dev Jain <dev.jain@arm.com>                 <---
R:	Barry Song <baohua@kernel.org>              <---
R:	Lance Yang <lance.yang@linux.dev>           <---

I'd assume it was an oversight but you're an mm sub-maintainer yourself so
I'm a little surprised...

Anyway could you please make sure to do that going forward?

I'd like to rely on lei's file-tracking stuff to catch cases like this, but
I just haven't got new mail tracking working properly in that environment
and it's really too slow.

It sucks to rely on people cc-ing but email development is what it is :)

Thanks, Lorenzo


* Re: [PATCH v3 6/7] mm: list_lru: introduce folio_memcg_list_lru_alloc()
  2026-03-18 19:53 ` [PATCH v3 6/7] mm: list_lru: introduce folio_memcg_list_lru_alloc() Johannes Weiner
  2026-03-18 20:52   ` Shakeel Butt
@ 2026-03-18 21:01   ` Shakeel Butt
  2026-03-24 12:01   ` Lorenzo Stoakes (Oracle)
  2 siblings, 0 replies; 29+ messages in thread
From: Shakeel Butt @ 2026-03-18 21:01 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:24PM -0400, Johannes Weiner wrote:
> memcg_list_lru_alloc() is called every time an object that may end up
> on the list_lru is created. It needs to quickly check if the list_lru
> heads for the memcg already exist, and allocate them when they don't.
> 
> Doing this with folio objects is tricky: folio_memcg() is not stable
> and requires either RCU protection or pinning the cgroup. But it's
> desirable to make the existence check lightweight under RCU, and only
> pin the memcg when we need to allocate list_lru heads and may block.
> 
> In preparation for switching the THP shrinker to list_lru, add a
> helper function for allocating list_lru heads coming from a folio.
> 
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
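For readers following along, the two-phase pattern the commit message describes - a lightweight existence check first, pinning and possibly-blocking allocation only when needed - can be modeled in plain userspace C. Everything below is illustrative: the struct, field names, and helpers are hypothetical stand-ins for the RCU read section, the css refcount pin, and the list_lru heads in the actual kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: a per-cgroup list head allocated on first use. */
struct group {
	void *lru_head;	/* NULL until the first object is charged */
	int pins;	/* stands in for a css refcount */
};

/* Fast path: lightweight existence check (done under RCU in the kernel). */
static int group_lru_exists(const struct group *g)
{
	return g->lru_head != NULL;
}

/* Slow path: pin the group so it can't go away, then allocate. */
static int group_lru_alloc(struct group *g)
{
	g->pins++;			/* "pin": now safe to block */
	if (!g->lru_head)
		g->lru_head = malloc(64);	/* may sleep in the kernel */
	g->pins--;
	return g->lru_head ? 0 : -1;
}

/* Combined helper in the spirit of folio_memcg_list_lru_alloc(). */
static int model_lru_alloc(struct group *g)
{
	if (group_lru_exists(g))	/* common case: heads already there */
		return 0;
	return group_lru_alloc(g);	/* rare case: pin + allocate */
}
```

The point of the split is that the common case stays allocation-free and never takes a reference.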

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>


* Re: [PATCH v3 0/7] mm: switch THP shrinker to list_lru
  2026-03-18 21:00 ` [PATCH v3 0/7] mm: switch THP " Lorenzo Stoakes (Oracle)
@ 2026-03-18 22:31   ` Johannes Weiner
  2026-03-19  8:47     ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 29+ messages in thread
From: Johannes Weiner @ 2026-03-18 22:31 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Andrew Morton, David Hildenbrand, Shakeel Butt, Yosry Ahmed,
	Zi Yan, Liam R. Howlett, Usama Arif, Kiryl Shutsemau,
	Dave Chinner, Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 09:00:57PM +0000, Lorenzo Stoakes (Oracle) wrote:
> Hi Johannes,
> 
> This is at v3 and you've not cc'd the majority of THP people:
> 
> MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE)
> M:	Andrew Morton <akpm@linux-foundation.org>
> M:	David Hildenbrand <david@kernel.org>
> M:	Lorenzo Stoakes <ljs@kernel.org>            <--- (!)
> R:	Zi Yan <ziy@nvidia.com>
> R:	Baolin Wang <baolin.wang@linux.alibaba.com> <---
> R:	Liam R. Howlett <Liam.Howlett@oracle.com>
> R:	Nico Pache <npache@redhat.com>              <---
> R:	Ryan Roberts <ryan.roberts@arm.com>         <---
> R:	Dev Jain <dev.jain@arm.com>                 <---
> R:	Barry Song <baohua@kernel.org>              <---
> R:	Lance Yang <lance.yang@linux.dev>           <---
> 
> I'd assume it was an oversight but you're an mm sub-maintainer yourself so
> I'm a little surprised...
> 
> Anyway could you please make sure to do that going forward?
> 
> I'd like to rely on lei's file-tracking stuff to catch cases like this, but
> I just haven't got new mail tracking working properly in that environment
> and it's really too slow.
> 
> It sucks to rely on people cc-ing but email development is what it is :)
> 
> Thanks, Lorenzo

I apologize, there was no ill intent here at all. get_maintainer.pl
gave me 27 addresses for this series and I honestly thought that was
too aggrandizing and self-important for what these patches are lol.

So what you're looking at is a (poor) attempt at trimming down the CC
list based on who I had talked to about these patches at the THP
meeting and who I remember had worked on shrinkers & cgroups.

I can CC everybody going forward, not a problem. Personally I don't
mind getting CCd generously, but my understanding is that some people
do...


* Re: [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-18 19:53 ` [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru Johannes Weiner
  2026-03-18 20:26   ` David Hildenbrand (Arm)
@ 2026-03-18 23:18   ` Shakeel Butt
  2026-03-24 13:48   ` Lorenzo Stoakes (Oracle)
  2 siblings, 0 replies; 29+ messages in thread
From: Shakeel Butt @ 2026-03-18 23:18 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:25PM -0400, Johannes Weiner wrote:
> The deferred split queue handles cgroups in a suboptimal fashion. The
> queue is per-NUMA node or per-cgroup, not the intersection. That means
> on a cgrouped system, a node-restricted allocation entering reclaim
> can end up splitting large pages on other nodes:
> 
> 	alloc/unmap
> 	  deferred_split_folio()
> 	    list_add_tail(memcg->split_queue)
> 	    set_shrinker_bit(memcg, node, deferred_shrinker_id)
> 
> 	for_each_zone_zonelist_nodemask(restricted_nodes)
> 	  mem_cgroup_iter()
> 	    shrink_slab(node, memcg)
> 	      shrink_slab_memcg(node, memcg)
> 	        if test_shrinker_bit(memcg, node, deferred_shrinker_id)
> 	          deferred_split_scan()
> 	            walks memcg->split_queue
> 
> The shrinker bit adds an imperfect guard rail. As soon as the cgroup
> has a single large page on the node of interest, all large pages owned
> by that memcg, including those on other nodes, will be split.
> 
> list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
> streamlines a lot of the list operations and reclaim walks. It's used
> widely by other major shrinkers already. Convert the deferred split
> queue as well.
> 
> The list_lru per-memcg heads are instantiated on demand when the first
> object of interest is allocated for a cgroup, by calling
> folio_memcg_list_lru_alloc(). Add calls to where splittable pages are
> created: anon faults, swapin faults, khugepaged collapse.
> 
> These calls create all possible node heads for the cgroup at once, so
> the migration code (between nodes) doesn't need any special care.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
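The structural change the commit message describes - keying the deferred split queue by (node, memcg) rather than by memcg alone - can be illustrated with a toy userspace model. All names here are hypothetical; the real code uses list_lru sublists and shrinker callbacks rather than a flat array.

```c
#include <assert.h>

#define NR_NODES	2
#define NR_MEMCGS	2

/* nr_items per (node, memcg) sublist - the key property list_lru adds. */
static long queue[NR_NODES][NR_MEMCGS];

/* Model of deferred_split_folio(): enqueue on the folio's own sublist. */
static void deferred_split_add(int nid, int memcg)
{
	queue[nid][memcg]++;
}

/* Model of deferred_split_scan(): only the sublist for this (node,
 * memcg) pair is walked, so node-restricted reclaim can no longer
 * split a cgroup's large pages on unrelated nodes. */
static long deferred_split_scan(int nid, int memcg)
{
	long split = queue[nid][memcg];

	queue[nid][memcg] = 0;
	return split;
}
```

With the old per-memcg queue, the scan for node 0 would have drained node 1's entries as well; the two-dimensional keying is what prevents that.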

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>


* Re: [PATCH v3 0/7] mm: switch THP shrinker to list_lru
  2026-03-18 22:31   ` Johannes Weiner
@ 2026-03-19  8:47     ` Lorenzo Stoakes (Oracle)
  2026-03-19  8:52       ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 29+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-19  8:47 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Shakeel Butt, Yosry Ahmed,
	Zi Yan, Liam R. Howlett, Usama Arif, Kiryl Shutsemau,
	Dave Chinner, Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 06:31:14PM -0400, Johannes Weiner wrote:
> On Wed, Mar 18, 2026 at 09:00:57PM +0000, Lorenzo Stoakes (Oracle) wrote:
> > Hi Johannes,
> >
> > This is at v3 and you've not cc'd the majority of THP people:
> >
> > MEMORY MANAGEMENT - THP (TRANSPARENT HUGE PAGE)
> > M:	Andrew Morton <akpm@linux-foundation.org>
> > M:	David Hildenbrand <david@kernel.org>
> > M:	Lorenzo Stoakes <ljs@kernel.org>            <--- (!)
> > R:	Zi Yan <ziy@nvidia.com>
> > R:	Baolin Wang <baolin.wang@linux.alibaba.com> <---
> > R:	Liam R. Howlett <Liam.Howlett@oracle.com>
> > R:	Nico Pache <npache@redhat.com>              <---
> > R:	Ryan Roberts <ryan.roberts@arm.com>         <---
> > R:	Dev Jain <dev.jain@arm.com>                 <---
> > R:	Barry Song <baohua@kernel.org>              <---
> > R:	Lance Yang <lance.yang@linux.dev>           <---
> >
> > I'd assume it was an oversight but you're an mm sub-maintainer yourself so
> > I'm a little surprised...
> >
> > Anyway could you please make sure to do that going forward?
> >
> > I'd like to rely on lei's file-tracking stuff to catch cases like this, but
> > I just haven't got new mail tracking working properly in that environment
> > and it's really too slow.
> >
> > It sucks to rely on people cc-ing but email development is what it is :)
> >
> > Thanks, Lorenzo
>
> I apologize, there was no ill intent here at all. get_maintainer.pl
> gave me 27 addresses for this series and I honestly thought that was
> too aggrandizing and self-important for what these patches are lol.

Thanks, yeah email for development is a complete pain to be honest!

I wouldn't say that cc list would in any way be perceived as
self-aggrandizing, I mean some of the trivial stuff I've sent has gone to
larger cc lists :P

But I get your point, it's a kind of impossible balancing act :)

>
> So what you're looking at is a (poor) attempt at trimming down the CC
> list based on who I had talked to about these patches at the THP
> meeting and who I remember had worked on shrinkers & cgroups.
>
> I can CC everybody going forward, not a problem. Personally I don't
> mind getting CCd generously, but my understanding is that some people
> do...

Yeah I have often sent series where the list felt... egregiously huge, but
I _generally_ treat it as - people are used to getting tonnes of mail - so
some more shouldn't hurt.

I mean in [0] for example I really cringed at sending a 23 patch series
that is super super mm-centric and really just updating how we manage VMA
flags (trying to get away from system word size flags) - to a trillion
people.

TL;DR - anyway, I think get_maintainer.pl --nogit (and trimming anyone who
isn't listed as a maintainer or reviewer) is the right balance to take.
Maintainers with screwed up email setups (*cough* naming no names
beginning with L and ending with orenzo) and reviewers who have signed up
for lots of mail are, I think, totally fine to always send to.

And of course +cc anybody else you feel appropriate.

The 'git' maintainers entries can absolutely be dropped though, unless
you're, say, radically changing something somebody contributed and it
makes sense to include them - but all that's subjective!

Cheers, Lorenzo

[0]:https://lore.kernel.org/linux-mm/cover.1773846935.git.ljs@kernel.org/


* Re: [PATCH v3 0/7] mm: switch THP shrinker to list_lru
  2026-03-19  8:47     ` Lorenzo Stoakes (Oracle)
@ 2026-03-19  8:52       ` David Hildenbrand (Arm)
  2026-03-19 11:45         ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 29+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-19  8:52 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle), Johannes Weiner
  Cc: Andrew Morton, Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett,
	Usama Arif, Kiryl Shutsemau, Dave Chinner, Roman Gushchin,
	linux-mm, linux-kernel

> TL;DR - anyway, I think get_maintainer.pl --nogit (and trim anyone who
> isn't listed as a maintainer or reviewer) is the right balance to take -
> maintainers with screwed up email set ups (*cough* naming no names
> beginning with L and ending with orenzo) and reviewers who have signed up
> for lots of mail I think are totally fine to always send to.

On some simplistic changes (e.g., a simple function rename) I sometimes
drop the Reviewers of subsystems, because such simplistic changes
mostly just serve as information to maintainers that there might be conflicts.

That helped in the past to reduce the CC lists on series that touch many
archs / subsystems.

-- 
Cheers,

David


* Re: [PATCH v3 0/7] mm: switch THP shrinker to list_lru
  2026-03-19  8:52       ` David Hildenbrand (Arm)
@ 2026-03-19 11:45         ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-19 11:45 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Johannes Weiner, Andrew Morton, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Thu, Mar 19, 2026 at 09:52:14AM +0100, David Hildenbrand (Arm) wrote:
> > TL;DR - anyway, I think get_maintainer.pl --nogit (and trim anyone who
> > isn't listed as a maintainer or reviewer) is the right balance to take -
> > maintainers with screwed up email set ups (*cough* naming no names
> > beginning with L and ending with orenzo) and reviewers who have signed up
> > for lots of mail I think are totally fine to always send to.
>
> On some simplistic changes (e.g., a simple function rename) I sometimes
> drop the Reviewers of subsystems, because such simplistic changes
> mostly just serve as information to maintainers that there might be conflicts.

Yeah makes sense!

>
> That helped in the past to reduce the CC lists on series that touch many
> archs / subsystems.

It'd be nice if get_maintainer.pl could have some functionality to make all this
easier. But that sounds like self-volunteering a bit too much :P *cough*

>
> --
> Cheers,
>
> David

Cheers, Lorenzo


* Re: [PATCH v3 5/7] mm: list_lru: introduce caller locking for additions and deletions
  2026-03-18 20:51   ` Shakeel Butt
@ 2026-03-20 16:18     ` Johannes Weiner
  0 siblings, 0 replies; 29+ messages in thread
From: Johannes Weiner @ 2026-03-20 16:18 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 01:51:04PM -0700, Shakeel Butt wrote:
> On Wed, Mar 18, 2026 at 03:53:23PM -0400, Johannes Weiner wrote:
> > Locking is currently internal to the list_lru API. However, a caller
> > might want to keep auxiliary state synchronized with the LRU state.
> > 
> > For example, the THP shrinker uses the lock of its custom LRU to keep
> > PG_partially_mapped and vmstats consistent.
> > 
> > To allow the THP shrinker to switch to list_lru, provide normal and
> > irqsafe locking primitives as well as caller-locked variants of the
> > addition and deletion functions.
> > 
> > Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> 
> One nit below, other than that:
> 
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> 
> >  
> > -static inline void lock_list_lru(struct list_lru_one *l, bool irq)
> > +static inline void lock_list_lru(struct list_lru_one *l, bool irq,
> > +				 unsigned long *irq_flags)
> >  {
> > -	if (irq)
> > +	if (irq_flags)
> > +		spin_lock_irqsave(&l->lock, *irq_flags);
> > +	else if (irq)
> 
> If we move __list_lru_walk_one to use irq_flags then we can remove the irq
> param. It is a reclaim code path and I don't think the additional cost of
> irqsave would matter here.

The workingset shrinker's isolation function uses unlock_irq() and
cond_resched(). That would be non-trivial to rewrite: pass flags
around, keep irqs disabled for the whole reclaim cycle, or break it
into a two-stage process. That sounds like a higher maintenance burden
than the bool here.

I know there is some cost to this distinction, but I actually do find
it useful to know the difference. It's self-documenting context.
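The caller-locked usage pattern under discussion - one critical section covering both the LRU update and the caller's private state - can be sketched as a userspace model. Names are hypothetical stand-ins: the kernel side would be list_lru_lock()/__list_lru_add()/list_lru_unlock() around a spinlock, with PG_partially_mapped as the private state.

```c
#include <assert.h>

/* One lock protects both the list count and a private flag, mirroring
 * how the THP shrinker wants PG_partially_mapped and list membership
 * to stay consistent. */
struct sublist {
	int locked;		/* stand-in for the spinlock */
	long nr_items;
	int partially_mapped;	/* the auxiliary state */
};

static void sublist_lock(struct sublist *l)
{
	assert(!l->locked);	/* spin_lock()/spin_lock_irq() in the kernel */
	l->locked = 1;
}

static void sublist_unlock(struct sublist *l)
{
	assert(l->locked);
	l->locked = 0;
}

/* Caller-locked add: both updates happen in one critical section. */
static void sublist_add_and_mark(struct sublist *l)
{
	sublist_lock(l);		/* list_lru_lock() */
	l->nr_items++;			/* __list_lru_add() */
	l->partially_mapped = 1;	/* private state, same section */
	sublist_unlock(l);		/* list_lru_unlock() */
}
```

With the internal-locking API the flag update would race against concurrent list operations; exposing the lock closes that window.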


* Re: [PATCH v3 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty
  2026-03-18 19:53 ` [PATCH v3 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
  2026-03-18 20:12   ` Shakeel Butt
@ 2026-03-24 11:30   ` Lorenzo Stoakes (Oracle)
  1 sibling, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-24 11:30 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Shakeel Butt, Yosry Ahmed,
	Zi Yan, Liam R. Howlett, Usama Arif, Kiryl Shutsemau,
	Dave Chinner, Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:19PM -0400, Johannes Weiner wrote:
> skip_empty is only for the shrinker to abort and skip a list that's
> empty or whose cgroup is being deleted.
>
> For list additions and deletions, the cgroup hierarchy is walked
> upwards until a valid list_lru head is found, or it will fall back to
> the node list. Acquiring the lock won't fail. Remove the NULL checks
> in those callers.
>
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
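The guarantee the commit message relies on - the walk always terminates at a valid list - can be modeled in a few lines of userspace C. Names are hypothetical; the real walk uses parent_mem_cgroup() and falls back to the per-node list.

```c
#include <assert.h>
#include <stddef.h>

#define NO_LIST (-1)

struct cgroup {
	struct cgroup *parent;
	int list_id;	/* NO_LIST until per-cgroup heads exist */
};

static const int node_list_id = 1000;	/* always-present node list */

/* Walk upwards until a sublist exists; the root fallback never fails,
 * which is why the !skip_empty lookup cannot return NULL. */
static int find_list(const struct cgroup *cg)
{
	for (; cg; cg = cg->parent)
		if (cg->list_id != NO_LIST)
			return cg->list_id;
	return node_list_id;
}
```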

LGTM - with !skip_empty that path just isn't possible, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/list_lru.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 26463ae29c64..d96fd50fc9af 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -165,8 +165,6 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
>  	struct list_lru_one *l;
>
>  	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
> -	if (!l)
> -		return false;
>  	if (list_empty(item)) {
>  		list_add_tail(item, &l->list);
>  		/* Set shrinker bit if the first element was added */
> @@ -203,9 +201,8 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid,
>  {
>  	struct list_lru_node *nlru = &lru->node[nid];
>  	struct list_lru_one *l;
> +
>  	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
> -	if (!l)
> -		return false;
>  	if (!list_empty(item)) {
>  		list_del_init(item);
>  		l->nr_items--;
> --
> 2.53.0
>

Cheers, Lorenzo


* Re: [PATCH v3 2/7] mm: list_lru: deduplicate unlock_list_lru()
  2026-03-18 19:53 ` [PATCH v3 2/7] mm: list_lru: deduplicate unlock_list_lru() Johannes Weiner
@ 2026-03-24 11:32   ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-24 11:32 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Shakeel Butt, Yosry Ahmed,
	Zi Yan, Liam R. Howlett, Usama Arif, Kiryl Shutsemau,
	Dave Chinner, Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:20PM -0400, Johannes Weiner wrote:
> The MEMCG and !MEMCG variants are the same. lock_list_lru() has the
> same pattern when bailing. Consolidate into a common implementation.
>
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Nice, LGTM so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/list_lru.c | 29 +++++++++--------------------
>  1 file changed, 9 insertions(+), 20 deletions(-)
>
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index d96fd50fc9af..e873bc26a7ef 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -15,6 +15,14 @@
>  #include "slab.h"
>  #include "internal.h"
>
> +static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
> +{
> +	if (irq_off)
> +		spin_unlock_irq(&l->lock);
> +	else
> +		spin_unlock(&l->lock);
> +}
> +
>  #ifdef CONFIG_MEMCG
>  static LIST_HEAD(memcg_list_lrus);
>  static DEFINE_MUTEX(list_lrus_mutex);
> @@ -67,10 +75,7 @@ static inline bool lock_list_lru(struct list_lru_one *l, bool irq)
>  	else
>  		spin_lock(&l->lock);
>  	if (unlikely(READ_ONCE(l->nr_items) == LONG_MIN)) {
> -		if (irq)
> -			spin_unlock_irq(&l->lock);
> -		else
> -			spin_unlock(&l->lock);
> +		unlock_list_lru(l, irq);
>  		return false;
>  	}
>  	return true;
> @@ -101,14 +106,6 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>  	memcg = parent_mem_cgroup(memcg);
>  	goto again;
>  }
> -
> -static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
> -{
> -	if (irq_off)
> -		spin_unlock_irq(&l->lock);
> -	else
> -		spin_unlock(&l->lock);
> -}
>  #else
>  static void list_lru_register(struct list_lru *lru)
>  {
> @@ -147,14 +144,6 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>
>  	return l;
>  }
> -
> -static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
> -{
> -	if (irq_off)
> -		spin_unlock_irq(&l->lock);
> -	else
> -		spin_unlock(&l->lock);
> -}
>  #endif /* CONFIG_MEMCG */
>
>  /* The caller must ensure the memcg lifetime. */
> --
> 2.53.0
>


* Re: [PATCH v3 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg()
  2026-03-18 19:53 ` [PATCH v3 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg() Johannes Weiner
  2026-03-18 20:20   ` Shakeel Butt
@ 2026-03-24 11:34   ` Lorenzo Stoakes (Oracle)
  1 sibling, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-24 11:34 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Shakeel Butt, Yosry Ahmed,
	Zi Yan, Liam R. Howlett, Usama Arif, Kiryl Shutsemau,
	Dave Chinner, Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:21PM -0400, Johannes Weiner wrote:
> Only the MEMCG variant of lock_list_lru() needs to check if there is a
> race with cgroup deletion and list reparenting. Move the check to the
> caller, so that the next patch can unify the lock_list_lru() variants.
>
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Logic looks correct so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/list_lru.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index e873bc26a7ef..1a39ff490643 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -68,17 +68,12 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
>  	return &lru->node[nid].lru;
>  }
>
> -static inline bool lock_list_lru(struct list_lru_one *l, bool irq)
> +static inline void lock_list_lru(struct list_lru_one *l, bool irq)
>  {
>  	if (irq)
>  		spin_lock_irq(&l->lock);
>  	else
>  		spin_lock(&l->lock);
> -	if (unlikely(READ_ONCE(l->nr_items) == LONG_MIN)) {
> -		unlock_list_lru(l, irq);
> -		return false;
> -	}
> -	return true;
>  }
>
>  static inline struct list_lru_one *
> @@ -90,9 +85,13 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>  	rcu_read_lock();
>  again:
>  	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
> -	if (likely(l) && lock_list_lru(l, irq)) {
> -		rcu_read_unlock();
> -		return l;
> +	if (likely(l)) {
> +		lock_list_lru(l, irq);
> +		if (likely(READ_ONCE(l->nr_items) != LONG_MIN)) {
> +			rcu_read_unlock();
> +			return l;
> +		}
> +		unlock_list_lru(l, irq);
>  	}
>  	/*
>  	 * Caller may simply bail out if raced with reparenting or
> --
> 2.53.0
>


* Re: [PATCH v3 4/7] mm: list_lru: deduplicate lock_list_lru()
  2026-03-18 19:53 ` [PATCH v3 4/7] mm: list_lru: deduplicate lock_list_lru() Johannes Weiner
  2026-03-18 20:22   ` Shakeel Butt
@ 2026-03-24 11:36   ` Lorenzo Stoakes (Oracle)
  1 sibling, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-24 11:36 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Shakeel Butt, Yosry Ahmed,
	Zi Yan, Liam R. Howlett, Usama Arif, Kiryl Shutsemau,
	Dave Chinner, Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:22PM -0400, Johannes Weiner wrote:
> The MEMCG and !MEMCG paths have the same pattern. Share the code.
>
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

I like the symmetry here too :) LGTM, so:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  mm/list_lru.c | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 1a39ff490643..4d74c2e9c2a5 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -15,6 +15,14 @@
>  #include "slab.h"
>  #include "internal.h"
>
> +static inline void lock_list_lru(struct list_lru_one *l, bool irq)
> +{
> +	if (irq)
> +		spin_lock_irq(&l->lock);
> +	else
> +		spin_lock(&l->lock);
> +}
> +
>  static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
>  {
>  	if (irq_off)
> @@ -68,14 +76,6 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
>  	return &lru->node[nid].lru;
>  }
>
> -static inline void lock_list_lru(struct list_lru_one *l, bool irq)
> -{
> -	if (irq)
> -		spin_lock_irq(&l->lock);
> -	else
> -		spin_lock(&l->lock);
> -}
> -
>  static inline struct list_lru_one *
>  lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>  		       bool irq, bool skip_empty)
> @@ -136,10 +136,7 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>  {
>  	struct list_lru_one *l = &lru->node[nid].lru;
>
> -	if (irq)
> -		spin_lock_irq(&l->lock);
> -	else
> -		spin_lock(&l->lock);
> +	lock_list_lru(l, irq);
>
>  	return l;
>  }
> --
> 2.53.0
>


* Re: [PATCH v3 5/7] mm: list_lru: introduce caller locking for additions and deletions
  2026-03-18 19:53 ` [PATCH v3 5/7] mm: list_lru: introduce caller locking for additions and deletions Johannes Weiner
  2026-03-18 20:51   ` Shakeel Butt
@ 2026-03-24 11:55   ` Lorenzo Stoakes (Oracle)
  1 sibling, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-24 11:55 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Shakeel Butt, Yosry Ahmed,
	Zi Yan, Liam R. Howlett, Usama Arif, Kiryl Shutsemau,
	Dave Chinner, Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:23PM -0400, Johannes Weiner wrote:
> Locking is currently internal to the list_lru API. However, a caller
> might want to keep auxiliary state synchronized with the LRU state.
>
> For example, the THP shrinker uses the lock of its custom LRU to keep
> PG_partially_mapped and vmstats consistent.
>
> To allow the THP shrinker to switch to list_lru, provide normal and
> irqsafe locking primitives as well as caller-locked variants of the
> addition and deletion functions.
>
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Had a good look through the logic, went to write comments more than once then
realised I didn't need to, so LGTM and:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  include/linux/list_lru.h |  34 +++++++++++++
>  mm/list_lru.c            | 107 +++++++++++++++++++++++++++------------
>  2 files changed, 110 insertions(+), 31 deletions(-)
>
> diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
> index fe739d35a864..4afc02deb44d 100644
> --- a/include/linux/list_lru.h
> +++ b/include/linux/list_lru.h
> @@ -83,6 +83,40 @@ int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
>  			 gfp_t gfp);
>  void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
>
> +/**
> + * list_lru_lock: lock the sublist for the given node and memcg
> + * @lru: the lru pointer
> + * @nid: the node id of the sublist to lock.
> + * @memcg: the cgroup of the sublist to lock.
> + *
> + * Returns the locked list_lru_one sublist. The caller must call
> + * list_lru_unlock() when done.
> + *
> + * You must ensure that the memcg is not freed during this call (e.g., with
> + * rcu or by taking a css refcnt).
> + *
> + * Return: the locked list_lru_one, or NULL on failure
> + */
> +struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid,
> +		struct mem_cgroup *memcg);
> +
> +/**
> + * list_lru_unlock: unlock a sublist locked by list_lru_lock()
> + * @l: the list_lru_one to unlock
> + */
> +void list_lru_unlock(struct list_lru_one *l);
> +
> +struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
> +		struct mem_cgroup *memcg, unsigned long *irq_flags);
> +void list_lru_unlock_irqrestore(struct list_lru_one *l,
> +		unsigned long *irq_flags);
> +
> +/* Caller-locked variants, see list_lru_add() etc for documentation */
> +bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l,
> +		struct list_head *item, int nid, struct mem_cgroup *memcg);
> +bool __list_lru_del(struct list_lru *lru, struct list_lru_one *l,
> +		struct list_head *item, int nid);
> +
>  /**
>   * list_lru_add: add an element to the lru list's tail
>   * @lru: the lru pointer
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 4d74c2e9c2a5..b817c0f48f73 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -15,17 +15,23 @@
>  #include "slab.h"
>  #include "internal.h"
>
> -static inline void lock_list_lru(struct list_lru_one *l, bool irq)
> +static inline void lock_list_lru(struct list_lru_one *l, bool irq,
> +				 unsigned long *irq_flags)
>  {
> -	if (irq)
> +	if (irq_flags)
> +		spin_lock_irqsave(&l->lock, *irq_flags);
> +	else if (irq)
>  		spin_lock_irq(&l->lock);
>  	else
>  		spin_lock(&l->lock);
>  }
>
> -static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
> +static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off,
> +				   unsigned long *irq_flags)
>  {
> -	if (irq_off)
> +	if (irq_flags)
> +		spin_unlock_irqrestore(&l->lock, *irq_flags);
> +	else if (irq_off)
>  		spin_unlock_irq(&l->lock);
>  	else
>  		spin_unlock(&l->lock);
> @@ -78,7 +84,7 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
>
>  static inline struct list_lru_one *
>  lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
> -		       bool irq, bool skip_empty)
> +		       bool irq, unsigned long *irq_flags, bool skip_empty)
>  {
>  	struct list_lru_one *l;
>
> @@ -86,12 +92,12 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>  again:
>  	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
>  	if (likely(l)) {
> -		lock_list_lru(l, irq);
> +		lock_list_lru(l, irq, irq_flags);
>  		if (likely(READ_ONCE(l->nr_items) != LONG_MIN)) {
>  			rcu_read_unlock();
>  			return l;
>  		}
> -		unlock_list_lru(l, irq);
> +		unlock_list_lru(l, irq, irq_flags);
>  	}
>  	/*
>  	 * Caller may simply bail out if raced with reparenting or
> @@ -132,37 +138,81 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
>
>  static inline struct list_lru_one *
>  lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
> -		       bool irq, bool skip_empty)
> +		       bool irq, unsigned long *irq_flags, bool skip_empty)
>  {
>  	struct list_lru_one *l = &lru->node[nid].lru;
>
> -	lock_list_lru(l, irq);
> +	lock_list_lru(l, irq, irq_flags);
>
>  	return l;
>  }
>  #endif /* CONFIG_MEMCG */
>
> -/* The caller must ensure the memcg lifetime. */
> -bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
> -		  struct mem_cgroup *memcg)
> +struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid,
> +				   struct mem_cgroup *memcg)
>  {
> -	struct list_lru_node *nlru = &lru->node[nid];
> -	struct list_lru_one *l;
> +	return lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/false,
> +				      /*irq_flags=*/NULL, /*skip_empty=*/false);
> +}
> +
> +void list_lru_unlock(struct list_lru_one *l)
> +{
> +	unlock_list_lru(l, /*irq_off=*/false, /*irq_flags=*/NULL);
> +}
> +
> +struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
> +					   struct mem_cgroup *memcg,
> +					   unsigned long *flags)
> +{
> +	return lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/true,
> +				      /*irq_flags=*/flags, /*skip_empty=*/false);
> +}
> +
> +void list_lru_unlock_irqrestore(struct list_lru_one *l, unsigned long *flags)
> +{
> +	unlock_list_lru(l, /*irq_off=*/true, /*irq_flags=*/flags);
> +}
>
> -	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
> +bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l,
> +		    struct list_head *item, int nid,
> +		    struct mem_cgroup *memcg)
> +{
>  	if (list_empty(item)) {
>  		list_add_tail(item, &l->list);
>  		/* Set shrinker bit if the first element was added */
>  		if (!l->nr_items++)
>  			set_shrinker_bit(memcg, nid, lru_shrinker_id(lru));
> -		unlock_list_lru(l, false);
> -		atomic_long_inc(&nlru->nr_items);
> +		atomic_long_inc(&lru->node[nid].nr_items);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +bool __list_lru_del(struct list_lru *lru, struct list_lru_one *l,
> +		    struct list_head *item, int nid)
> +{
> +	if (!list_empty(item)) {
> +		list_del_init(item);
> +		l->nr_items--;
> +		atomic_long_dec(&lru->node[nid].nr_items);
>  		return true;
>  	}
> -	unlock_list_lru(l, false);
>  	return false;
>  }
>
> +/* The caller must ensure the memcg lifetime. */
> +bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
> +		  struct mem_cgroup *memcg)
> +{
> +	struct list_lru_one *l;
> +	bool ret;
> +
> +	l = list_lru_lock(lru, nid, memcg);
> +	ret = __list_lru_add(lru, l, item, nid, memcg);
> +	list_lru_unlock(l);
> +	return ret;
> +}
> +
>  bool list_lru_add_obj(struct list_lru *lru, struct list_head *item)
>  {
>  	bool ret;
> @@ -184,19 +234,13 @@ EXPORT_SYMBOL_GPL(list_lru_add_obj);
>  bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid,
>  		  struct mem_cgroup *memcg)
>  {
> -	struct list_lru_node *nlru = &lru->node[nid];
>  	struct list_lru_one *l;
> +	bool ret;
>
> -	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
> -	if (!list_empty(item)) {
> -		list_del_init(item);
> -		l->nr_items--;
> -		unlock_list_lru(l, false);
> -		atomic_long_dec(&nlru->nr_items);
> -		return true;
> -	}
> -	unlock_list_lru(l, false);
> -	return false;
> +	l = list_lru_lock(lru, nid, memcg);
> +	ret = __list_lru_del(lru, l, item, nid);
> +	list_lru_unlock(l);
> +	return ret;
>  }
>
>  bool list_lru_del_obj(struct list_lru *lru, struct list_head *item)
> @@ -269,7 +313,8 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>  	unsigned long isolated = 0;
>
>  restart:
> -	l = lock_list_lru_of_memcg(lru, nid, memcg, irq_off, true);
> +	l = lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/irq_off,
> +				   /*irq_flags=*/NULL, /*skip_empty=*/true);
>  	if (!l)
>  		return isolated;
>  	list_for_each_safe(item, n, &l->list) {
> @@ -310,7 +355,7 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>  			BUG();
>  		}
>  	}
> -	unlock_list_lru(l, irq_off);
> +	unlock_list_lru(l, irq_off, NULL);
>  out:
>  	return isolated;
>  }
> --
> 2.53.0
>



* Re: [PATCH v3 6/7] mm: list_lru: introduce folio_memcg_list_lru_alloc()
  2026-03-18 19:53 ` [PATCH v3 6/7] mm: list_lru: introduce folio_memcg_list_lru_alloc() Johannes Weiner
  2026-03-18 20:52   ` Shakeel Butt
  2026-03-18 21:01   ` Shakeel Butt
@ 2026-03-24 12:01   ` Lorenzo Stoakes (Oracle)
  2 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-24 12:01 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Shakeel Butt, Yosry Ahmed,
	Zi Yan, Liam R. Howlett, Usama Arif, Kiryl Shutsemau,
	Dave Chinner, Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:24PM -0400, Johannes Weiner wrote:
> memcg_list_lru_alloc() is called every time an object that may end up
> on the list_lru is created. It needs to quickly check if the list_lru
> heads for the memcg already exist, and allocate them when they don't.
>
> Doing this with folio objects is tricky: folio_memcg() is not stable
> and requires either RCU protection or pinning the cgroup. But it's
> desirable to make the existence check lightweight under RCU, and only
> pin the memcg when we need to allocate list_lru heads and may block.
>
> In preparation for switching the THP shrinker to list_lru, add a
> helper function for allocating list_lru heads coming from a folio.
>
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Logic LGTM, but would be nice to have some kdoc. With that addressed, feel free
to add:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  include/linux/list_lru.h | 12 ++++++++++++
>  mm/list_lru.c            | 39 ++++++++++++++++++++++++++++++++++-----
>  2 files changed, 46 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
> index 4afc02deb44d..4bd29b61c59a 100644
> --- a/include/linux/list_lru.h
> +++ b/include/linux/list_lru.h
> @@ -81,6 +81,18 @@ static inline int list_lru_init_memcg_key(struct list_lru *lru, struct shrinker
>
>  int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
>  			 gfp_t gfp);
> +
> +#ifdef CONFIG_MEMCG
> +int folio_memcg_list_lru_alloc(struct folio *folio, struct list_lru *lru,
> +			       gfp_t gfp);

Could we have a kdoc comment for this? Thanks!

> +#else
> +static inline int folio_memcg_list_lru_alloc(struct folio *folio,
> +					     struct list_lru *lru, gfp_t gfp)
> +{
> +	return 0;
> +}
> +#endif
> +
>  void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
>
>  /**
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index b817c0f48f73..1ccdd45b1d14 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -537,17 +537,14 @@ static inline bool memcg_list_lru_allocated(struct mem_cgroup *memcg,
>  	return idx < 0 || xa_load(&lru->xa, idx);
>  }
>
> -int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
> -			 gfp_t gfp)
> +static int __memcg_list_lru_alloc(struct mem_cgroup *memcg,
> +				  struct list_lru *lru, gfp_t gfp)
>  {
>  	unsigned long flags;
>  	struct list_lru_memcg *mlru = NULL;
>  	struct mem_cgroup *pos, *parent;
>  	XA_STATE(xas, &lru->xa, 0);
>
> -	if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
> -		return 0;
> -
>  	gfp &= GFP_RECLAIM_MASK;
>  	/*
>  	 * Because the list_lru can be reparented to the parent cgroup's
> @@ -588,6 +585,38 @@ int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
>
>  	return xas_error(&xas);
>  }
> +
> +int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
> +			 gfp_t gfp)
> +{
> +	if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
> +		return 0;
> +	return __memcg_list_lru_alloc(memcg, lru, gfp);
> +}
> +
> +int folio_memcg_list_lru_alloc(struct folio *folio, struct list_lru *lru,
> +			       gfp_t gfp)
> +{
> +	struct mem_cgroup *memcg;
> +	int res;
> +
> +	if (!list_lru_memcg_aware(lru))
> +		return 0;
> +
> +	/* Fast path when list_lru heads already exist */
> +	rcu_read_lock();

OK, nice, I see folio_memcg() explicitly states that an RCU lock suffices...

> +	memcg = folio_memcg(folio);
> +	res = memcg_list_lru_allocated(memcg, lru);

...And an xa_load() should also be RCU safe :)

> +	rcu_read_unlock();
> +	if (likely(res))
> +		return 0;

So that's nice!

> +
> +	/* Allocation may block, pin the memcg */
> +	memcg = get_mem_cgroup_from_folio(folio);
> +	res = __memcg_list_lru_alloc(memcg, lru, gfp);
> +	mem_cgroup_put(memcg);
> +	return res;
> +}
>  #else
>  static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
>  {
> --
> 2.53.0
>

Cheers, Lorenzo



* Re: [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-18 19:53 ` [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru Johannes Weiner
  2026-03-18 20:26   ` David Hildenbrand (Arm)
  2026-03-18 23:18   ` Shakeel Butt
@ 2026-03-24 13:48   ` Lorenzo Stoakes (Oracle)
  2 siblings, 0 replies; 29+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-24 13:48 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Shakeel Butt, Yosry Ahmed,
	Zi Yan, Liam R. Howlett, Usama Arif, Kiryl Shutsemau,
	Dave Chinner, Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:53:25PM -0400, Johannes Weiner wrote:
> The deferred split queue handles cgroups in a suboptimal fashion. The
> queue is per-NUMA node or per-cgroup, not the intersection. That means
> on a cgrouped system, a node-restricted allocation entering reclaim
> can end up splitting large pages on other nodes:
>
> 	alloc/unmap
> 	  deferred_split_folio()
> 	    list_add_tail(memcg->split_queue)
> 	    set_shrinker_bit(memcg, node, deferred_shrinker_id)

So here it's:

__do_huge_pmd_anonymous_page() / do_huge_zero_wp_pmd()
-> map_anon_folio_pmd_pf()
-> map_anon_folio_pmd_nopf()
-> deferred_split_folio()
-> set_shrinker_bit()

Yeah it makes sense to make the first bit succinct anyway :)

>
> 	for_each_zone_zonelist_nodemask(restricted_nodes)
> 	  mem_cgroup_iter()
> 	    shrink_slab(node, memcg)
> 	      shrink_slab_memcg(node, memcg)
> 	        if test_shrinker_bit(memcg, node, deferred_shrinker_id)

Hmm, there's no such function - this is pseudocode, essentially?

Wouldn't it be clearer to reference:

shrink_slab_memcg() -> do_shrink_slab() -> shrinker->scan_objects ->
deferred_split_scan()?

Though I get that it adds verbosity :)

Sorry, not (just) being pedantic here, this is also so I can understand it myself :)

> 	          deferred_split_scan()
> 	            walks memcg->split_queue

Ok so overall there's a per-memcg memcg->split_queue, but we set a bit in
memcg->nodeinfo[nid]->shrinker_info->unit[blah]->map for it, and when we
enter shrink_slab_memcg(), we figure out the shrink_control from the
for_each_set_bit() across memcg->...->unit->map?

> The shrinker bit adds an imperfect guard rail. As soon as the cgroup
> has a single large page on the node of interest, all large pages owned
> by that memcg, including those on other nodes, will be split.

So at this point we are invoking deferred_split_scan(), and it's the same
memcg->split_queue we walk regardless of node?

Do let me know if my understanding is correct here!

Hmm that does sound sub-optimal.

>
> list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
> streamlines a lot of the list operations and reclaim walks. It's used
> widely by other major shrinkers already. Convert the deferred split
> queue as well.

It's odd that it wasn't used before?

>
> The list_lru per-memcg heads are instantiated on demand when the first
> object of interest is allocated for a cgroup, by calling
> folio_memcg_list_lru_alloc(). Add calls to where splittable pages are
> created: anon faults, swapin faults, khugepaged collapse.

OK that makes sense.

>
> These calls create all possible node heads for the cgroup at once, so
> the migration code (between nodes) doesn't need any special care.

Nice!

>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Hmm you saved the big one for last :)

> ---
>  include/linux/huge_mm.h    |   6 +-
>  include/linux/memcontrol.h |   4 -
>  include/linux/mmzone.h     |  12 --
>  mm/huge_memory.c           | 342 ++++++++++++-------------------------
>  mm/internal.h              |   2 +-
>  mm/khugepaged.c            |   7 +
>  mm/memcontrol.c            |  12 +-
>  mm/memory.c                |  52 +++---
>  mm/mm_init.c               |  15 --
>  9 files changed, 151 insertions(+), 301 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index bd7f0e1d8094..8d801ed378db 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -414,10 +414,9 @@ static inline int split_huge_page(struct page *page)
>  {
>  	return split_huge_page_to_list_to_order(page, NULL, 0);
>  }
> +
> +extern struct list_lru deferred_split_lru;

It might be nice for the sake of avoiding a global to instead expose this
as a getter?

Or actually better, since every caller outside of huge_memory.c that
references this uses folio_memcg_list_lru_alloc(), do something like:

int folio_memcg_alloc_deferred(struct folio *folio, gfp_t gfp);

in mm/huge_memory.c:

/**
 * blah blah blah put on error blah
 */
int folio_memcg_alloc_deferred(struct folio *folio, gfp_t gfp)
{
	int err;

	err = folio_memcg_list_lru_alloc(folio, &deferred_split_lru, gfp);
	if (err) {
		folio_put(folio);
		return err;
	}

	return 0;
}

And then the callers can just invoke this, and you can make
deferred_split_lru static in mm/huge_memory.c?


>  void deferred_split_folio(struct folio *folio, bool partially_mapped);
> -#ifdef CONFIG_MEMCG
> -void reparent_deferred_split_queue(struct mem_cgroup *memcg);
> -#endif
>
>  void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  		unsigned long address, bool freeze);
> @@ -650,7 +649,6 @@ static inline int try_folio_split_to_order(struct folio *folio,
>  }
>
>  static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
> -static inline void reparent_deferred_split_queue(struct mem_cgroup *memcg) {}
>  #define split_huge_pmd(__vma, __pmd, __address)	\
>  	do { } while (0)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 086158969529..0782c72a1997 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -277,10 +277,6 @@ struct mem_cgroup {
>  	struct memcg_cgwb_frn cgwb_frn[MEMCG_CGWB_FRN_CNT];
>  #endif
>
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	struct deferred_split deferred_split_queue;
> -#endif
> -
>  #ifdef CONFIG_LRU_GEN_WALKS_MMU
>  	/* per-memcg mm_struct list */
>  	struct lru_gen_mm_list mm_list;
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 7bd0134c241c..232b7a71fd69 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1429,14 +1429,6 @@ struct zonelist {
>   */
>  extern struct page *mem_map;
>
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -struct deferred_split {
> -	spinlock_t split_queue_lock;
> -	struct list_head split_queue;
> -	unsigned long split_queue_len;
> -};
> -#endif
> -
>  #ifdef CONFIG_MEMORY_FAILURE
>  /*
>   * Per NUMA node memory failure handling statistics.
> @@ -1562,10 +1554,6 @@ typedef struct pglist_data {
>  	unsigned long first_deferred_pfn;
>  #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
>
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	struct deferred_split deferred_split_queue;
> -#endif
> -
>  #ifdef CONFIG_NUMA_BALANCING
>  	/* start time in ms of current promote rate limit period */
>  	unsigned int nbp_rl_start;
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 3fc02913b63e..e90d08db219d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -14,6 +14,7 @@
>  #include <linux/mmu_notifier.h>
>  #include <linux/rmap.h>
>  #include <linux/swap.h>
> +#include <linux/list_lru.h>
>  #include <linux/shrinker.h>
>  #include <linux/mm_inline.h>
>  #include <linux/swapops.h>
> @@ -67,6 +68,8 @@ unsigned long transparent_hugepage_flags __read_mostly =
>  	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
>  	(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
>
> +static struct lock_class_key deferred_split_key;
> +struct list_lru deferred_split_lru;
>  static struct shrinker *deferred_split_shrinker;
>  static unsigned long deferred_split_count(struct shrinker *shrink,
>  					  struct shrink_control *sc);
> @@ -919,6 +922,13 @@ static int __init thp_shrinker_init(void)
>  	if (!deferred_split_shrinker)
>  		return -ENOMEM;
>
> +	if (list_lru_init_memcg_key(&deferred_split_lru,
> +				    deferred_split_shrinker,
> +				    &deferred_split_key)) {
> +		shrinker_free(deferred_split_shrinker);
> +		return -ENOMEM;
> +	}
> +

It's kind of out of scope for the series, but I hate that the huge zero
folio stuff has an early exit for the persistent case but otherwise falls
through to trying to set it up - it'd be nice to split the huge zero folio
stuff out into another function.

But probably one for a follow up!

>  	deferred_split_shrinker->count_objects = deferred_split_count;
>  	deferred_split_shrinker->scan_objects = deferred_split_scan;
>  	shrinker_register(deferred_split_shrinker);
> @@ -939,6 +949,7 @@ static int __init thp_shrinker_init(void)
>
>  	huge_zero_folio_shrinker = shrinker_alloc(0, "thp-zero");
>  	if (!huge_zero_folio_shrinker) {
> +		list_lru_destroy(&deferred_split_lru);
>  		shrinker_free(deferred_split_shrinker);

Presumably there's no (probably-impossible-in-reality) race where somebody
enters the shrinker and references the deferred_split_lru before the shrinker
is freed?

>  		return -ENOMEM;
>  	}
> @@ -953,6 +964,7 @@ static int __init thp_shrinker_init(void)
>  static void __init thp_shrinker_exit(void)
>  {
>  	shrinker_free(huge_zero_folio_shrinker);
> +	list_lru_destroy(&deferred_split_lru);

Same question above as to race/ordering.

>  	shrinker_free(deferred_split_shrinker);
>  }



>
> @@ -1133,119 +1145,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
>  	return pmd;
>  }
>
> -static struct deferred_split *split_queue_node(int nid)
> -{
> -	struct pglist_data *pgdata = NODE_DATA(nid);
> -
> -	return &pgdata->deferred_split_queue;
> -}
> -
> -#ifdef CONFIG_MEMCG
> -static inline
> -struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
> -					   struct deferred_split *queue)
> -{
> -	if (mem_cgroup_disabled())
> -		return NULL;
> -	if (split_queue_node(folio_nid(folio)) == queue)
> -		return NULL;
> -	return container_of(queue, struct mem_cgroup, deferred_split_queue);
> -}
> -
> -static struct deferred_split *memcg_split_queue(int nid, struct mem_cgroup *memcg)
> -{
> -	return memcg ? &memcg->deferred_split_queue : split_queue_node(nid);
> -}
> -#else
> -static inline
> -struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
> -					   struct deferred_split *queue)
> -{
> -	return NULL;
> -}
> -
> -static struct deferred_split *memcg_split_queue(int nid, struct mem_cgroup *memcg)
> -{
> -	return split_queue_node(nid);
> -}
> -#endif
> -
> -static struct deferred_split *split_queue_lock(int nid, struct mem_cgroup *memcg)
> -{
> -	struct deferred_split *queue;
> -
> -retry:
> -	queue = memcg_split_queue(nid, memcg);
> -	spin_lock(&queue->split_queue_lock);
> -	/*
> -	 * There is a period between setting memcg to dying and reparenting
> -	 * deferred split queue, and during this period the THPs in the deferred
> -	 * split queue will be hidden from the shrinker side.
> -	 */
> -	if (unlikely(memcg_is_dying(memcg))) {
> -		spin_unlock(&queue->split_queue_lock);
> -		memcg = parent_mem_cgroup(memcg);
> -		goto retry;
> -	}
> -
> -	return queue;
> -}
> -
> -static struct deferred_split *
> -split_queue_lock_irqsave(int nid, struct mem_cgroup *memcg, unsigned long *flags)
> -{
> -	struct deferred_split *queue;
> -
> -retry:
> -	queue = memcg_split_queue(nid, memcg);
> -	spin_lock_irqsave(&queue->split_queue_lock, *flags);
> -	if (unlikely(memcg_is_dying(memcg))) {
> -		spin_unlock_irqrestore(&queue->split_queue_lock, *flags);
> -		memcg = parent_mem_cgroup(memcg);
> -		goto retry;
> -	}
> -
> -	return queue;
> -}
> -
> -static struct deferred_split *folio_split_queue_lock(struct folio *folio)
> -{
> -	struct deferred_split *queue;
> -
> -	rcu_read_lock();
> -	queue = split_queue_lock(folio_nid(folio), folio_memcg(folio));
> -	/*
> -	 * The memcg destruction path is acquiring the split queue lock for
> -	 * reparenting. Once you have it locked, it's safe to drop the rcu lock.
> -	 */
> -	rcu_read_unlock();
> -
> -	return queue;
> -}
> -
> -static struct deferred_split *
> -folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
> -{
> -	struct deferred_split *queue;
> -
> -	rcu_read_lock();
> -	queue = split_queue_lock_irqsave(folio_nid(folio), folio_memcg(folio), flags);
> -	rcu_read_unlock();
> -
> -	return queue;
> -}
> -
> -static inline void split_queue_unlock(struct deferred_split *queue)
> -{
> -	spin_unlock(&queue->split_queue_lock);
> -}
> -
> -static inline void split_queue_unlock_irqrestore(struct deferred_split *queue,
> -						 unsigned long flags)
> -{
> -	spin_unlock_irqrestore(&queue->split_queue_lock, flags);
> -}
> -

Lots of red, love to see it :)

>  static inline bool is_transparent_hugepage(const struct folio *folio)
>  {
>  	if (!folio_test_large(folio))
> @@ -1346,6 +1245,14 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
>  		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>  		return NULL;
>  	}
> +
> +	if (folio_memcg_list_lru_alloc(folio, &deferred_split_lru, gfp)) {
> +		folio_put(folio);
> +		count_vm_event(THP_FAULT_FALLBACK);
> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
> +		return NULL;
> +	}
> +

As mentioned above, would be good to separate out into helper in mm/huge_memory.c.

>  	folio_throttle_swaprate(folio, gfp);
>
>         /*
> @@ -3854,34 +3761,34 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>  	struct folio *end_folio = folio_next(folio);
>  	struct folio *new_folio, *next;
>  	int old_order = folio_order(folio);
> +	struct list_lru_one *l;

Nit, and maybe this is a convention, but I hate single-letter variable names;
'lru' or something might be nicer?

> +	bool dequeue_deferred;
>  	int ret = 0;
> -	struct deferred_split *ds_queue;
>
>  	VM_WARN_ON_ONCE(!mapping && end);
>  	/* Prevent deferred_split_scan() touching ->_refcount */
> -	ds_queue = folio_split_queue_lock(folio);
> +	dequeue_deferred = folio_test_anon(folio) && old_order > 1;

Why anon? (This review is partly me learning about the shrinker, an area
I'm weak on :)

> +	if (dequeue_deferred) {
> +		rcu_read_lock();
> +		l = list_lru_lock(&deferred_split_lru,
> +				  folio_nid(folio), folio_memcg(folio));

Hmm, I don't adore this sort of almost-'hidden' RCU lock here, but this
function is pretty disgusting and needs serious work in general.

And any function that took the RCU lock and list_lru lock/did the unlock
equivalent would be equally horrible so yeah, I guess needs deferring to a
refactor.

OTOH, this could be a good excuse for us to pay down some technical debt
and split out for instance the folio_ref_freeze() bits?

Could we do something like:

	bool frozen;

	...

	dequeue_deferred = folio_test_anon(folio) && old_order > 1;
	frozen = folio_ref_freeze(folio, folio_cache_ref_count(folio) + 1);

	if (dequeue_deferred && frozen) {
		struct list_lru_one *lru;

		rcu_read_lock();
		lru = list_lru_lock(&deferred_split_lru,
				    folio_nid(folio), folio_memcg(folio));
		__list_lru_del(&deferred_split_lru, lru,
			       &folio->_deferred_list, folio_nid(folio));
		if (folio_test_partially_mapped(folio)) {
			folio_clear_partially_mapped(folio);
			mod_mthp_stat(old_order,
				MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
		}
		list_lru_unlock(lru);
		rcu_read_unlock();
	}

	if (!frozen)
		return -EAGAIN;

	... rest of logic now one indent level lower ...

Or maybe factor that out into a helper function or something?

static void execute_deferred_dequeue(...) { ... }

With this implemented either way you'd be able to get rid of the else block
too.

Obviously this is only valid if you are able to do the freezing earlier?


> +	}
>  	if (folio_ref_freeze(folio, folio_cache_ref_count(folio) + 1)) {
>  		struct swap_cluster_info *ci = NULL;
>  		struct lruvec *lruvec;
>
> -		if (old_order > 1) {

Was this also applicable to non-anon folios before?

> -			if (!list_empty(&folio->_deferred_list)) {
> -				ds_queue->split_queue_len--;
> -				/*
> -				 * Reinitialize page_deferred_list after removing the
> -				 * page from the split_queue, otherwise a subsequent
> -				 * split will see list corruption when checking the
> -				 * page_deferred_list.
> -				 */
> -				list_del_init(&folio->_deferred_list);
> -			}
> +		if (dequeue_deferred) {
> +			__list_lru_del(&deferred_split_lru, l,
> +				       &folio->_deferred_list, folio_nid(folio));
>  			if (folio_test_partially_mapped(folio)) {
>  				folio_clear_partially_mapped(folio);
>  				mod_mthp_stat(old_order,
>  					MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
>  			}
> +			list_lru_unlock(l);
> +			rcu_read_unlock();
>  		}
> -		split_queue_unlock(ds_queue);
> +
>  		if (mapping) {
>  			int nr = folio_nr_pages(folio);
>
> @@ -3982,7 +3889,10 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>  		if (ci)
>  			swap_cluster_unlock(ci);
>  	} else {
> -		split_queue_unlock(ds_queue);
> +		if (dequeue_deferred) {
> +			list_lru_unlock(l);
> +			rcu_read_unlock();
> +		}
>  		return -EAGAIN;
>  	}
>
> @@ -4349,33 +4259,35 @@ int split_folio_to_list(struct folio *folio, struct list_head *list)
>   * queueing THP splits, and that list is (racily observed to be) non-empty.
>   *
>   * It is unsafe to call folio_unqueue_deferred_split() until folio refcount is
> - * zero: because even when split_queue_lock is held, a non-empty _deferred_list
> - * might be in use on deferred_split_scan()'s unlocked on-stack list.
> + * zero: because even when the list_lru lock is held, a non-empty
> + * _deferred_list might be in use on deferred_split_scan()'s unlocked
> + * on-stack list.
>   *
> - * If memory cgroups are enabled, split_queue_lock is in the mem_cgroup: it is
> - * therefore important to unqueue deferred split before changing folio memcg.
> + * The list_lru sublist is determined by folio's memcg: it is therefore
> + * important to unqueue deferred split before changing folio memcg.
>   */
>  bool __folio_unqueue_deferred_split(struct folio *folio)
>  {
> -	struct deferred_split *ds_queue;
> +	struct list_lru_one *l;

Again a nitty thing about the single-letter var name; OTOH if that's somehow
the convention, maybe fine. 'l' is especially problematic as it looks like a
'1' in many fonts :)

> +	int nid = folio_nid(folio);
>  	unsigned long flags;
>  	bool unqueued = false;
>
>  	WARN_ON_ONCE(folio_ref_count(folio));
>  	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));
>
> -	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
> -	if (!list_empty(&folio->_deferred_list)) {
> -		ds_queue->split_queue_len--;
> +	rcu_read_lock();
> +	l = list_lru_lock_irqsave(&deferred_split_lru, nid, folio_memcg(folio), &flags);
> +	if (__list_lru_del(&deferred_split_lru, l, &folio->_deferred_list, nid)) {

Maybe worth factoring __list_lru_del() into something that explicitly
references &folio->_deferred_list rather than open-coding it in both places?

>  		if (folio_test_partially_mapped(folio)) {
>  			folio_clear_partially_mapped(folio);
>  			mod_mthp_stat(folio_order(folio),
>  				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
>  		}
> -		list_del_init(&folio->_deferred_list);
>  		unqueued = true;
>  	}
> -	split_queue_unlock_irqrestore(ds_queue, flags);
> +	list_lru_unlock_irqrestore(l, &flags);
> +	rcu_read_unlock();
>
>  	return unqueued;	/* useful for debug warnings */
>  }
> @@ -4383,7 +4295,9 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
>  /* partially_mapped=false won't clear PG_partially_mapped folio flag */
>  void deferred_split_folio(struct folio *folio, bool partially_mapped)
>  {
> -	struct deferred_split *ds_queue;
> +	struct list_lru_one *l;
> +	int nid;
> +	struct mem_cgroup *memcg;
>  	unsigned long flags;
>
>  	/*
> @@ -4406,7 +4320,11 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
>  	if (folio_test_swapcache(folio))
>  		return;
>
> -	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
> +	nid = folio_nid(folio);
> +
> +	rcu_read_lock();
> +	memcg = folio_memcg(folio);
> +	l = list_lru_lock_irqsave(&deferred_split_lru, nid, memcg, &flags);

Do we really need to hold the lock over all of the below? Hmm, but we can't
do an irqsave/restore list_lru_add(), and maybe it's not worth adding one
either, OK.

It just seems odd to have <lock> <__unlocked_add_variant> <unlock>; it
instinctively feels like it should be just <locked_add_variant>.

>  	if (partially_mapped) {
>  		if (!folio_test_partially_mapped(folio)) {
>  			folio_set_partially_mapped(folio);
> @@ -4414,36 +4332,20 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
>  				count_vm_event(THP_DEFERRED_SPLIT_PAGE);
>  			count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
>  			mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, 1);
> -

Always nice to remove these annoying random newlines :)

>  		}
>  	} else {
>  		/* partially mapped folios cannot become non-partially mapped */
>  		VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
>  	}
> -	if (list_empty(&folio->_deferred_list)) {
> -		struct mem_cgroup *memcg;
> -
> -		memcg = folio_split_queue_memcg(folio, ds_queue);
> -		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
> -		ds_queue->split_queue_len++;
> -		if (memcg)
> -			set_shrinker_bit(memcg, folio_nid(folio),
> -					 shrinker_id(deferred_split_shrinker));
> -	}
> -	split_queue_unlock_irqrestore(ds_queue, flags);
> +	__list_lru_add(&deferred_split_lru, l, &folio->_deferred_list, nid, memcg);
> +	list_lru_unlock_irqrestore(l, &flags);
> +	rcu_read_unlock();
>  }
>
>  static unsigned long deferred_split_count(struct shrinker *shrink,
>  		struct shrink_control *sc)
>  {
> -	struct pglist_data *pgdata = NODE_DATA(sc->nid);
> -	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
> -
> -#ifdef CONFIG_MEMCG
> -	if (sc->memcg)
> -		ds_queue = &sc->memcg->deferred_split_queue;
> -#endif
> -	return READ_ONCE(ds_queue->split_queue_len);
> +	return list_lru_shrink_count(&deferred_split_lru, sc);

That's nice :)

>  }
>
>  static bool thp_underused(struct folio *folio)
> @@ -4473,45 +4375,47 @@ static bool thp_underused(struct folio *folio)
>  	return false;
>  }
>
> +static enum lru_status deferred_split_isolate(struct list_head *item,
> +					      struct list_lru_one *lru,
> +					      void *cb_arg)
> +{
> +	struct folio *folio = container_of(item, struct folio, _deferred_list);
> +	struct list_head *freeable = cb_arg;
> +
> +	if (folio_try_get(folio)) {
> +		list_lru_isolate_move(lru, item, freeable);
> +		return LRU_REMOVED;
> +	}
> +
> +	/* We lost race with folio_put() */

Hmm, in the original code this comment is associated with the
partially-mapped logic, BUT this actually seems correct: folio_try_get(),
since it does folio_ref_add_unless_zero(), only fails if the folio lost
the race.

So I think yours is more correct, right?

> +	list_lru_isolate(lru, item);
> +	if (folio_test_partially_mapped(folio)) {
> +		folio_clear_partially_mapped(folio);
> +		mod_mthp_stat(folio_order(folio),
> +			      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
> +	}
> +	return LRU_REMOVED;
> +}
> +
>  static unsigned long deferred_split_scan(struct shrinker *shrink,
>  		struct shrink_control *sc)
>  {
> -	struct deferred_split *ds_queue;
> -	unsigned long flags;
> +	LIST_HEAD(dispose);
>  	struct folio *folio, *next;
> -	int split = 0, i;
> -	struct folio_batch fbatch;
> +	int split = 0;
> +	unsigned long isolated;
>
> -	folio_batch_init(&fbatch);
> +	isolated = list_lru_shrink_walk_irq(&deferred_split_lru, sc,
> +					    deferred_split_isolate, &dispose);
>
> -retry:
> -	ds_queue = split_queue_lock_irqsave(sc->nid, sc->memcg, &flags);
> -	/* Take pin on all head pages to avoid freeing them under us */
> -	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
> -							_deferred_list) {
> -		if (folio_try_get(folio)) {
> -			folio_batch_add(&fbatch, folio);
> -		} else if (folio_test_partially_mapped(folio)) {
> -			/* We lost race with folio_put() */
> -			folio_clear_partially_mapped(folio);
> -			mod_mthp_stat(folio_order(folio),
> -				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
> -		}
> -		list_del_init(&folio->_deferred_list);
> -		ds_queue->split_queue_len--;
> -		if (!--sc->nr_to_scan)
> -			break;
> -		if (!folio_batch_space(&fbatch))
> -			break;
> -	}
> -	split_queue_unlock_irqrestore(ds_queue, flags);
> -
> -	for (i = 0; i < folio_batch_count(&fbatch); i++) {
> +	list_for_each_entry_safe(folio, next, &dispose, _deferred_list) {
>  		bool did_split = false;
>  		bool underused = false;
> -		struct deferred_split *fqueue;
> +		struct list_lru_one *l;
> +		unsigned long flags;
> +
> +		list_del_init(&folio->_deferred_list);
>
> -		folio = fbatch.folios[i];
>  		if (!folio_test_partially_mapped(folio)) {
>  			/*
>  			 * See try_to_map_unused_to_zeropage(): we cannot
> @@ -4534,64 +4438,32 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>  		}
>  		folio_unlock(folio);
>  next:
> -		if (did_split || !folio_test_partially_mapped(folio))
> -			continue;
>  		/*
>  		 * Only add back to the queue if folio is partially mapped.
>  		 * If thp_underused returns false, or if split_folio fails
>  		 * in the case it was underused, then consider it used and
>  		 * don't add it back to split_queue.
>  		 */
> -		fqueue = folio_split_queue_lock_irqsave(folio, &flags);
> -		if (list_empty(&folio->_deferred_list)) {
> -			list_add_tail(&folio->_deferred_list, &fqueue->split_queue);
> -			fqueue->split_queue_len++;
> +		if (!did_split && folio_test_partially_mapped(folio)) {
> +			rcu_read_lock();
> +			l = list_lru_lock_irqsave(&deferred_split_lru,
> +						  folio_nid(folio),
> +						  folio_memcg(folio),
> +						  &flags);
> +			__list_lru_add(&deferred_split_lru, l,
> +				       &folio->_deferred_list,
> +				       folio_nid(folio), folio_memcg(folio));
> +			list_lru_unlock_irqrestore(l, &flags);

Hmm, this does make me think it'd be nice to have a list_lru_add()
variant with irqsave/restore, since this is a repeating pattern!
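
Totally untested sketch of what I mean -- the __list_lru_add()
signature here is just guessed from the usage above:

	static inline void list_lru_add_irqsave(struct list_lru *lru,
						struct list_head *item, int nid,
						struct mem_cgroup *memcg)
	{
		struct list_lru_one *l;
		unsigned long flags;

		/* Resolve the per-node/per-memcg list and take its lock */
		l = list_lru_lock_irqsave(lru, nid, memcg, &flags);
		__list_lru_add(lru, l, item, nid, memcg);
		list_lru_unlock_irqrestore(l, &flags);
	}

That would collapse the hunk above to a single call (with the
rcu_read_lock/unlock pair staying in the caller, or arguably moving
inside as well).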

> +			rcu_read_unlock();
>  		}
> -		split_queue_unlock_irqrestore(fqueue, flags);
> -	}
> -	folios_put(&fbatch);
> -
> -	if (sc->nr_to_scan && !list_empty(&ds_queue->split_queue)) {
> -		cond_resched();
> -		goto retry;
> +		folio_put(folio);
>  	}
>
> -	/*
> -	 * Stop shrinker if we didn't split any page, but the queue is empty.
> -	 * This can happen if pages were freed under us.
> -	 */
> -	if (!split && list_empty(&ds_queue->split_queue))
> +	if (!split && !isolated)
>  		return SHRINK_STOP;
>  	return split;
>  }
>
> -#ifdef CONFIG_MEMCG
> -void reparent_deferred_split_queue(struct mem_cgroup *memcg)
> -{
> -	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
> -	struct deferred_split *ds_queue = &memcg->deferred_split_queue;
> -	struct deferred_split *parent_ds_queue = &parent->deferred_split_queue;
> -	int nid;
> -
> -	spin_lock_irq(&ds_queue->split_queue_lock);
> -	spin_lock_nested(&parent_ds_queue->split_queue_lock, SINGLE_DEPTH_NESTING);
> -
> -	if (!ds_queue->split_queue_len)
> -		goto unlock;
> -
> -	list_splice_tail_init(&ds_queue->split_queue, &parent_ds_queue->split_queue);
> -	parent_ds_queue->split_queue_len += ds_queue->split_queue_len;
> -	ds_queue->split_queue_len = 0;
> -
> -	for_each_node(nid)
> -		set_shrinker_bit(parent, nid, shrinker_id(deferred_split_shrinker));
> -
> -unlock:
> -	spin_unlock(&parent_ds_queue->split_queue_lock);
> -	spin_unlock_irq(&ds_queue->split_queue_lock);
> -}
> -#endif
> -
>  #ifdef CONFIG_DEBUG_FS
>  static void split_huge_pages_all(void)
>  {
> diff --git a/mm/internal.h b/mm/internal.h
> index f98f4746ac41..d8c737338df5 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -863,7 +863,7 @@ static inline bool folio_unqueue_deferred_split(struct folio *folio)
>  	/*
>  	 * At this point, there is no one trying to add the folio to
>  	 * deferred_list. If folio is not in deferred_list, it's safe
> -	 * to check without acquiring the split_queue_lock.
> +	 * to check without acquiring the list_lru lock.
>  	 */
>  	if (data_race(list_empty(&folio->_deferred_list)))
>  		return false;
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 4b0e59c7c0e6..b2ac28ddd480 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1081,6 +1081,7 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
>  	}
>
>  	count_vm_event(THP_COLLAPSE_ALLOC);
> +
>  	if (unlikely(mem_cgroup_charge(folio, mm, gfp))) {
>  		folio_put(folio);
>  		*foliop = NULL;
> @@ -1089,6 +1090,12 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
>
>  	count_memcg_folio_events(folio, THP_COLLAPSE_ALLOC, 1);

Do we want to move this stat counter below the new check so it's not
incremented on failure?
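
I.e. something like this (untested, just moving the line):

	if (folio_memcg_list_lru_alloc(folio, &deferred_split_lru, gfp)) {
		folio_put(folio);
		*foliop = NULL;
		return SCAN_CGROUP_CHARGE_FAIL;
	}

	/* Only counted once the folio is fully set up */
	count_memcg_folio_events(folio, THP_COLLAPSE_ALLOC, 1);

	*foliop = folio;
	return SCAN_SUCCEED;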

>
> +	if (folio_memcg_list_lru_alloc(folio, &deferred_split_lru, gfp)) {
> +		folio_put(folio);
> +		*foliop = NULL;
> +		return SCAN_CGROUP_CHARGE_FAIL;

Do we not need to uncharge here?
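
If so, I'd have naively expected something like (untested):

	if (folio_memcg_list_lru_alloc(folio, &deferred_split_lru, gfp)) {
		/* undo the mem_cgroup_charge() above before dropping the ref */
		mem_cgroup_uncharge(folio);
		folio_put(folio);
		*foliop = NULL;
		return SCAN_CGROUP_CHARGE_FAIL;
	}

Unless the final folio_put() dropping the last reference already
uncharges via the freeing path, in which case the existing code is
fine and the same answer applies to the similar call sites below.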

> +	}
> +
>  	*foliop = folio;
>  	return SCAN_SUCCEED;
>  }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a47fb68dd65f..f381cb6bdff1 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4015,11 +4015,6 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
>  	for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++)
>  		memcg->cgwb_frn[i].done =
>  			__WB_COMPLETION_INIT(&memcg_cgwb_frn_waitq);
> -#endif
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	spin_lock_init(&memcg->deferred_split_queue.split_queue_lock);
> -	INIT_LIST_HEAD(&memcg->deferred_split_queue.split_queue);
> -	memcg->deferred_split_queue.split_queue_len = 0;
>  #endif
>  	lru_gen_init_memcg(memcg);
>  	return memcg;
> @@ -4167,11 +4162,10 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>  	zswap_memcg_offline_cleanup(memcg);
>
>  	memcg_offline_kmem(memcg);
> -	reparent_deferred_split_queue(memcg);
>  	/*
> -	 * The reparenting of objcg must be after the reparenting of the
> -	 * list_lru and deferred_split_queue above, which ensures that they will
> -	 * not mistakenly get the parent list_lru and deferred_split_queue.
> +	 * The reparenting of objcg must be after the reparenting of
> +	 * the list_lru in memcg_offline_kmem(), which ensures that
> +	 * they will not mistakenly get the parent list_lru.
>  	 */
>  	memcg_reparent_objcgs(memcg);
>  	reparent_shrinker_deferred(memcg);
> diff --git a/mm/memory.c b/mm/memory.c
> index 219b9bf6cae0..e68ceb4aa624 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4651,13 +4651,19 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>  	while (orders) {
>  		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
>  		folio = vma_alloc_folio(gfp, order, vma, addr);
> -		if (folio) {
> -			if (!mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
> -							    gfp, entry))
> -				return folio;
> +		if (!folio)
> +			goto next;
> +		if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm, gfp, entry)) {
>  			count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK_CHARGE);
>  			folio_put(folio);
> +			goto next;
>  		}
> +		if (folio_memcg_list_lru_alloc(folio, &deferred_split_lru, gfp)) {

Do we need to uncharge here?

> +			folio_put(folio);
> +			goto fallback;
> +		}
> +		return folio;
> +next:
>  		count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
>  		order = next_order(&orders, order);
>  	}
> @@ -5169,24 +5175,28 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>  	while (orders) {
>  		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
>  		folio = vma_alloc_folio(gfp, order, vma, addr);
> -		if (folio) {
> -			if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
> -				count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> -				folio_put(folio);
> -				goto next;
> -			}
> -			folio_throttle_swaprate(folio, gfp);
> -			/*
> -			 * When a folio is not zeroed during allocation
> -			 * (__GFP_ZERO not used) or user folios require special
> -			 * handling, folio_zero_user() is used to make sure
> -			 * that the page corresponding to the faulting address
> -			 * will be hot in the cache after zeroing.
> -			 */
> -			if (user_alloc_needs_zeroing())
> -				folio_zero_user(folio, vmf->address);
> -			return folio;
> +		if (!folio)
> +			goto next;

This applies to the equivalent refactorings above as well, but maybe
worth separating out the

if (folio) { ... big branch ... } ->
if (!folio) goto next; ... what was big branch ...

restructuring into a separate patch? That makes it easier to see the
pertinent logic changes, and helps review/bisectability/fixes/etc.
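
I.e. a preliminary no-functional-change patch that just does the guard
clause conversion, roughly:

	while (orders) {
		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
		folio = vma_alloc_folio(gfp, order, vma, addr);
		if (!folio)
			goto next;
		/* ... what was the big branch, unindented one level ... */
		return folio;
	next:
		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
		order = next_order(&orders, order);
	}

and then the new folio_memcg_list_lru_alloc() hook lands as a small,
obvious hunk on top.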


> +		if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
> +			count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> +			folio_put(folio);
> +			goto next;
>  		}
> +		if (folio_memcg_list_lru_alloc(folio, &deferred_split_lru, gfp)) {

Again, do we need to uncharge here?

> +			folio_put(folio);
> +			goto fallback;
> +		}
> +		folio_throttle_swaprate(folio, gfp);
> +		/*
> +		 * When a folio is not zeroed during allocation
> +		 * (__GFP_ZERO not used) or user folios require special
> +		 * handling, folio_zero_user() is used to make sure
> +		 * that the page corresponding to the faulting address
> +		 * will be hot in the cache after zeroing.
> +		 */
> +		if (user_alloc_needs_zeroing())
> +			folio_zero_user(folio, vmf->address);
> +		return folio;
>  next:
>  		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
>  		order = next_order(&orders, order);
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index cec7bb758bdd..f293a62e652a 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1388,19 +1388,6 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
>  	pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
>  }
>
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -static void pgdat_init_split_queue(struct pglist_data *pgdat)
> -{
> -	struct deferred_split *ds_queue = &pgdat->deferred_split_queue;
> -
> -	spin_lock_init(&ds_queue->split_queue_lock);
> -	INIT_LIST_HEAD(&ds_queue->split_queue);
> -	ds_queue->split_queue_len = 0;
> -}
> -#else
> -static void pgdat_init_split_queue(struct pglist_data *pgdat) {}
> -#endif
> -
>  #ifdef CONFIG_COMPACTION
>  static void pgdat_init_kcompactd(struct pglist_data *pgdat)
>  {
> @@ -1416,8 +1403,6 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
>
>  	pgdat_resize_init(pgdat);
>  	pgdat_kswapd_lock_init(pgdat);
> -
> -	pgdat_init_split_queue(pgdat);
>  	pgdat_init_kcompactd(pgdat);
>
>  	init_waitqueue_head(&pgdat->kswapd_wait);
> --
> 2.53.0
>

Overall a lovely amount of code deletion here :)

Thanks for doing this!

Cheers,
Lorenzo


end of thread, other threads:[~2026-03-24 13:48 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-18 19:53 [PATCH v3 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
2026-03-18 19:53 ` [PATCH v3 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
2026-03-18 20:12   ` Shakeel Butt
2026-03-24 11:30   ` Lorenzo Stoakes (Oracle)
2026-03-18 19:53 ` [PATCH v3 2/7] mm: list_lru: deduplicate unlock_list_lru() Johannes Weiner
2026-03-24 11:32   ` Lorenzo Stoakes (Oracle)
2026-03-18 19:53 ` [PATCH v3 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg() Johannes Weiner
2026-03-18 20:20   ` Shakeel Butt
2026-03-24 11:34   ` Lorenzo Stoakes (Oracle)
2026-03-18 19:53 ` [PATCH v3 4/7] mm: list_lru: deduplicate lock_list_lru() Johannes Weiner
2026-03-18 20:22   ` Shakeel Butt
2026-03-24 11:36   ` Lorenzo Stoakes (Oracle)
2026-03-18 19:53 ` [PATCH v3 5/7] mm: list_lru: introduce caller locking for additions and deletions Johannes Weiner
2026-03-18 20:51   ` Shakeel Butt
2026-03-20 16:18     ` Johannes Weiner
2026-03-24 11:55   ` Lorenzo Stoakes (Oracle)
2026-03-18 19:53 ` [PATCH v3 6/7] mm: list_lru: introduce folio_memcg_list_lru_alloc() Johannes Weiner
2026-03-18 20:52   ` Shakeel Butt
2026-03-18 21:01   ` Shakeel Butt
2026-03-24 12:01   ` Lorenzo Stoakes (Oracle)
2026-03-18 19:53 ` [PATCH v3 7/7] mm: switch deferred split shrinker to list_lru Johannes Weiner
2026-03-18 20:26   ` David Hildenbrand (Arm)
2026-03-18 23:18   ` Shakeel Butt
2026-03-24 13:48   ` Lorenzo Stoakes (Oracle)
2026-03-18 21:00 ` [PATCH v3 0/7] mm: switch THP " Lorenzo Stoakes (Oracle)
2026-03-18 22:31   ` Johannes Weiner
2026-03-19  8:47     ` Lorenzo Stoakes (Oracle)
2026-03-19  8:52       ` David Hildenbrand (Arm)
2026-03-19 11:45         ` Lorenzo Stoakes (Oracle)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox