public inbox for linux-mm@kvack.org
* [PATCH v2 0/7] mm: switch THP shrinker to list_lru
@ 2026-03-12 20:51 Johannes Weiner
  2026-03-12 20:51 ` [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
                   ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Johannes Weiner @ 2026-03-12 20:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

This is version 2 of switching the THP shrinker to list_lru. I fixed
the lockdep splat Usama had called out, with an explicit
rcu_read_lock() annotation based on list feedback (thanks!). I also
split out the list_lru prep bits, per Dave's request, to make review a
bit easier.

I was initially looking at mm/shrinker.c and the scalability issues
resulting from shrinker_alloc -> for_each_nid -> for_each_memcg. In
the process of working on that, though, the open-coded THP shrinker
queue started getting in the way. Switching it to list_lru seemed like
a good bugfix and cleanup (9 files changed, 140 insertions(+), 300
deletions(-)) in its own right, so here goes.

Patches 1-4 are cleanups and small refactors in list_lru code. They're
basically independent, but make the THP shrinker conversion easier.

Patch 5 extends the list_lru API to allow the caller to control the
locking scope. The THP shrinker has private state it needs to keep
synchronized with the LRU state.

Patch 6 extends the list_lru API with a convenience helper to do
list_lru head allocation (memcg_list_lru_alloc) when coming from a
folio. Anon THPs are instantiated in several places, and with the
folio reparenting patches pending, folio_memcg() access is now a more
delicate dance. This avoids having to replicate that dance everywhere.

Patch 7 finally switches the deferred_split_queue to list_lru.

Based on mm-unstable.

 include/linux/huge_mm.h    |   6 +-
 include/linux/list_lru.h   |  46 ++++++
 include/linux/memcontrol.h |   4 -
 include/linux/mmzone.h     |  12 --
 mm/huge_memory.c           | 330 +++++++++++++------------------------------
 mm/internal.h              |   2 +-
 mm/khugepaged.c            |   7 +
 mm/list_lru.c              | 193 ++++++++++++++++---------
 mm/memcontrol.c            |  12 +-
 mm/memory.c                |  52 ++++---
 mm/mm_init.c               |  15 --
 11 files changed, 309 insertions(+), 370 deletions(-)



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty
  2026-03-12 20:51 [PATCH v2 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
@ 2026-03-12 20:51 ` Johannes Weiner
  2026-03-17  9:43   ` David Hildenbrand (Arm)
  2026-03-18 17:56   ` Shakeel Butt
  2026-03-12 20:51 ` [PATCH v2 2/7] mm: list_lru: deduplicate unlock_list_lru() Johannes Weiner
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 30+ messages in thread
From: Johannes Weiner @ 2026-03-12 20:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

skip_empty is only for the shrinker to abort and skip a list that's
empty or whose cgroup is being deleted.

For list additions and deletions, the cgroup hierarchy is walked
upwards until a valid list_lru head is found, falling back to the
node list otherwise. Acquiring the lock therefore cannot fail. Remove
the NULL checks in those callers.
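
The guarantee can be sketched in userspace (not kernel code; all
names and types here are illustrative stand-ins for the memcg and
list_lru structures): the lookup walks the hierarchy upwards on a
miss, and the per-node fallback list always exists, so the result is
never NULL.

```c
#include <stddef.h>

/* Illustrative stand-ins for struct mem_cgroup / struct list_lru_one */
struct lru_head {
	int nr_items;
};

struct cgroup {
	struct cgroup *parent;
	struct lru_head *lru;	/* may be NULL if not yet allocated */
};

/* Per-node fallback list: always valid, like the root node list */
static struct lru_head node_list = { 0 };

static struct lru_head *lookup_lru(struct cgroup *cg)
{
	while (cg) {
		if (cg->lru)
			return cg->lru;
		cg = cg->parent;	/* walk upwards on a miss */
	}
	return &node_list;		/* final fallback: never NULL */
}

int demo(void)
{
	struct cgroup root = { .parent = NULL, .lru = NULL };
	struct cgroup child = { .parent = &root, .lru = NULL };

	/* No level has a private list: fall back to the node list */
	return lookup_lru(&child) == &node_list;
}
```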

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/list_lru.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 26463ae29c64..d96fd50fc9af 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -165,8 +165,6 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
 	struct list_lru_one *l;
 
 	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
-	if (!l)
-		return false;
 	if (list_empty(item)) {
 		list_add_tail(item, &l->list);
 		/* Set shrinker bit if the first element was added */
@@ -203,9 +201,8 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid,
 {
 	struct list_lru_node *nlru = &lru->node[nid];
 	struct list_lru_one *l;
+
 	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
-	if (!l)
-		return false;
 	if (!list_empty(item)) {
 		list_del_init(item);
 		l->nr_items--;
-- 
2.53.0




* [PATCH v2 2/7] mm: list_lru: deduplicate unlock_list_lru()
  2026-03-12 20:51 [PATCH v2 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
  2026-03-12 20:51 ` [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
@ 2026-03-12 20:51 ` Johannes Weiner
  2026-03-17  9:44   ` David Hildenbrand (Arm)
  2026-03-18 17:57   ` Shakeel Butt
  2026-03-12 20:51 ` [PATCH v2 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg() Johannes Weiner
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 30+ messages in thread
From: Johannes Weiner @ 2026-03-12 20:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

The MEMCG and !MEMCG variants of unlock_list_lru() are identical, and
lock_list_lru() open-codes the same unlock pattern when bailing.
Consolidate them into a common implementation.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/list_lru.c | 29 +++++++++--------------------
 1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index d96fd50fc9af..e873bc26a7ef 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -15,6 +15,14 @@
 #include "slab.h"
 #include "internal.h"
 
+static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
+{
+	if (irq_off)
+		spin_unlock_irq(&l->lock);
+	else
+		spin_unlock(&l->lock);
+}
+
 #ifdef CONFIG_MEMCG
 static LIST_HEAD(memcg_list_lrus);
 static DEFINE_MUTEX(list_lrus_mutex);
@@ -67,10 +75,7 @@ static inline bool lock_list_lru(struct list_lru_one *l, bool irq)
 	else
 		spin_lock(&l->lock);
 	if (unlikely(READ_ONCE(l->nr_items) == LONG_MIN)) {
-		if (irq)
-			spin_unlock_irq(&l->lock);
-		else
-			spin_unlock(&l->lock);
+		unlock_list_lru(l, irq);
 		return false;
 	}
 	return true;
@@ -101,14 +106,6 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 	memcg = parent_mem_cgroup(memcg);
 	goto again;
 }
-
-static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
-{
-	if (irq_off)
-		spin_unlock_irq(&l->lock);
-	else
-		spin_unlock(&l->lock);
-}
 #else
 static void list_lru_register(struct list_lru *lru)
 {
@@ -147,14 +144,6 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 
 	return l;
 }
-
-static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
-{
-	if (irq_off)
-		spin_unlock_irq(&l->lock);
-	else
-		spin_unlock(&l->lock);
-}
 #endif /* CONFIG_MEMCG */
 
 /* The caller must ensure the memcg lifetime. */
-- 
2.53.0




* [PATCH v2 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg()
  2026-03-12 20:51 [PATCH v2 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
  2026-03-12 20:51 ` [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
  2026-03-12 20:51 ` [PATCH v2 2/7] mm: list_lru: deduplicate unlock_list_lru() Johannes Weiner
@ 2026-03-12 20:51 ` Johannes Weiner
  2026-03-17  9:47   ` David Hildenbrand (Arm)
  2026-03-12 20:51 ` [PATCH v2 4/7] mm: list_lru: deduplicate lock_list_lru() Johannes Weiner
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Johannes Weiner @ 2026-03-12 20:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

Only the MEMCG variant of lock_list_lru() needs to check if there is a
race with cgroup deletion and list reparenting. Move the check to the
caller, so that the next patch can unify the lock_list_lru() variants.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/list_lru.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index e873bc26a7ef..1a39ff490643 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -68,17 +68,12 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 	return &lru->node[nid].lru;
 }
 
-static inline bool lock_list_lru(struct list_lru_one *l, bool irq)
+static inline void lock_list_lru(struct list_lru_one *l, bool irq)
 {
 	if (irq)
 		spin_lock_irq(&l->lock);
 	else
 		spin_lock(&l->lock);
-	if (unlikely(READ_ONCE(l->nr_items) == LONG_MIN)) {
-		unlock_list_lru(l, irq);
-		return false;
-	}
-	return true;
 }
 
 static inline struct list_lru_one *
@@ -90,9 +85,13 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 	rcu_read_lock();
 again:
 	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
-	if (likely(l) && lock_list_lru(l, irq)) {
-		rcu_read_unlock();
-		return l;
+	if (likely(l)) {
+		lock_list_lru(l, irq);
+		if (likely(READ_ONCE(l->nr_items) != LONG_MIN)) {
+			rcu_read_unlock();
+			return l;
+		}
+		unlock_list_lru(l, irq);
 	}
 	/*
 	 * Caller may simply bail out if raced with reparenting or
-- 
2.53.0




* [PATCH v2 4/7] mm: list_lru: deduplicate lock_list_lru()
  2026-03-12 20:51 [PATCH v2 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
                   ` (2 preceding siblings ...)
  2026-03-12 20:51 ` [PATCH v2 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg() Johannes Weiner
@ 2026-03-12 20:51 ` Johannes Weiner
  2026-03-17  9:51   ` David Hildenbrand (Arm)
  2026-03-12 20:51 ` [PATCH v2 5/7] mm: list_lru: introduce caller locking for additions and deletions Johannes Weiner
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Johannes Weiner @ 2026-03-12 20:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

The MEMCG and !MEMCG paths have the same pattern. Share the code.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/list_lru.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 1a39ff490643..4d74c2e9c2a5 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -15,6 +15,14 @@
 #include "slab.h"
 #include "internal.h"
 
+static inline void lock_list_lru(struct list_lru_one *l, bool irq)
+{
+	if (irq)
+		spin_lock_irq(&l->lock);
+	else
+		spin_lock(&l->lock);
+}
+
 static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
 {
 	if (irq_off)
@@ -68,14 +76,6 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 	return &lru->node[nid].lru;
 }
 
-static inline void lock_list_lru(struct list_lru_one *l, bool irq)
-{
-	if (irq)
-		spin_lock_irq(&l->lock);
-	else
-		spin_lock(&l->lock);
-}
-
 static inline struct list_lru_one *
 lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 		       bool irq, bool skip_empty)
@@ -136,10 +136,7 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 {
 	struct list_lru_one *l = &lru->node[nid].lru;
 
-	if (irq)
-		spin_lock_irq(&l->lock);
-	else
-		spin_lock(&l->lock);
+	lock_list_lru(l, irq);
 
 	return l;
 }
-- 
2.53.0




* [PATCH v2 5/7] mm: list_lru: introduce caller locking for additions and deletions
  2026-03-12 20:51 [PATCH v2 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
                   ` (3 preceding siblings ...)
  2026-03-12 20:51 ` [PATCH v2 4/7] mm: list_lru: deduplicate lock_list_lru() Johannes Weiner
@ 2026-03-12 20:51 ` Johannes Weiner
  2026-03-17 10:00   ` David Hildenbrand (Arm)
  2026-03-12 20:51 ` [PATCH v2 6/7] mm: list_lru: introduce memcg_list_lru_alloc_folio() Johannes Weiner
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Johannes Weiner @ 2026-03-12 20:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

Locking is currently internal to the list_lru API. However, a caller
might want to keep auxiliary state synchronized with the LRU state.

For example, the THP shrinker uses the lock of its custom LRU to keep
PG_partially_mapped and vmstats consistent.

To allow the THP shrinker to switch to list_lru, provide normal and
irqsafe locking primitives as well as caller-locked variants of the
addition and deletion functions.
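
The intended usage pattern can be sketched in userspace (not kernel
code; a pthread mutex stands in for the list_lru spinlock, and all
names are illustrative): simple users keep calling the locked
wrappers, while a caller with auxiliary state takes the lock itself
and uses the __-prefixed variants so both updates happen in one
critical section.

```c
#include <pthread.h>
#include <stdbool.h>

struct lru {
	pthread_mutex_t lock;
	int nr_items;
	int aux_state;		/* caller-private state, e.g. a page flag count */
};

static void lru_lock(struct lru *l)   { pthread_mutex_lock(&l->lock); }
static void lru_unlock(struct lru *l) { pthread_mutex_unlock(&l->lock); }

/* Caller-locked variant: assumes l->lock is held */
static bool __lru_add(struct lru *l)
{
	l->nr_items++;
	return true;
}

/* Convenience wrapper: takes and drops the lock itself */
static bool lru_add(struct lru *l)
{
	bool ret;

	lru_lock(l);
	ret = __lru_add(l);
	lru_unlock(l);
	return ret;
}

/* A caller keeping private state consistent with the list state */
static void add_and_mark(struct lru *l)
{
	lru_lock(l);
	__lru_add(l);
	l->aux_state++;		/* updated under the same lock as nr_items */
	lru_unlock(l);
}

int demo(void)
{
	struct lru l = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

	lru_add(&l);
	add_and_mark(&l);
	return l.nr_items == 2 && l.aux_state == 1;
}
```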

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/list_lru.h |  34 +++++++++++++
 mm/list_lru.c            | 104 +++++++++++++++++++++++++++------------
 2 files changed, 107 insertions(+), 31 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index fe739d35a864..4afc02deb44d 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -83,6 +83,40 @@ int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
 			 gfp_t gfp);
 void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
 
+/**
+ * list_lru_lock: lock the sublist for the given node and memcg
+ * @lru: the lru pointer
+ * @nid: the node id of the sublist to lock.
+ * @memcg: the cgroup of the sublist to lock.
+ *
+ * Returns the locked list_lru_one sublist. The caller must call
+ * list_lru_unlock() when done.
+ *
+ * You must ensure that the memcg is not freed during this call (e.g., with
+ * rcu or by taking a css refcnt).
+ *
+ * Return: the locked list_lru_one sublist; never NULL
+ */
+struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid,
+		struct mem_cgroup *memcg);
+
+/**
+ * list_lru_unlock: unlock a sublist locked by list_lru_lock()
+ * @l: the list_lru_one to unlock
+ */
+void list_lru_unlock(struct list_lru_one *l);
+
+struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
+		struct mem_cgroup *memcg, unsigned long *irq_flags);
+void list_lru_unlock_irqrestore(struct list_lru_one *l,
+		unsigned long *irq_flags);
+
+/* Caller-locked variants, see list_lru_add() etc for documentation */
+bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l,
+		struct list_head *item, int nid, struct mem_cgroup *memcg);
+bool __list_lru_del(struct list_lru *lru, struct list_lru_one *l,
+		struct list_head *item, int nid);
+
 /**
  * list_lru_add: add an element to the lru list's tail
  * @lru: the lru pointer
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 4d74c2e9c2a5..779cb26cec84 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -15,17 +15,23 @@
 #include "slab.h"
 #include "internal.h"
 
-static inline void lock_list_lru(struct list_lru_one *l, bool irq)
+static inline void lock_list_lru(struct list_lru_one *l, bool irq,
+				 unsigned long *irq_flags)
 {
-	if (irq)
+	if (irq_flags)
+		spin_lock_irqsave(&l->lock, *irq_flags);
+	else if (irq)
 		spin_lock_irq(&l->lock);
 	else
 		spin_lock(&l->lock);
 }
 
-static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
+static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off,
+				   unsigned long *irq_flags)
 {
-	if (irq_off)
+	if (irq_flags)
+		spin_unlock_irqrestore(&l->lock, *irq_flags);
+	else if (irq_off)
 		spin_unlock_irq(&l->lock);
 	else
 		spin_unlock(&l->lock);
@@ -78,7 +84,7 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 
 static inline struct list_lru_one *
 lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
-		       bool irq, bool skip_empty)
+		       bool irq, unsigned long *irq_flags, bool skip_empty)
 {
 	struct list_lru_one *l;
 
@@ -86,12 +92,12 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 again:
 	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
 	if (likely(l)) {
-		lock_list_lru(l, irq);
+		lock_list_lru(l, irq, irq_flags);
 		if (likely(READ_ONCE(l->nr_items) != LONG_MIN)) {
 			rcu_read_unlock();
 			return l;
 		}
-		unlock_list_lru(l, irq);
+		unlock_list_lru(l, irq, irq_flags);
 	}
 	/*
 	 * Caller may simply bail out if raced with reparenting or
@@ -132,37 +138,79 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 
 static inline struct list_lru_one *
 lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
-		       bool irq, bool skip_empty)
+		       bool irq, unsigned long *irq_flags, bool skip_empty)
 {
 	struct list_lru_one *l = &lru->node[nid].lru;
 
-	lock_list_lru(l, irq);
+	lock_list_lru(l, irq, irq_flags);
 
 	return l;
 }
 #endif /* CONFIG_MEMCG */
 
-/* The caller must ensure the memcg lifetime. */
-bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
-		  struct mem_cgroup *memcg)
+struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid,
+				   struct mem_cgroup *memcg)
 {
-	struct list_lru_node *nlru = &lru->node[nid];
-	struct list_lru_one *l;
+	return lock_list_lru_of_memcg(lru, nid, memcg, false, NULL, false);
+}
+
+void list_lru_unlock(struct list_lru_one *l)
+{
+	unlock_list_lru(l, false, NULL);
+}
+
+struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
+					   struct mem_cgroup *memcg,
+					   unsigned long *flags)
+{
+	return lock_list_lru_of_memcg(lru, nid, memcg, true, flags, false);
+}
+
+void list_lru_unlock_irqrestore(struct list_lru_one *l, unsigned long *flags)
+{
+	unlock_list_lru(l, true, flags);
+}
 
-	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
+bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l,
+		    struct list_head *item, int nid,
+		    struct mem_cgroup *memcg)
+{
 	if (list_empty(item)) {
 		list_add_tail(item, &l->list);
 		/* Set shrinker bit if the first element was added */
 		if (!l->nr_items++)
 			set_shrinker_bit(memcg, nid, lru_shrinker_id(lru));
-		unlock_list_lru(l, false);
-		atomic_long_inc(&nlru->nr_items);
+		atomic_long_inc(&lru->node[nid].nr_items);
+		return true;
+	}
+	return false;
+}
+
+bool __list_lru_del(struct list_lru *lru, struct list_lru_one *l,
+		    struct list_head *item, int nid)
+{
+	if (!list_empty(item)) {
+		list_del_init(item);
+		l->nr_items--;
+		atomic_long_dec(&lru->node[nid].nr_items);
 		return true;
 	}
-	unlock_list_lru(l, false);
 	return false;
 }
 
+/* The caller must ensure the memcg lifetime. */
+bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
+		  struct mem_cgroup *memcg)
+{
+	struct list_lru_one *l;
+	bool ret;
+
+	l = list_lru_lock(lru, nid, memcg);
+	ret = __list_lru_add(lru, l, item, nid, memcg);
+	list_lru_unlock(l);
+	return ret;
+}
+
 bool list_lru_add_obj(struct list_lru *lru, struct list_head *item)
 {
 	bool ret;
@@ -184,19 +232,13 @@ EXPORT_SYMBOL_GPL(list_lru_add_obj);
 bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid,
 		  struct mem_cgroup *memcg)
 {
-	struct list_lru_node *nlru = &lru->node[nid];
 	struct list_lru_one *l;
+	bool ret;
 
-	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
-	if (!list_empty(item)) {
-		list_del_init(item);
-		l->nr_items--;
-		unlock_list_lru(l, false);
-		atomic_long_dec(&nlru->nr_items);
-		return true;
-	}
-	unlock_list_lru(l, false);
-	return false;
+	l = list_lru_lock(lru, nid, memcg);
+	ret = __list_lru_del(lru, l, item, nid);
+	list_lru_unlock(l);
+	return ret;
 }
 
 bool list_lru_del_obj(struct list_lru *lru, struct list_head *item)
@@ -269,7 +311,7 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 	unsigned long isolated = 0;
 
 restart:
-	l = lock_list_lru_of_memcg(lru, nid, memcg, irq_off, true);
+	l = lock_list_lru_of_memcg(lru, nid, memcg, irq_off, NULL, true);
 	if (!l)
 		return isolated;
 	list_for_each_safe(item, n, &l->list) {
@@ -310,7 +352,7 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 			BUG();
 		}
 	}
-	unlock_list_lru(l, irq_off);
+	unlock_list_lru(l, irq_off, NULL);
 out:
 	return isolated;
 }
-- 
2.53.0




* [PATCH v2 6/7] mm: list_lru: introduce memcg_list_lru_alloc_folio()
  2026-03-12 20:51 [PATCH v2 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
                   ` (4 preceding siblings ...)
  2026-03-12 20:51 ` [PATCH v2 5/7] mm: list_lru: introduce caller locking for additions and deletions Johannes Weiner
@ 2026-03-12 20:51 ` Johannes Weiner
  2026-03-17 10:09   ` David Hildenbrand (Arm)
  2026-03-12 20:51 ` [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru Johannes Weiner
  2026-03-13 17:39 ` [syzbot ci] Re: mm: switch THP " syzbot ci
  7 siblings, 1 reply; 30+ messages in thread
From: Johannes Weiner @ 2026-03-12 20:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

memcg_list_lru_alloc() is called every time an object that may end up
on the list_lru is created. It needs to quickly check if the list_lru
heads for the memcg already exist, and allocate them when they don't.

Doing this with folio objects is tricky: folio_memcg() is not stable
and requires either RCU protection or pinning the cgroup. But it's
desirable to make the existence check lightweight under RCU, and only
pin the memcg when we need to allocate list_lru heads and may block.

In preparation for switching the THP shrinker to list_lru, add a
helper function for allocating list_lru heads coming from a folio.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/list_lru.h | 12 ++++++++++++
 mm/list_lru.c            | 39 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 4afc02deb44d..df6bd3c64b06 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -81,6 +81,18 @@ static inline int list_lru_init_memcg_key(struct list_lru *lru, struct shrinker
 
 int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
 			 gfp_t gfp);
+
+#ifdef CONFIG_MEMCG
+int memcg_list_lru_alloc_folio(struct folio *folio, struct list_lru *lru,
+			       gfp_t gfp);
+#else
+static inline int memcg_list_lru_alloc_folio(struct folio *folio,
+					     struct list_lru *lru, gfp_t gfp)
+{
+	return 0;
+}
+#endif
+
 void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
 
 /**
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 779cb26cec84..562b2b1f8c41 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -534,17 +534,14 @@ static inline bool memcg_list_lru_allocated(struct mem_cgroup *memcg,
 	return idx < 0 || xa_load(&lru->xa, idx);
 }
 
-int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
-			 gfp_t gfp)
+static int __memcg_list_lru_alloc(struct mem_cgroup *memcg,
+				  struct list_lru *lru, gfp_t gfp)
 {
 	unsigned long flags;
 	struct list_lru_memcg *mlru = NULL;
 	struct mem_cgroup *pos, *parent;
 	XA_STATE(xas, &lru->xa, 0);
 
-	if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
-		return 0;
-
 	gfp &= GFP_RECLAIM_MASK;
 	/*
 	 * Because the list_lru can be reparented to the parent cgroup's
@@ -585,6 +582,38 @@ int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
 
 	return xas_error(&xas);
 }
+
+int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
+			 gfp_t gfp)
+{
+	if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
+		return 0;
+	return __memcg_list_lru_alloc(memcg, lru, gfp);
+}
+
+int memcg_list_lru_alloc_folio(struct folio *folio, struct list_lru *lru,
+			       gfp_t gfp)
+{
+	struct mem_cgroup *memcg;
+	int res;
+
+	if (!list_lru_memcg_aware(lru))
+		return 0;
+
+	/* Fast path when list_lru heads already exist */
+	rcu_read_lock();
+	memcg = folio_memcg(folio);
+	res = memcg_list_lru_allocated(memcg, lru);
+	rcu_read_unlock();
+	if (likely(res))
+		return 0;
+
+	/* Allocation may block, pin the memcg */
+	memcg = get_mem_cgroup_from_folio(folio);
+	res = __memcg_list_lru_alloc(memcg, lru, gfp);
+	mem_cgroup_put(memcg);
+	return res;
+}
 #else
 static inline void memcg_init_list_lru(struct list_lru *lru, bool memcg_aware)
 {
-- 
2.53.0




* [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-12 20:51 [PATCH v2 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
                   ` (5 preceding siblings ...)
  2026-03-12 20:51 ` [PATCH v2 6/7] mm: list_lru: introduce memcg_list_lru_alloc_folio() Johannes Weiner
@ 2026-03-12 20:51 ` Johannes Weiner
  2026-03-18 20:25   ` David Hildenbrand (Arm)
  2026-03-13 17:39 ` [syzbot ci] Re: mm: switch THP " syzbot ci
  7 siblings, 1 reply; 30+ messages in thread
From: Johannes Weiner @ 2026-03-12 20:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Shakeel Butt, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

The deferred split queue handles cgroups in a suboptimal fashion. The
queue is per-NUMA node or per-cgroup, not the intersection. That means
on a cgrouped system, a node-restricted allocation entering reclaim
can end up splitting large pages on other nodes:

	alloc/unmap
	  deferred_split_folio()
	    list_add_tail(memcg->split_queue)
	    set_shrinker_bit(memcg, node, deferred_shrinker_id)

	for_each_zone_zonelist_nodemask(restricted_nodes)
	  mem_cgroup_iter()
	    shrink_slab(node, memcg)
	      shrink_slab_memcg(node, memcg)
	        if test_shrinker_bit(memcg, node, deferred_shrinker_id)
	          deferred_split_scan()
	            walks memcg->split_queue

The shrinker bit adds an imperfect guard rail. As soon as the cgroup
has a single large page on the node of interest, all large pages owned
by that memcg, including those on other nodes, will be split.

list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
streamlines a lot of the list operations and reclaim walks. It's used
widely by other major shrinkers already. Convert the deferred split
queue as well.

The list_lru per-memcg heads are instantiated on demand, by calling
memcg_list_lru_alloc_folio() when the first object of interest is
allocated for a cgroup. Add calls at the sites where splittable pages
are created: anon faults, swapin faults, and khugepaged collapse.

These calls create all possible node heads for the cgroup at once, so
the migration code (between nodes) doesn't need any special care.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 include/linux/huge_mm.h    |   6 +-
 include/linux/memcontrol.h |   4 -
 include/linux/mmzone.h     |  12 --
 mm/huge_memory.c           | 330 +++++++++++--------------------------
 mm/internal.h              |   2 +-
 mm/khugepaged.c            |   7 +
 mm/memcontrol.c            |  12 +-
 mm/memory.c                |  52 +++---
 mm/mm_init.c               |  15 --
 9 files changed, 140 insertions(+), 300 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a4d9f964dfde..2d0d0c797dd8 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -414,10 +414,9 @@ static inline int split_huge_page(struct page *page)
 {
 	return split_huge_page_to_list_to_order(page, NULL, 0);
 }
+
+extern struct list_lru deferred_split_lru;
 void deferred_split_folio(struct folio *folio, bool partially_mapped);
-#ifdef CONFIG_MEMCG
-void reparent_deferred_split_queue(struct mem_cgroup *memcg);
-#endif
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address, bool freeze);
@@ -650,7 +649,6 @@ static inline int try_folio_split_to_order(struct folio *folio,
 }
 
 static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
-static inline void reparent_deferred_split_queue(struct mem_cgroup *memcg) {}
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 086158969529..0782c72a1997 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -277,10 +277,6 @@ struct mem_cgroup {
 	struct memcg_cgwb_frn cgwb_frn[MEMCG_CGWB_FRN_CNT];
 #endif
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	struct deferred_split deferred_split_queue;
-#endif
-
 #ifdef CONFIG_LRU_GEN_WALKS_MMU
 	/* per-memcg mm_struct list */
 	struct lru_gen_mm_list mm_list;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7bd0134c241c..232b7a71fd69 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1429,14 +1429,6 @@ struct zonelist {
  */
 extern struct page *mem_map;
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-struct deferred_split {
-	spinlock_t split_queue_lock;
-	struct list_head split_queue;
-	unsigned long split_queue_len;
-};
-#endif
-
 #ifdef CONFIG_MEMORY_FAILURE
 /*
  * Per NUMA node memory failure handling statistics.
@@ -1562,10 +1554,6 @@ typedef struct pglist_data {
 	unsigned long first_deferred_pfn;
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	struct deferred_split deferred_split_queue;
-#endif
-
 #ifdef CONFIG_NUMA_BALANCING
 	/* start time in ms of current promote rate limit period */
 	unsigned int nbp_rl_start;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7d0a64033b18..ed9b98e2e166 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -14,6 +14,7 @@
 #include <linux/mmu_notifier.h>
 #include <linux/rmap.h>
 #include <linux/swap.h>
+#include <linux/list_lru.h>
 #include <linux/shrinker.h>
 #include <linux/mm_inline.h>
 #include <linux/swapops.h>
@@ -67,6 +68,7 @@ unsigned long transparent_hugepage_flags __read_mostly =
 	(1<<TRANSPARENT_HUGEPAGE_DEFRAG_KHUGEPAGED_FLAG)|
 	(1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG);
 
+struct list_lru deferred_split_lru;
 static struct shrinker *deferred_split_shrinker;
 static unsigned long deferred_split_count(struct shrinker *shrink,
 					  struct shrink_control *sc);
@@ -866,6 +868,11 @@ static int __init thp_shrinker_init(void)
 	if (!deferred_split_shrinker)
 		return -ENOMEM;
 
+	if (list_lru_init_memcg(&deferred_split_lru, deferred_split_shrinker)) {
+		shrinker_free(deferred_split_shrinker);
+		return -ENOMEM;
+	}
+
 	deferred_split_shrinker->count_objects = deferred_split_count;
 	deferred_split_shrinker->scan_objects = deferred_split_scan;
 	shrinker_register(deferred_split_shrinker);
@@ -886,6 +893,7 @@ static int __init thp_shrinker_init(void)
 
 	huge_zero_folio_shrinker = shrinker_alloc(0, "thp-zero");
 	if (!huge_zero_folio_shrinker) {
+		list_lru_destroy(&deferred_split_lru);
 		shrinker_free(deferred_split_shrinker);
 		return -ENOMEM;
 	}
@@ -900,6 +908,7 @@ static int __init thp_shrinker_init(void)
 static void __init thp_shrinker_exit(void)
 {
 	shrinker_free(huge_zero_folio_shrinker);
+	list_lru_destroy(&deferred_split_lru);
 	shrinker_free(deferred_split_shrinker);
 }
 
@@ -1080,119 +1089,6 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 	return pmd;
 }
 
-static struct deferred_split *split_queue_node(int nid)
-{
-	struct pglist_data *pgdata = NODE_DATA(nid);
-
-	return &pgdata->deferred_split_queue;
-}
-
-#ifdef CONFIG_MEMCG
-static inline
-struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
-					   struct deferred_split *queue)
-{
-	if (mem_cgroup_disabled())
-		return NULL;
-	if (split_queue_node(folio_nid(folio)) == queue)
-		return NULL;
-	return container_of(queue, struct mem_cgroup, deferred_split_queue);
-}
-
-static struct deferred_split *memcg_split_queue(int nid, struct mem_cgroup *memcg)
-{
-	return memcg ? &memcg->deferred_split_queue : split_queue_node(nid);
-}
-#else
-static inline
-struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
-					   struct deferred_split *queue)
-{
-	return NULL;
-}
-
-static struct deferred_split *memcg_split_queue(int nid, struct mem_cgroup *memcg)
-{
-	return split_queue_node(nid);
-}
-#endif
-
-static struct deferred_split *split_queue_lock(int nid, struct mem_cgroup *memcg)
-{
-	struct deferred_split *queue;
-
-retry:
-	queue = memcg_split_queue(nid, memcg);
-	spin_lock(&queue->split_queue_lock);
-	/*
-	 * There is a period between setting memcg to dying and reparenting
-	 * deferred split queue, and during this period the THPs in the deferred
-	 * split queue will be hidden from the shrinker side.
-	 */
-	if (unlikely(memcg_is_dying(memcg))) {
-		spin_unlock(&queue->split_queue_lock);
-		memcg = parent_mem_cgroup(memcg);
-		goto retry;
-	}
-
-	return queue;
-}
-
-static struct deferred_split *
-split_queue_lock_irqsave(int nid, struct mem_cgroup *memcg, unsigned long *flags)
-{
-	struct deferred_split *queue;
-
-retry:
-	queue = memcg_split_queue(nid, memcg);
-	spin_lock_irqsave(&queue->split_queue_lock, *flags);
-	if (unlikely(memcg_is_dying(memcg))) {
-		spin_unlock_irqrestore(&queue->split_queue_lock, *flags);
-		memcg = parent_mem_cgroup(memcg);
-		goto retry;
-	}
-
-	return queue;
-}
-
-static struct deferred_split *folio_split_queue_lock(struct folio *folio)
-{
-	struct deferred_split *queue;
-
-	rcu_read_lock();
-	queue = split_queue_lock(folio_nid(folio), folio_memcg(folio));
-	/*
-	 * The memcg destruction path is acquiring the split queue lock for
-	 * reparenting. Once you have it locked, it's safe to drop the rcu lock.
-	 */
-	rcu_read_unlock();
-
-	return queue;
-}
-
-static struct deferred_split *
-folio_split_queue_lock_irqsave(struct folio *folio, unsigned long *flags)
-{
-	struct deferred_split *queue;
-
-	rcu_read_lock();
-	queue = split_queue_lock_irqsave(folio_nid(folio), folio_memcg(folio), flags);
-	rcu_read_unlock();
-
-	return queue;
-}
-
-static inline void split_queue_unlock(struct deferred_split *queue)
-{
-	spin_unlock(&queue->split_queue_lock);
-}
-
-static inline void split_queue_unlock_irqrestore(struct deferred_split *queue,
-						 unsigned long flags)
-{
-	spin_unlock_irqrestore(&queue->split_queue_lock, flags);
-}
-
 static inline bool is_transparent_hugepage(const struct folio *folio)
 {
 	if (!folio_test_large(folio))
@@ -1293,6 +1189,14 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
 		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
 		return NULL;
 	}
+
+	if (memcg_list_lru_alloc_folio(folio, &deferred_split_lru, gfp)) {
+		folio_put(folio);
+		count_vm_event(THP_FAULT_FALLBACK);
+		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
+		return NULL;
+	}
+
 	folio_throttle_swaprate(folio, gfp);
 
        /*
@@ -3802,33 +3706,28 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 	struct folio *new_folio, *next;
 	int old_order = folio_order(folio);
 	int ret = 0;
-	struct deferred_split *ds_queue;
+	struct list_lru_one *l;
 
 	VM_WARN_ON_ONCE(!mapping && end);
 	/* Prevent deferred_split_scan() touching ->_refcount */
-	ds_queue = folio_split_queue_lock(folio);
+	rcu_read_lock();
+	l = list_lru_lock(&deferred_split_lru, folio_nid(folio), folio_memcg(folio));
 	if (folio_ref_freeze(folio, folio_cache_ref_count(folio) + 1)) {
 		struct swap_cluster_info *ci = NULL;
 		struct lruvec *lruvec;
 
 		if (old_order > 1) {
-			if (!list_empty(&folio->_deferred_list)) {
-				ds_queue->split_queue_len--;
-				/*
-				 * Reinitialize page_deferred_list after removing the
-				 * page from the split_queue, otherwise a subsequent
-				 * split will see list corruption when checking the
-				 * page_deferred_list.
-				 */
-				list_del_init(&folio->_deferred_list);
-			}
+			__list_lru_del(&deferred_split_lru, l,
+				       &folio->_deferred_list, folio_nid(folio));
 			if (folio_test_partially_mapped(folio)) {
 				folio_clear_partially_mapped(folio);
 				mod_mthp_stat(old_order,
 					MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 			}
 		}
-		split_queue_unlock(ds_queue);
+		list_lru_unlock(l);
+		rcu_read_unlock();
+
 		if (mapping) {
 			int nr = folio_nr_pages(folio);
 
@@ -3929,7 +3828,8 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 		if (ci)
 			swap_cluster_unlock(ci);
 	} else {
-		split_queue_unlock(ds_queue);
+		list_lru_unlock(l);
+		rcu_read_unlock();
 		return -EAGAIN;
 	}
 
@@ -4296,33 +4196,35 @@ int split_folio_to_list(struct folio *folio, struct list_head *list)
  * queueing THP splits, and that list is (racily observed to be) non-empty.
  *
  * It is unsafe to call folio_unqueue_deferred_split() until folio refcount is
- * zero: because even when split_queue_lock is held, a non-empty _deferred_list
- * might be in use on deferred_split_scan()'s unlocked on-stack list.
+ * zero: because even when the list_lru lock is held, a non-empty
+ * _deferred_list might be in use on deferred_split_scan()'s unlocked
+ * on-stack list.
  *
- * If memory cgroups are enabled, split_queue_lock is in the mem_cgroup: it is
- * therefore important to unqueue deferred split before changing folio memcg.
+ * The list_lru sublist is determined by folio's memcg: it is therefore
+ * important to unqueue deferred split before changing folio memcg.
  */
 bool __folio_unqueue_deferred_split(struct folio *folio)
 {
-	struct deferred_split *ds_queue;
+	struct list_lru_one *l;
+	int nid = folio_nid(folio);
 	unsigned long flags;
 	bool unqueued = false;
 
 	WARN_ON_ONCE(folio_ref_count(folio));
 	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));
 
-	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
-	if (!list_empty(&folio->_deferred_list)) {
-		ds_queue->split_queue_len--;
+	rcu_read_lock();
+	l = list_lru_lock_irqsave(&deferred_split_lru, nid, folio_memcg(folio), &flags);
+	if (__list_lru_del(&deferred_split_lru, l, &folio->_deferred_list, nid)) {
 		if (folio_test_partially_mapped(folio)) {
 			folio_clear_partially_mapped(folio);
 			mod_mthp_stat(folio_order(folio),
 				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 		}
-		list_del_init(&folio->_deferred_list);
 		unqueued = true;
 	}
-	split_queue_unlock_irqrestore(ds_queue, flags);
+	list_lru_unlock_irqrestore(l, &flags);
+	rcu_read_unlock();
 
 	return unqueued;	/* useful for debug warnings */
 }
@@ -4330,7 +4232,9 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 /* partially_mapped=false won't clear PG_partially_mapped folio flag */
 void deferred_split_folio(struct folio *folio, bool partially_mapped)
 {
-	struct deferred_split *ds_queue;
+	struct list_lru_one *l;
+	int nid;
+	struct mem_cgroup *memcg;
 	unsigned long flags;
 
 	/*
@@ -4353,7 +4257,11 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 	if (folio_test_swapcache(folio))
 		return;
 
-	ds_queue = folio_split_queue_lock_irqsave(folio, &flags);
+	nid = folio_nid(folio);
+
+	rcu_read_lock();
+	memcg = folio_memcg(folio);
+	l = list_lru_lock_irqsave(&deferred_split_lru, nid, memcg, &flags);
 	if (partially_mapped) {
 		if (!folio_test_partially_mapped(folio)) {
 			folio_set_partially_mapped(folio);
@@ -4361,36 +4269,20 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 				count_vm_event(THP_DEFERRED_SPLIT_PAGE);
 			count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
 			mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, 1);
-
 		}
 	} else {
 		/* partially mapped folios cannot become non-partially mapped */
 		VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
 	}
-	if (list_empty(&folio->_deferred_list)) {
-		struct mem_cgroup *memcg;
-
-		memcg = folio_split_queue_memcg(folio, ds_queue);
-		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
-		ds_queue->split_queue_len++;
-		if (memcg)
-			set_shrinker_bit(memcg, folio_nid(folio),
-					 shrinker_id(deferred_split_shrinker));
-	}
-	split_queue_unlock_irqrestore(ds_queue, flags);
+	__list_lru_add(&deferred_split_lru, l, &folio->_deferred_list, nid, memcg);
+	list_lru_unlock_irqrestore(l, &flags);
+	rcu_read_unlock();
 }
 
 static unsigned long deferred_split_count(struct shrinker *shrink,
 		struct shrink_control *sc)
 {
-	struct pglist_data *pgdata = NODE_DATA(sc->nid);
-	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
-
-#ifdef CONFIG_MEMCG
-	if (sc->memcg)
-		ds_queue = &sc->memcg->deferred_split_queue;
-#endif
-	return READ_ONCE(ds_queue->split_queue_len);
+	return list_lru_shrink_count(&deferred_split_lru, sc);
 }
 
 static bool thp_underused(struct folio *folio)
@@ -4420,45 +4312,47 @@ static bool thp_underused(struct folio *folio)
 	return false;
 }
 
+static enum lru_status deferred_split_isolate(struct list_head *item,
+					      struct list_lru_one *lru,
+					      void *cb_arg)
+{
+	struct folio *folio = container_of(item, struct folio, _deferred_list);
+	struct list_head *freeable = cb_arg;
+
+	if (folio_try_get(folio)) {
+		list_lru_isolate_move(lru, item, freeable);
+		return LRU_REMOVED;
+	}
+
+	/* We lost race with folio_put() */
+	list_lru_isolate(lru, item);
+	if (folio_test_partially_mapped(folio)) {
+		folio_clear_partially_mapped(folio);
+		mod_mthp_stat(folio_order(folio),
+			      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
+	}
+	return LRU_REMOVED;
+}
+
 static unsigned long deferred_split_scan(struct shrinker *shrink,
 		struct shrink_control *sc)
 {
-	struct deferred_split *ds_queue;
-	unsigned long flags;
+	LIST_HEAD(dispose);
 	struct folio *folio, *next;
-	int split = 0, i;
-	struct folio_batch fbatch;
-
-	folio_batch_init(&fbatch);
+	int split = 0;
+	unsigned long isolated;
 
-retry:
-	ds_queue = split_queue_lock_irqsave(sc->nid, sc->memcg, &flags);
-	/* Take pin on all head pages to avoid freeing them under us */
-	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
-							_deferred_list) {
-		if (folio_try_get(folio)) {
-			folio_batch_add(&fbatch, folio);
-		} else if (folio_test_partially_mapped(folio)) {
-			/* We lost race with folio_put() */
-			folio_clear_partially_mapped(folio);
-			mod_mthp_stat(folio_order(folio),
-				      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
-		}
-		list_del_init(&folio->_deferred_list);
-		ds_queue->split_queue_len--;
-		if (!--sc->nr_to_scan)
-			break;
-		if (!folio_batch_space(&fbatch))
-			break;
-	}
-	split_queue_unlock_irqrestore(ds_queue, flags);
+	isolated = list_lru_shrink_walk_irq(&deferred_split_lru, sc,
+					    deferred_split_isolate, &dispose);
 
-	for (i = 0; i < folio_batch_count(&fbatch); i++) {
+	list_for_each_entry_safe(folio, next, &dispose, _deferred_list) {
 		bool did_split = false;
 		bool underused = false;
-		struct deferred_split *fqueue;
+		struct list_lru_one *l;
+		unsigned long flags;
+
+		list_del_init(&folio->_deferred_list);
 
-		folio = fbatch.folios[i];
 		if (!folio_test_partially_mapped(folio)) {
 			/*
 			 * See try_to_map_unused_to_zeropage(): we cannot
@@ -4481,64 +4375,32 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		}
 		folio_unlock(folio);
 next:
-		if (did_split || !folio_test_partially_mapped(folio))
-			continue;
 		/*
 		 * Only add back to the queue if folio is partially mapped.
 		 * If thp_underused returns false, or if split_folio fails
 		 * in the case it was underused, then consider it used and
 		 * don't add it back to split_queue.
 		 */
-		fqueue = folio_split_queue_lock_irqsave(folio, &flags);
-		if (list_empty(&folio->_deferred_list)) {
-			list_add_tail(&folio->_deferred_list, &fqueue->split_queue);
-			fqueue->split_queue_len++;
+		if (!did_split && folio_test_partially_mapped(folio)) {
+			rcu_read_lock();
+			l = list_lru_lock_irqsave(&deferred_split_lru,
+						  folio_nid(folio),
+						  folio_memcg(folio),
+						  &flags);
+			__list_lru_add(&deferred_split_lru, l,
+				       &folio->_deferred_list,
+				       folio_nid(folio), folio_memcg(folio));
+			list_lru_unlock_irqrestore(l, &flags);
+			rcu_read_unlock();
 		}
-		split_queue_unlock_irqrestore(fqueue, flags);
-	}
-	folios_put(&fbatch);
-
-	if (sc->nr_to_scan && !list_empty(&ds_queue->split_queue)) {
-		cond_resched();
-		goto retry;
+		folio_put(folio);
 	}
 
-	/*
-	 * Stop shrinker if we didn't split any page, but the queue is empty.
-	 * This can happen if pages were freed under us.
-	 */
-	if (!split && list_empty(&ds_queue->split_queue))
+	if (!split && !isolated)
 		return SHRINK_STOP;
 	return split;
 }
 
-#ifdef CONFIG_MEMCG
-void reparent_deferred_split_queue(struct mem_cgroup *memcg)
-{
-	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
-	struct deferred_split *ds_queue = &memcg->deferred_split_queue;
-	struct deferred_split *parent_ds_queue = &parent->deferred_split_queue;
-	int nid;
-
-	spin_lock_irq(&ds_queue->split_queue_lock);
-	spin_lock_nested(&parent_ds_queue->split_queue_lock, SINGLE_DEPTH_NESTING);
-
-	if (!ds_queue->split_queue_len)
-		goto unlock;
-
-	list_splice_tail_init(&ds_queue->split_queue, &parent_ds_queue->split_queue);
-	parent_ds_queue->split_queue_len += ds_queue->split_queue_len;
-	ds_queue->split_queue_len = 0;
-
-	for_each_node(nid)
-		set_shrinker_bit(parent, nid, shrinker_id(deferred_split_shrinker));
-
-unlock:
-	spin_unlock(&parent_ds_queue->split_queue_lock);
-	spin_unlock_irq(&ds_queue->split_queue_lock);
-}
-#endif
-
 #ifdef CONFIG_DEBUG_FS
 static void split_huge_pages_all(void)
 {
diff --git a/mm/internal.h b/mm/internal.h
index 95b583e7e4f7..71d2605f8040 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -857,7 +857,7 @@ static inline bool folio_unqueue_deferred_split(struct folio *folio)
 	/*
 	 * At this point, there is no one trying to add the folio to
 	 * deferred_list. If folio is not in deferred_list, it's safe
-	 * to check without acquiring the split_queue_lock.
+	 * to check without acquiring the list_lru lock.
 	 */
 	if (data_race(list_empty(&folio->_deferred_list)))
 		return false;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b7b4680d27ab..01fd3d5933c5 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1076,6 +1076,7 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
 	}
 
 	count_vm_event(THP_COLLAPSE_ALLOC);
+
 	if (unlikely(mem_cgroup_charge(folio, mm, gfp))) {
 		folio_put(folio);
 		*foliop = NULL;
@@ -1084,6 +1085,12 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
 
 	count_memcg_folio_events(folio, THP_COLLAPSE_ALLOC, 1);
 
+	if (memcg_list_lru_alloc_folio(folio, &deferred_split_lru, gfp)) {
+		folio_put(folio);
+		*foliop = NULL;
+		return SCAN_CGROUP_CHARGE_FAIL;
+	}
+
 	*foliop = folio;
 	return SCAN_SUCCEED;
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a47fb68dd65f..f381cb6bdff1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4015,11 +4015,6 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
 	for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++)
 		memcg->cgwb_frn[i].done =
 			__WB_COMPLETION_INIT(&memcg_cgwb_frn_waitq);
-#endif
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	spin_lock_init(&memcg->deferred_split_queue.split_queue_lock);
-	INIT_LIST_HEAD(&memcg->deferred_split_queue.split_queue);
-	memcg->deferred_split_queue.split_queue_len = 0;
 #endif
 	lru_gen_init_memcg(memcg);
 	return memcg;
@@ -4167,11 +4162,10 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	zswap_memcg_offline_cleanup(memcg);
 
 	memcg_offline_kmem(memcg);
-	reparent_deferred_split_queue(memcg);
 	/*
-	 * The reparenting of objcg must be after the reparenting of the
-	 * list_lru and deferred_split_queue above, which ensures that they will
-	 * not mistakenly get the parent list_lru and deferred_split_queue.
+	 * The reparenting of objcg must be after the reparenting of
+	 * the list_lru in memcg_offline_kmem(), which ensures that
+	 * they will not mistakenly get the parent list_lru.
 	 */
 	memcg_reparent_objcgs(memcg);
 	reparent_shrinker_deferred(memcg);
diff --git a/mm/memory.c b/mm/memory.c
index 38062f8e1165..4dad1a7890aa 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4651,13 +4651,19 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 	while (orders) {
 		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
 		folio = vma_alloc_folio(gfp, order, vma, addr);
-		if (folio) {
-			if (!mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
-							    gfp, entry))
-				return folio;
+		if (!folio)
+			goto next;
+		if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm, gfp, entry)) {
 			count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK_CHARGE);
 			folio_put(folio);
+			goto next;
 		}
+		if (memcg_list_lru_alloc_folio(folio, &deferred_split_lru, gfp)) {
+			folio_put(folio);
+			goto fallback;
+		}
+		return folio;
+next:
 		count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
 		order = next_order(&orders, order);
 	}
@@ -5168,24 +5174,28 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	while (orders) {
 		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
 		folio = vma_alloc_folio(gfp, order, vma, addr);
-		if (folio) {
-			if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
-				count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
-				folio_put(folio);
-				goto next;
-			}
-			folio_throttle_swaprate(folio, gfp);
-			/*
-			 * When a folio is not zeroed during allocation
-			 * (__GFP_ZERO not used) or user folios require special
-			 * handling, folio_zero_user() is used to make sure
-			 * that the page corresponding to the faulting address
-			 * will be hot in the cache after zeroing.
-			 */
-			if (user_alloc_needs_zeroing())
-				folio_zero_user(folio, vmf->address);
-			return folio;
+		if (!folio)
+			goto next;
+		if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
+			count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
+			folio_put(folio);
+			goto next;
 		}
+		if (memcg_list_lru_alloc_folio(folio, &deferred_split_lru, gfp)) {
+			folio_put(folio);
+			goto fallback;
+		}
+		folio_throttle_swaprate(folio, gfp);
+		/*
+		 * When a folio is not zeroed during allocation
+		 * (__GFP_ZERO not used) or user folios require special
+		 * handling, folio_zero_user() is used to make sure
+		 * that the page corresponding to the faulting address
+		 * will be hot in the cache after zeroing.
+		 */
+		if (user_alloc_needs_zeroing())
+			folio_zero_user(folio, vmf->address);
+		return folio;
 next:
 		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
 		order = next_order(&orders, order);
diff --git a/mm/mm_init.c b/mm/mm_init.c
index cec7bb758bdd..f293a62e652a 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1388,19 +1388,6 @@ static void __init calculate_node_totalpages(struct pglist_data *pgdat,
 	pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-static void pgdat_init_split_queue(struct pglist_data *pgdat)
-{
-	struct deferred_split *ds_queue = &pgdat->deferred_split_queue;
-
-	spin_lock_init(&ds_queue->split_queue_lock);
-	INIT_LIST_HEAD(&ds_queue->split_queue);
-	ds_queue->split_queue_len = 0;
-}
-#else
-static void pgdat_init_split_queue(struct pglist_data *pgdat) {}
-#endif
-
 #ifdef CONFIG_COMPACTION
 static void pgdat_init_kcompactd(struct pglist_data *pgdat)
 {
@@ -1416,8 +1403,6 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
 
 	pgdat_resize_init(pgdat);
 	pgdat_kswapd_lock_init(pgdat);
-
-	pgdat_init_split_queue(pgdat);
 	pgdat_init_kcompactd(pgdat);
 
 	init_waitqueue_head(&pgdat->kswapd_wait);
-- 
2.53.0
^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [syzbot ci] Re: mm: switch THP shrinker to list_lru
  2026-03-12 20:51 [PATCH v2 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
                   ` (6 preceding siblings ...)
  2026-03-12 20:51 ` [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru Johannes Weiner
@ 2026-03-13 17:39 ` syzbot ci
  2026-03-13 23:08   ` Johannes Weiner
  7 siblings, 1 reply; 30+ messages in thread
From: syzbot ci @ 2026-03-13 17:39 UTC (permalink / raw)
  To: akpm, david, david, hannes, kas, liam.howlett, linux-kernel,
	linux-mm, roman.gushchin, shakeel.butt, usama.arif, yosry.ahmed,
	ziy
  Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v2] mm: switch THP shrinker to list_lru
https://lore.kernel.org/all/20260312205321.638053-1-hannes@cmpxchg.org
* [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty
* [PATCH v2 2/7] mm: list_lru: deduplicate unlock_list_lru()
* [PATCH v2 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg()
* [PATCH v2 4/7] mm: list_lru: deduplicate lock_list_lru()
* [PATCH v2 5/7] mm: list_lru: introduce caller locking for additions and deletions
* [PATCH v2 6/7] mm: list_lru: introduce memcg_list_lru_alloc_folio()
* [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru

and found the following issues:
* WARNING in lock_list_lru_of_memcg
* possible deadlock in __folio_end_writeback

Full report is available here:
https://ci.syzbot.org/series/e7f4d9e2-b111-4e6e-80f8-e762d8337560

***

WARNING in lock_list_lru_of_memcg

tree:      mm-new
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base:      f543926f9d0c3f6dfb354adfe7fbaeedd1277c6b
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/7315345d-816f-4df6-a17e-355964ef03ca/config
C repro:   https://ci.syzbot.org/findings/28d9d87d-fee2-4068-a072-c8a3713d5f60/c_repro
syz repro: https://ci.syzbot.org/findings/28d9d87d-fee2-4068-a072-c8a3713d5f60/syz_repro

XFS (loop0): Ending clean mount
XFS (loop0): Quotacheck needed: Please wait.
XFS (loop0): Quotacheck: Done.
------------[ cut here ]------------
!css_is_dying(&memcg->css)
WARNING: mm/list_lru.c:110 at lock_list_lru_of_memcg+0x33d/0x470 mm/list_lru.c:110, CPU#0: syz.0.17/5950
Modules linked in:
CPU: 0 UID: 0 PID: 5950 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:lock_list_lru_of_memcg+0x33d/0x470 mm/list_lru.c:110
Code: 3c 28 00 74 08 4c 89 e7 e8 b0 02 1d 00 4d 8b 24 24 48 8b 54 24 20 4d 85 e4 0f 85 00 fe ff ff e9 75 fe ff ff e8 d4 df b3 ff 90 <0f> 0b 90 eb c1 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c 06 fe ff ff 48
RSP: 0018:ffffc90004017110 EFLAGS: 00010093
RAX: ffffffff8211b3ac RBX: 0000000000000000 RCX: ffff888104f057c0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff888104f057c0 R09: 0000000000000002
R10: 0000000000000406 R11: 0000000000000000 R12: ffff8881026d0d00
R13: dffffc0000000000 R14: ffffffff9a2de05c R15: 0000000000000002
FS:  0000555572bfe500(0000) GS:ffff88818de66000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000001000 CR3: 0000000112554000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __folio_freeze_and_split_unmapped+0x2ab/0x34b0 mm/huge_memory.c:3767
 __folio_split+0xae1/0x1570 mm/huge_memory.c:4033
 try_folio_split_to_order include/linux/huge_mm.h:411 [inline]
 try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189
 truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255
 truncate_inode_pages_range+0x5f1/0xe30 mm/truncate.c:416
 iomap_write_failed fs/iomap/buffered-io.c:780 [inline]
 iomap_write_iter fs/iomap/buffered-io.c:1182 [inline]
 iomap_file_buffered_write+0x788/0xb30 fs/iomap/buffered-io.c:1220
 xfs_file_buffered_write+0x212/0x8c0 fs/xfs/xfs_file.c:1013
 new_sync_write fs/read_write.c:595 [inline]
 vfs_write+0x61d/0xb90 fs/read_write.c:688
 ksys_pwrite64 fs/read_write.c:795 [inline]
 __do_sys_pwrite64 fs/read_write.c:803 [inline]
 __se_sys_pwrite64 fs/read_write.c:800 [inline]
 __x64_sys_pwrite64+0x199/0x230 fs/read_write.c:800
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f81d019c799
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffee035cc08 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
RAX: ffffffffffffffda RBX: 00007f81d0415fa0 RCX: 00007f81d019c799
RDX: 000000000000fdef RSI: 0000200000000140 RDI: 0000000000000005
RBP: 00007f81d0232c99 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000e7c R11: 0000000000000246 R12: 0000000000000000
R13: 00007f81d0415fac R14: 00007f81d0415fa0 R15: 00007f81d0415fa0
 </TASK>


***

possible deadlock in __folio_end_writeback

tree:      mm-new
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base:      f543926f9d0c3f6dfb354adfe7fbaeedd1277c6b
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/7315345d-816f-4df6-a17e-355964ef03ca/config
C repro:   https://ci.syzbot.org/findings/8c08a79f-a08c-41d5-95e6-2860caf8744c/c_repro
syz repro: https://ci.syzbot.org/findings/8c08a79f-a08c-41d5-95e6-2860caf8744c/syz_repro

=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
syzkaller #0 Not tainted
-----------------------------------------------------
syz.0.17/5949 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff88810c90c240 (&l->lock){+.+.}-{3:3}, at: spin_lock include/linux/spinlock.h:341 [inline]
ffff88810c90c240 (&l->lock){+.+.}-{3:3}, at: lock_list_lru mm/list_lru.c:26 [inline]
ffff88810c90c240 (&l->lock){+.+.}-{3:3}, at: lock_list_lru_of_memcg+0x268/0x470 mm/list_lru.c:95

and this task is already holding:
ffff8881107ad160 (&xa->xa_lock#9){..-.}-{3:3}, at: spin_lock include/linux/spinlock.h:341 [inline]
ffff8881107ad160 (&xa->xa_lock#9){..-.}-{3:3}, at: __folio_split+0xa2e/0x1570 mm/huge_memory.c:4025
which would create a new lock dependency:
 (&xa->xa_lock#9){..-.}-{3:3} -> (&l->lock){+.+.}-{3:3}

but this new dependency connects a SOFTIRQ-irq-safe lock:
 (&xa->xa_lock#9){..-.}-{3:3}

... which became SOFTIRQ-irq-safe at:
  lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:132 [inline]
  _raw_spin_lock_irqsave+0x40/0x60 kernel/locking/spinlock.c:162
  __folio_end_writeback+0x157/0x770 mm/page-writeback.c:2946
  folio_end_writeback_no_dropbehind+0x151/0x290 mm/filemap.c:1667
  folio_end_writeback+0xea/0x220 mm/filemap.c:1693
  end_bio_bh_io_sync+0xbd/0x120 fs/buffer.c:2773
  blk_update_request+0x57e/0xe60 block/blk-mq.c:1016
  scsi_end_request+0x7c/0x820 drivers/scsi/scsi_lib.c:647
  scsi_io_completion+0x131/0x360 drivers/scsi/scsi_lib.c:1088
  blk_complete_reqs block/blk-mq.c:1253 [inline]
  blk_done_softirq+0x10a/0x160 block/blk-mq.c:1258
  handle_softirqs+0x22a/0x870 kernel/softirq.c:622
  __do_softirq kernel/softirq.c:656 [inline]
  invoke_softirq kernel/softirq.c:496 [inline]
  __irq_exit_rcu+0x5f/0x150 kernel/softirq.c:723
  irq_exit_rcu+0x9/0x30 kernel/softirq.c:739
  instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
  sysvec_call_function_single+0xa3/0xc0 arch/x86/kernel/smp.c:266
  asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:704
  __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:179 [inline]
  _raw_spin_unlock_irqrestore+0x47/0x80 kernel/locking/spinlock.c:194
  spin_unlock_irqrestore include/linux/spinlock.h:407 [inline]
  ata_scsi_queuecmd+0x47b/0x590 drivers/ata/libata-scsi.c:4523
  scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1647 [inline]
  scsi_queue_rq+0x1835/0x3330 drivers/scsi/scsi_lib.c:1904
  blk_mq_dispatch_rq_list+0xa70/0x1910 block/blk-mq.c:2148
  __blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
  blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
  __blk_mq_sched_dispatch_requests+0xdcc/0x1600 block/blk-mq-sched.c:307
  blk_mq_sched_dispatch_requests+0xd7/0x190 block/blk-mq-sched.c:329
  blk_mq_run_work_fn+0x22e/0x300 block/blk-mq.c:2562
  process_one_work kernel/workqueue.c:3275 [inline]
  process_scheduled_works+0xb02/0x1830 kernel/workqueue.c:3358
  worker_thread+0xa50/0xfc0 kernel/workqueue.c:3439
  kthread+0x388/0x470 kernel/kthread.c:436
  ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

to a SOFTIRQ-irq-unsafe lock:
 (&l->lock){+.+.}-{3:3}

... which became SOFTIRQ-irq-unsafe at:
...
  lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
  __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
  _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
  spin_lock include/linux/spinlock.h:341 [inline]
  lock_list_lru mm/list_lru.c:26 [inline]
  lock_list_lru_of_memcg+0x268/0x470 mm/list_lru.c:95
  list_lru_lock mm/list_lru.c:154 [inline]
  list_lru_add+0x46/0x260 mm/list_lru.c:208
  list_lru_add_obj+0x191/0x270 mm/list_lru.c:221
  d_lru_add+0xd6/0x160 fs/dcache.c:497
  retain_dentry fs/dcache.c:779 [inline]
  fast_dput+0x303/0x430 fs/dcache.c:866
  dput+0xe8/0x1a0 fs/dcache.c:924
  path_put fs/namei.c:717 [inline]
  put_link+0x112/0x190 fs/namei.c:1196
  walk_component fs/namei.c:2284 [inline]
  link_path_walk+0x1299/0x18d0 fs/namei.c:2644
  path_openat+0x2c3/0x3860 fs/namei.c:4826
  do_file_open+0x23e/0x4a0 fs/namei.c:4859
  do_open_execat+0x12b/0x580 fs/exec.c:781
  open_exec+0x29/0x40 fs/exec.c:817
  load_elf_binary+0x1aaf/0x2980 fs/binfmt_elf.c:908
  search_binary_handler fs/exec.c:1664 [inline]
  exec_binprm fs/exec.c:1696 [inline]
  bprm_execve+0x93d/0x1460 fs/exec.c:1748
  kernel_execve+0x844/0x930 fs/exec.c:1892
  try_to_run_init_process+0x13/0x60 init/main.c:1512
  kernel_init+0xad/0x1d0 init/main.c:1640
  ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&l->lock);
                               local_irq_disable();
                               lock(&xa->xa_lock#9);
                               lock(&l->lock);
  <Interrupt>
    lock(&xa->xa_lock#9);

 *** DEADLOCK ***

5 locks held by syz.0.17/5949:
 #0: ffff88816931a980 (&mm->mmap_lock){++++}-{4:4}, at: mmap_read_lock include/linux/mmap_lock.h:592 [inline]
 #0: ffff88816931a980 (&mm->mmap_lock){++++}-{4:4}, at: madvise_lock+0x152/0x2e0 mm/madvise.c:1779
 #1: ffff8881107ad338 (&mapping->i_mmap_rwsem){++++}-{4:4}, at: i_mmap_lock_read include/linux/fs.h:532 [inline]
 #1: ffff8881107ad338 (&mapping->i_mmap_rwsem){++++}-{4:4}, at: __folio_split+0x11d7/0x1570 mm/huge_memory.c:3993
 #2: ffff8881107ad160 (&xa->xa_lock#9){..-.}-{3:3}, at: spin_lock include/linux/spinlock.h:341 [inline]
 #2: ffff8881107ad160 (&xa->xa_lock#9){..-.}-{3:3}, at: __folio_split+0xa2e/0x1570 mm/huge_memory.c:4025
 #3: ffffffff8e75e460 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:312 [inline]
 #3: ffffffff8e75e460 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:850 [inline]
 #3: ffffffff8e75e460 (rcu_read_lock){....}-{1:3}, at: __folio_freeze_and_split_unmapped+0x1d3/0x34b0 mm/huge_memory.c:3766
 #4: ffffffff8e75e460 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:312 [inline]
 #4: ffffffff8e75e460 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:850 [inline]
 #4: ffffffff8e75e460 (rcu_read_lock){....}-{1:3}, at: lock_list_lru_of_memcg+0x34/0x470 mm/list_lru.c:91

the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (&xa->xa_lock#9){..-.}-{3:3} {
   IN-SOFTIRQ-W at:
                    lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
                    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:132 [inline]
                    _raw_spin_lock_irqsave+0x40/0x60 kernel/locking/spinlock.c:162
                    __folio_end_writeback+0x157/0x770 mm/page-writeback.c:2946
                    folio_end_writeback_no_dropbehind+0x151/0x290 mm/filemap.c:1667
                    folio_end_writeback+0xea/0x220 mm/filemap.c:1693
                    end_bio_bh_io_sync+0xbd/0x120 fs/buffer.c:2773
                    blk_update_request+0x57e/0xe60 block/blk-mq.c:1016
                    scsi_end_request+0x7c/0x820 drivers/scsi/scsi_lib.c:647
                    scsi_io_completion+0x131/0x360 drivers/scsi/scsi_lib.c:1088
                    blk_complete_reqs block/blk-mq.c:1253 [inline]
                    blk_done_softirq+0x10a/0x160 block/blk-mq.c:1258
                    handle_softirqs+0x22a/0x870 kernel/softirq.c:622
                    __do_softirq kernel/softirq.c:656 [inline]
                    invoke_softirq kernel/softirq.c:496 [inline]
                    __irq_exit_rcu+0x5f/0x150 kernel/softirq.c:723
                    irq_exit_rcu+0x9/0x30 kernel/softirq.c:739
                    instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
                    sysvec_call_function_single+0xa3/0xc0 arch/x86/kernel/smp.c:266
                    asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:704
                    __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:179 [inline]
                    _raw_spin_unlock_irqrestore+0x47/0x80 kernel/locking/spinlock.c:194
                    spin_unlock_irqrestore include/linux/spinlock.h:407 [inline]
                    ata_scsi_queuecmd+0x47b/0x590 drivers/ata/libata-scsi.c:4523
                    scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1647 [inline]
                    scsi_queue_rq+0x1835/0x3330 drivers/scsi/scsi_lib.c:1904
                    blk_mq_dispatch_rq_list+0xa70/0x1910 block/blk-mq.c:2148
                    __blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
                    blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
                    __blk_mq_sched_dispatch_requests+0xdcc/0x1600 block/blk-mq-sched.c:307
                    blk_mq_sched_dispatch_requests+0xd7/0x190 block/blk-mq-sched.c:329
                    blk_mq_run_work_fn+0x22e/0x300 block/blk-mq.c:2562
                    process_one_work kernel/workqueue.c:3275 [inline]
                    process_scheduled_works+0xb02/0x1830 kernel/workqueue.c:3358
                    worker_thread+0xa50/0xfc0 kernel/workqueue.c:3439
                    kthread+0x388/0x470 kernel/kthread.c:436
                    ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
                    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
   INITIAL USE at:
                   lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
                   __raw_spin_lock_irq include/linux/spinlock_api_smp.h:142 [inline]
                   _raw_spin_lock_irq+0x3d/0x50 kernel/locking/spinlock.c:170
                   spin_lock_irq include/linux/spinlock.h:371 [inline]
                   shmem_add_to_page_cache+0x7b2/0xd40 mm/shmem.c:904
                   shmem_alloc_and_add_folio+0x869/0xf80 mm/shmem.c:1998
                   shmem_get_folio_gfp+0x4d4/0x1420 mm/shmem.c:2549
                   shmem_read_folio_gfp+0x8a/0xe0 mm/shmem.c:5957
                   drm_gem_get_pages+0x263/0x9d0 drivers/gpu/drm/drm_gem.c:696
                   drm_gem_shmem_get_pages_locked+0x22b/0x480 drivers/gpu/drm/drm_gem_shmem_helper.c:222
                   drm_gem_shmem_pin_locked+0x251/0x4d0 drivers/gpu/drm/drm_gem_shmem_helper.c:283
                   drm_gem_shmem_vmap_locked+0x499/0x7d0 drivers/gpu/drm/drm_gem_shmem_helper.c:387
                   drm_gem_vmap_locked drivers/gpu/drm/drm_gem.c:1387 [inline]
                   drm_gem_vmap+0x10a/0x1d0 drivers/gpu/drm/drm_gem.c:1429
                   drm_client_buffer_vmap+0x6c/0xb0 drivers/gpu/drm/drm_client.c:355
                   drm_fbdev_shmem_driver_fbdev_probe+0x273/0x8a0 drivers/gpu/drm/drm_fbdev_shmem.c:159
                   drm_fb_helper_single_fb_probe drivers/gpu/drm/drm_fb_helper.c:1468 [inline]
                   __drm_fb_helper_initial_config_and_unlock+0x1421/0x1b90 drivers/gpu/drm/drm_fb_helper.c:1647
                   drm_fbdev_client_hotplug+0x16c/0x230 drivers/gpu/drm/clients/drm_fbdev_client.c:66
                   drm_client_register+0x172/0x210 drivers/gpu/drm/drm_client.c:143
                   drm_fbdev_client_setup+0x1a0/0x3f0 drivers/gpu/drm/clients/drm_fbdev_client.c:168
                   drm_client_setup+0x107/0x220 drivers/gpu/drm/clients/drm_client_setup.c:46
                   vkms_create+0x413/0x4d0 drivers/gpu/drm/vkms/vkms_drv.c:212
                   vkms_init+0x57/0x80 drivers/gpu/drm/vkms/vkms_drv.c:240
                   do_one_initcall+0x250/0x8d0 init/main.c:1384
                   do_initcall_level+0x104/0x190 init/main.c:1446
                   do_initcalls+0x59/0xa0 init/main.c:1462
                   kernel_init_freeable+0x2a6/0x3e0 init/main.c:1694
                   kernel_init+0x1d/0x1d0 init/main.c:1584
                   ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
                   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 }
 ... key      at: [<ffffffff9a2e8be0>] xa_init_flags.__key+0x0/0x20

the dependencies between the lock to be acquired
 and SOFTIRQ-irq-unsafe lock:
-> (&l->lock){+.+.}-{3:3} {
   HARDIRQ-ON-W at:
                    lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
                    __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
                    _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
                    spin_lock include/linux/spinlock.h:341 [inline]
                    lock_list_lru mm/list_lru.c:26 [inline]
                    lock_list_lru_of_memcg+0x268/0x470 mm/list_lru.c:95
                    list_lru_lock mm/list_lru.c:154 [inline]
                    list_lru_add+0x46/0x260 mm/list_lru.c:208
                    list_lru_add_obj+0x191/0x270 mm/list_lru.c:221
                    d_lru_add+0xd6/0x160 fs/dcache.c:497
                    retain_dentry fs/dcache.c:779 [inline]
                    fast_dput+0x303/0x430 fs/dcache.c:866
                    dput+0xe8/0x1a0 fs/dcache.c:924
                    path_put fs/namei.c:717 [inline]
                    put_link+0x112/0x190 fs/namei.c:1196
                    walk_component fs/namei.c:2284 [inline]
                    link_path_walk+0x1299/0x18d0 fs/namei.c:2644
                    path_openat+0x2c3/0x3860 fs/namei.c:4826
                    do_file_open+0x23e/0x4a0 fs/namei.c:4859
                    do_open_execat+0x12b/0x580 fs/exec.c:781
                    open_exec+0x29/0x40 fs/exec.c:817
                    load_elf_binary+0x1aaf/0x2980 fs/binfmt_elf.c:908
                    search_binary_handler fs/exec.c:1664 [inline]
                    exec_binprm fs/exec.c:1696 [inline]
                    bprm_execve+0x93d/0x1460 fs/exec.c:1748
                    kernel_execve+0x844/0x930 fs/exec.c:1892
                    try_to_run_init_process+0x13/0x60 init/main.c:1512
                    kernel_init+0xad/0x1d0 init/main.c:1640
                    ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
                    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
   SOFTIRQ-ON-W at:
                    lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
                    __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
                    _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
                    spin_lock include/linux/spinlock.h:341 [inline]
                    lock_list_lru mm/list_lru.c:26 [inline]
                    lock_list_lru_of_memcg+0x268/0x470 mm/list_lru.c:95
                    list_lru_lock mm/list_lru.c:154 [inline]
                    list_lru_add+0x46/0x260 mm/list_lru.c:208
                    list_lru_add_obj+0x191/0x270 mm/list_lru.c:221
                    d_lru_add+0xd6/0x160 fs/dcache.c:497
                    retain_dentry fs/dcache.c:779 [inline]
                    fast_dput+0x303/0x430 fs/dcache.c:866
                    dput+0xe8/0x1a0 fs/dcache.c:924
                    path_put fs/namei.c:717 [inline]
                    put_link+0x112/0x190 fs/namei.c:1196
                    walk_component fs/namei.c:2284 [inline]
                    link_path_walk+0x1299/0x18d0 fs/namei.c:2644
                    path_openat+0x2c3/0x3860 fs/namei.c:4826
                    do_file_open+0x23e/0x4a0 fs/namei.c:4859
                    do_open_execat+0x12b/0x580 fs/exec.c:781
                    open_exec+0x29/0x40 fs/exec.c:817
                    load_elf_binary+0x1aaf/0x2980 fs/binfmt_elf.c:908
                    search_binary_handler fs/exec.c:1664 [inline]
                    exec_binprm fs/exec.c:1696 [inline]
                    bprm_execve+0x93d/0x1460 fs/exec.c:1748
                    kernel_execve+0x844/0x930 fs/exec.c:1892
                    try_to_run_init_process+0x13/0x60 init/main.c:1512
                    kernel_init+0xad/0x1d0 init/main.c:1640
                    ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
                    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
   INITIAL USE at:
                   lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
                   __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
                   _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
                   spin_lock include/linux/spinlock.h:341 [inline]
                   lock_list_lru mm/list_lru.c:26 [inline]
                   lock_list_lru_of_memcg+0x268/0x470 mm/list_lru.c:95
                   list_lru_lock mm/list_lru.c:154 [inline]
                   list_lru_add+0x46/0x260 mm/list_lru.c:208
                   list_lru_add_obj+0x191/0x270 mm/list_lru.c:221
                   d_lru_add+0xd6/0x160 fs/dcache.c:497
                   retain_dentry fs/dcache.c:779 [inline]
                   fast_dput+0x303/0x430 fs/dcache.c:866
                   dput+0xe8/0x1a0 fs/dcache.c:924
                   path_put fs/namei.c:717 [inline]
                   put_link+0x112/0x190 fs/namei.c:1196
                   walk_component fs/namei.c:2284 [inline]
                   link_path_walk+0x1299/0x18d0 fs/namei.c:2644
                   path_openat+0x2c3/0x3860 fs/namei.c:4826
                   do_file_open+0x23e/0x4a0 fs/namei.c:4859
                   do_open_execat+0x12b/0x580 fs/exec.c:781
                   open_exec+0x29/0x40 fs/exec.c:817
                   load_elf_binary+0x1aaf/0x2980 fs/binfmt_elf.c:908
                   search_binary_handler fs/exec.c:1664 [inline]
                   exec_binprm fs/exec.c:1696 [inline]
                   bprm_execve+0x93d/0x1460 fs/exec.c:1748
                   kernel_execve+0x844/0x930 fs/exec.c:1892
                   try_to_run_init_process+0x13/0x60 init/main.c:1512
                   kernel_init+0xad/0x1d0 init/main.c:1640
                   ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
                   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
 }
 ... key      at: [<ffffffff9a2c9540>] init_one_lru.__key+0x0/0x20
 ... acquired at:
   __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
   _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
   spin_lock include/linux/spinlock.h:341 [inline]
   lock_list_lru mm/list_lru.c:26 [inline]
   lock_list_lru_of_memcg+0x268/0x470 mm/list_lru.c:95
   __folio_freeze_and_split_unmapped+0x2ab/0x34b0 mm/huge_memory.c:3767
   __folio_split+0xae1/0x1570 mm/huge_memory.c:4033
   shmem_writeout+0x570/0x1700 mm/shmem.c:1630
   writeout mm/vmscan.c:631 [inline]
   pageout mm/vmscan.c:680 [inline]
   shrink_folio_list+0x3380/0x5240 mm/vmscan.c:1401
   reclaim_folio_list+0x100/0x460 mm/vmscan.c:2172
   reclaim_pages+0x45b/0x530 mm/vmscan.c:2209
   madvise_cold_or_pageout_pte_range+0x1f7e/0x2220 mm/madvise.c:442
   walk_pmd_range mm/pagewalk.c:129 [inline]
   walk_pud_range mm/pagewalk.c:223 [inline]
   walk_p4d_range mm/pagewalk.c:261 [inline]
   walk_pgd_range+0x1032/0x1d30 mm/pagewalk.c:302
   __walk_page_range+0x14c/0x710 mm/pagewalk.c:410
   walk_page_range_vma_unsafe+0x309/0x410 mm/pagewalk.c:714
   madvise_pageout_page_range mm/madvise.c:620 [inline]
   madvise_pageout mm/madvise.c:645 [inline]
   madvise_vma_behavior+0x2883/0x44d0 mm/madvise.c:1356
   madvise_walk_vmas+0x573/0xae0 mm/madvise.c:1711
   madvise_do_behavior+0x386/0x540 mm/madvise.c:1927
   do_madvise+0x1fa/0x2e0 mm/madvise.c:2020
   __do_sys_madvise mm/madvise.c:2029 [inline]
   __se_sys_madvise mm/madvise.c:2027 [inline]
   __x64_sys_madvise+0xa6/0xc0 mm/madvise.c:2027
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x77/0x7f


stack backtrace:
CPU: 0 UID: 0 PID: 5949 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
 print_bad_irq_dependency kernel/locking/lockdep.c:2616 [inline]
 check_irq_usage kernel/locking/lockdep.c:2857 [inline]
 check_prev_add kernel/locking/lockdep.c:3169 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain kernel/locking/lockdep.c:3908 [inline]
 __lock_acquire+0x2a94/0x2cf0 kernel/locking/lockdep.c:5237
 lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
 __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
 _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
 spin_lock include/linux/spinlock.h:341 [inline]
 lock_list_lru mm/list_lru.c:26 [inline]
 lock_list_lru_of_memcg+0x268/0x470 mm/list_lru.c:95
 __folio_freeze_and_split_unmapped+0x2ab/0x34b0 mm/huge_memory.c:3767
 __folio_split+0xae1/0x1570 mm/huge_memory.c:4033
 shmem_writeout+0x570/0x1700 mm/shmem.c:1630
 writeout mm/vmscan.c:631 [inline]
 pageout mm/vmscan.c:680 [inline]
 shrink_folio_list+0x3380/0x5240 mm/vmscan.c:1401
 reclaim_folio_list+0x100/0x460 mm/vmscan.c:2172
 reclaim_pages+0x45b/0x530 mm/vmscan.c:2209
 madvise_cold_or_pageout_pte_range+0x1f7e/0x2220 mm/madvise.c:442
 walk_pmd_range mm/pagewalk.c:129 [inline]
 walk_pud_range mm/pagewalk.c:223 [inline]
 walk_p4d_range mm/pagewalk.c:261 [inline]
 walk_pgd_range+0x1032/0x1d30 mm/pagewalk.c:302
 __walk_page_range+0x14c/0x710 mm/pagewalk.c:410
 walk_page_range_vma_unsafe+0x309/0x410 mm/pagewalk.c:714
 madvise_pageout_page_range mm/madvise.c:620 [inline]
 madvise_pageout mm/madvise.c:645 [inline]
 madvise_vma_behavior+0x2883/0x44d0 mm/madvise.c:1356
 madvise_walk_vmas+0x573/0xae0 mm/madvise.c:1711
 madvise_do_behavior+0x386/0x540 mm/madvise.c:1927
 do_madvise+0x1fa/0x2e0 mm/madvise.c:2020
 __do_sys_madvise mm/madvise.c:2029 [inline]
 __se_sys_madvise mm/madvise.c:2027 [inline]
 __x64_sys_madvise+0xa6/0xc0 mm/madvise.c:2027
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f8b7939c799
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffefe994ff8 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f8b79615fa0 RCX: 00007f8b7939c799
RDX: 0000000000000015 RSI: 0000000000c00000 RDI: 0000200000000000
RBP: 00007f8b79432c99 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f8b79615fac R14: 00007f8b79615fa0 R15: 00007f8b79615fa0
 </TASK>


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [syzbot ci] Re: mm: switch THP shrinker to list_lru
  2026-03-13 17:39 ` [syzbot ci] Re: mm: switch THP " syzbot ci
@ 2026-03-13 23:08   ` Johannes Weiner
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2026-03-13 23:08 UTC (permalink / raw)
  To: syzbot ci
  Cc: akpm, david, david, kas, liam.howlett, linux-kernel, linux-mm,
	roman.gushchin, shakeel.butt, usama.arif, yosry.ahmed, ziy,
	syzbot, syzkaller-bugs

On Fri, Mar 13, 2026 at 10:39:38AM -0700, syzbot ci wrote:
> ------------[ cut here ]------------
> !css_is_dying(&memcg->css)
> WARNING: mm/list_lru.c:110 at lock_list_lru_of_memcg+0x33d/0x470 mm/list_lru.c:110, CPU#0: syz.0.17/5950
> Modules linked in:
> CPU: 0 UID: 0 PID: 5950 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:lock_list_lru_of_memcg+0x33d/0x470 mm/list_lru.c:110
> Code: 3c 28 00 74 08 4c 89 e7 e8 b0 02 1d 00 4d 8b 24 24 48 8b 54 24 20 4d 85 e4 0f 85 00 fe ff ff e9 75 fe ff ff e8 d4 df b3 ff 90 <0f> 0b 90 eb c1 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c 06 fe ff ff 48
> RSP: 0018:ffffc90004017110 EFLAGS: 00010093
> RAX: ffffffff8211b3ac RBX: 0000000000000000 RCX: ffff888104f057c0
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000000000000 R08: ffff888104f057c0 R09: 0000000000000002
> R10: 0000000000000406 R11: 0000000000000000 R12: ffff8881026d0d00
> R13: dffffc0000000000 R14: ffffffff9a2de05c R15: 0000000000000002
> FS:  0000555572bfe500(0000) GS:ffff88818de66000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000200000001000 CR3: 0000000112554000 CR4: 00000000000006f0
> Call Trace:
>  <TASK>
>  __folio_freeze_and_split_unmapped+0x2ab/0x34b0 mm/huge_memory.c:3767
>  __folio_split+0xae1/0x1570 mm/huge_memory.c:4033
>  try_folio_split_to_order include/linux/huge_mm.h:411 [inline]
>  try_folio_split_or_unmap+0x5b/0x1e0 mm/truncate.c:189
>  truncate_inode_partial_folio+0x4ab/0x8e0 mm/truncate.c:255

File pages aren't on the deferred_split_lru. We're calling
list_lru_lock() for a nid+memcg combination that has no list_lru
heads allocated. This should either fail gracefully, or
__folio_freeze_and_split_unmapped() needs page type filtering.
Needs more thought.

> possible deadlock in __folio_end_writeback
> 
> =====================================================
> WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
> syzkaller #0 Not tainted
> -----------------------------------------------------
> syz.0.17/5949 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> ffff88810c90c240 (&l->lock){+.+.}-{3:3}, at: spin_lock include/linux/spinlock.h:341 [inline]
> ffff88810c90c240 (&l->lock){+.+.}-{3:3}, at: lock_list_lru mm/list_lru.c:26 [inline]
> ffff88810c90c240 (&l->lock){+.+.}-{3:3}, at: lock_list_lru_of_memcg+0x268/0x470 mm/list_lru.c:95
> 
> and this task is already holding:
> ffff8881107ad160 (&xa->xa_lock#9){..-.}-{3:3}, at: spin_lock include/linux/spinlock.h:341 [inline]
> ffff8881107ad160 (&xa->xa_lock#9){..-.}-{3:3}, at: __folio_split+0xa2e/0x1570 mm/huge_memory.c:4025
> which would create a new lock dependency:
>  (&xa->xa_lock#9){..-.}-{3:3} -> (&l->lock){+.+.}-{3:3}
> 
> but this new dependency connects a SOFTIRQ-irq-safe lock:
>  (&xa->xa_lock#9){..-.}-{3:3}
> 
> ... which became SOFTIRQ-irq-safe at:
>   lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
>   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:132 [inline]
>   _raw_spin_lock_irqsave+0x40/0x60 kernel/locking/spinlock.c:162
>   __folio_end_writeback+0x157/0x770 mm/page-writeback.c:2946
>
> to a SOFTIRQ-irq-unsafe lock:
>  (&l->lock){+.+.}-{3:3}
> 
> ... which became SOFTIRQ-irq-unsafe at:
> ...
>   lock_acquire+0xf0/0x2e0 kernel/locking/lockdep.c:5868
>   __raw_spin_lock include/linux/spinlock_api_smp.h:158 [inline]
>   _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
>   spin_lock include/linux/spinlock.h:341 [inline]
>   lock_list_lru mm/list_lru.c:26 [inline]
>   lock_list_lru_of_memcg+0x268/0x470 mm/list_lru.c:95
>   list_lru_lock mm/list_lru.c:154 [inline]
>   list_lru_add+0x46/0x260 mm/list_lru.c:208
>   list_lru_add_obj+0x191/0x270 mm/list_lru.c:221
>   d_lru_add+0xd6/0x160 fs/dcache.c:497

Different locks, deferred_split_lru needs its own lockdep key.
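
[Editorial note, not part of the thread: the archive's patch 6 context
above shows a list_lru_init_memcg_key() helper, which hints at the
shape of the fix. A hypothetical sketch — the function signature,
variable names, and shrinker wiring below are assumed, not taken from
the series:]

```c
/*
 * Hypothetical sketch: give the deferred_split list_lru its own
 * lockdep class so its lock dependencies are not conflated with
 * those of dcache/inode list_lrus, which all share the generic
 * init_one_lru.__key seen in the splat above.
 */
static struct lock_class_key deferred_split_lru_key;
static struct list_lru deferred_split_lru;

static int __init deferred_split_lru_init(struct shrinker *shrinker)
{
	/* Signature assumed from the list_lru_init_memcg_key() hunk */
	return list_lru_init_memcg_key(&deferred_split_lru, shrinker,
				       &deferred_split_lru_key);
}
```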



* Re: [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty
  2026-03-12 20:51 ` [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
@ 2026-03-17  9:43   ` David Hildenbrand (Arm)
  2026-03-18 17:56   ` Shakeel Butt
  1 sibling, 0 replies; 30+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17  9:43 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett, Usama Arif,
	Kiryl Shutsemau, Dave Chinner, Roman Gushchin, linux-mm,
	linux-kernel

On 3/12/26 21:51, Johannes Weiner wrote:
> skip_empty is only for the shrinker to abort and skip a list that's
> empty or whose cgroup is being deleted.
> 
> For list additions and deletions, the cgroup hierarchy is walked
> upwards until a valid list_lru head is found, or it will fall back to
> the node list. Acquiring the lock won't fail. Remove the NULL checks
> in those callers.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/list_lru.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 26463ae29c64..d96fd50fc9af 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -165,8 +165,6 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
>  	struct list_lru_one *l;
>  
>  	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
> -	if (!l)
> -		return false;
>  	if (list_empty(item)) {
>  		list_add_tail(item, &l->list);
>  		/* Set shrinker bit if the first element was added */
> @@ -203,9 +201,8 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid,
>  {
>  	struct list_lru_node *nlru = &lru->node[nid];
>  	struct list_lru_one *l;
> +
>  	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
> -	if (!l)
> -		return false;
>  	if (!list_empty(item)) {
>  		list_del_init(item);
>  		l->nr_items--;

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David



* Re: [PATCH v2 2/7] mm: list_lru: deduplicate unlock_list_lru()
  2026-03-12 20:51 ` [PATCH v2 2/7] mm: list_lru: deduplicate unlock_list_lru() Johannes Weiner
@ 2026-03-17  9:44   ` David Hildenbrand (Arm)
  2026-03-18 17:57   ` Shakeel Butt
  1 sibling, 0 replies; 30+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17  9:44 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett, Usama Arif,
	Kiryl Shutsemau, Dave Chinner, Roman Gushchin, linux-mm,
	linux-kernel

On 3/12/26 21:51, Johannes Weiner wrote:
> The MEMCG and !MEMCG variants are the same. lock_list_lru() has the
> same pattern when bailing. Consolidate into a common implementation.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David



* Re: [PATCH v2 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg()
  2026-03-12 20:51 ` [PATCH v2 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg() Johannes Weiner
@ 2026-03-17  9:47   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17  9:47 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett, Usama Arif,
	Kiryl Shutsemau, Dave Chinner, Roman Gushchin, linux-mm,
	linux-kernel

On 3/12/26 21:51, Johannes Weiner wrote:
> Only the MEMCG variant of lock_list_lru() needs to check if there is a
> race with cgroup deletion and list reparenting. Move the check to the
> caller, so that the next patch can unify the lock_list_lru() variants.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/list_lru.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index e873bc26a7ef..1a39ff490643 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -68,17 +68,12 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
>  	return &lru->node[nid].lru;
>  }
>  
> -static inline bool lock_list_lru(struct list_lru_one *l, bool irq)
> +static inline void lock_list_lru(struct list_lru_one *l, bool irq)
>  {
>  	if (irq)
>  		spin_lock_irq(&l->lock);
>  	else
>  		spin_lock(&l->lock);
> -	if (unlikely(READ_ONCE(l->nr_items) == LONG_MIN)) {
> -		unlock_list_lru(l, irq);
> -		return false;
> -	}
> -	return true;
>  }
>  
>  static inline struct list_lru_one *
> @@ -90,9 +85,13 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
>  	rcu_read_lock();
>  again:
>  	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
> -	if (likely(l) && lock_list_lru(l, irq)) {
> -		rcu_read_unlock();
> -		return l;
> +	if (likely(l)) {
> +		lock_list_lru(l, irq);

Was about to suggest adding a comment regarding reparenting, but the
comment further below already spells that out now.

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David



* Re: [PATCH v2 4/7] mm: list_lru: deduplicate lock_list_lru()
  2026-03-12 20:51 ` [PATCH v2 4/7] mm: list_lru: deduplicate lock_list_lru() Johannes Weiner
@ 2026-03-17  9:51   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17  9:51 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett, Usama Arif,
	Kiryl Shutsemau, Dave Chinner, Roman Gushchin, linux-mm,
	linux-kernel

On 3/12/26 21:51, Johannes Weiner wrote:
> The MEMCG and !MEMCG paths have the same pattern. Share the code.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David



* Re: [PATCH v2 5/7] mm: list_lru: introduce caller locking for additions and deletions
  2026-03-12 20:51 ` [PATCH v2 5/7] mm: list_lru: introduce caller locking for additions and deletions Johannes Weiner
@ 2026-03-17 10:00   ` David Hildenbrand (Arm)
  2026-03-17 14:03     ` Johannes Weiner
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 10:00 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett, Usama Arif,
	Kiryl Shutsemau, Dave Chinner, Roman Gushchin, linux-mm,
	linux-kernel

On 3/12/26 21:51, Johannes Weiner wrote:
> Locking is currently internal to the list_lru API. However, a caller
> might want to keep auxiliary state synchronized with the LRU state.
> 
> For example, the THP shrinker uses the lock of its custom LRU to keep
> PG_partially_mapped and vmstats consistent.
> 
> To allow the THP shrinker to switch to list_lru, provide normal and
> irqsafe locking primitives as well as caller-locked variants of the
> addition and deletion functions.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/list_lru.h |  34 +++++++++++++
>  mm/list_lru.c            | 104 +++++++++++++++++++++++++++------------
>  2 files changed, 107 insertions(+), 31 deletions(-)
> 
> diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
> index fe739d35a864..4afc02deb44d 100644
> --- a/include/linux/list_lru.h
> +++ b/include/linux/list_lru.h
> @@ -83,6 +83,40 @@ int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
>  			 gfp_t gfp);
>  void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
>  

[...]

>  static inline struct list_lru_one *
>  lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
> -		       bool irq, bool skip_empty)
> +		       bool irq, unsigned long *irq_flags, bool skip_empty)
>  {
>  	struct list_lru_one *l = &lru->node[nid].lru;
>  
> -	lock_list_lru(l, irq);
> +	lock_list_lru(l, irq, irq_flags);
>  
>  	return l;
>  }
>  #endif /* CONFIG_MEMCG */
>  
> -/* The caller must ensure the memcg lifetime. */
> -bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
> -		  struct mem_cgroup *memcg)
> +struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid,
> +				   struct mem_cgroup *memcg)
>  {
> -	struct list_lru_node *nlru = &lru->node[nid];
> -	struct list_lru_one *l;
> +	return lock_list_lru_of_memcg(lru, nid, memcg, false, NULL, false);

The two "bool" parameters really are ugly. Fortunately this is only an
internal function.

The callers are still a bit hard to read; we could add /* skip_empty= */true.

like

return lock_list_lru_of_memcg(lru, nid, memcg, /* irq= */false, NULL,
			      /* skip_empty= */false);

Like we do in other code. But I guess we should do it consistently then
(or better add some proper flags).

Anyhow, something that could be cleaned up later.

> +}
> +
> +void list_lru_unlock(struct list_lru_one *l)
> +{
> +	unlock_list_lru(l, false, NULL);
> +}
> +
> +struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
> +					   struct mem_cgroup *memcg,
> +					   unsigned long *flags)
> +{
> +	return lock_list_lru_of_memcg(lru, nid, memcg, true, flags, false);

And here it gets really confusing. true false false ... am I reading
binary code?

I guess the second "false" should actually be "NULL" :)

> +}
> +
> +void list_lru_unlock_irqrestore(struct list_lru_one *l, unsigned long *flags)
> +{
> +	unlock_list_lru(l, true, flags);
> +}
>  
> -	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
> +bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l,
> +		    struct list_head *item, int nid,
> +		    struct mem_cgroup *memcg)
> +{
>  	if (list_empty(item)) {
>  		list_add_tail(item, &l->list);
>  		/* Set shrinker bit if the first element was added */
>  		if (!l->nr_items++)
>  			set_shrinker_bit(memcg, nid, lru_shrinker_id(lru));
> -		unlock_list_lru(l, false);
> -		atomic_long_inc(&nlru->nr_items);
> +		atomic_long_inc(&lru->node[nid].nr_items);
> +		return true;
> +	}
> +	return false;
> +}
> +
> +bool __list_lru_del(struct list_lru *lru, struct list_lru_one *l,
> +		    struct list_head *item, int nid)
> +{
> +	if (!list_empty(item)) {
> +		list_del_init(item);
> +		l->nr_items--;
> +		atomic_long_dec(&lru->node[nid].nr_items);
>  		return true;
>  	}
> -	unlock_list_lru(l, false);
>  	return false;
>  }
>  
> +/* The caller must ensure the memcg lifetime. */
> +bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
> +		  struct mem_cgroup *memcg)
> +{
> +	struct list_lru_one *l;
> +	bool ret;
> +
> +	l = list_lru_lock(lru, nid, memcg);
> +	ret = __list_lru_add(lru, l, item, nid, memcg);
> +	list_lru_unlock(l);
> +	return ret;
> +}

Nice.


Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 6/7] mm: list_lru: introduce memcg_list_lru_alloc_folio()
  2026-03-12 20:51 ` [PATCH v2 6/7] mm: list_lru: introduce memcg_list_lru_alloc_folio() Johannes Weiner
@ 2026-03-17 10:09   ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 10:09 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett, Usama Arif,
	Kiryl Shutsemau, Dave Chinner, Roman Gushchin, linux-mm,
	linux-kernel

On 3/12/26 21:51, Johannes Weiner wrote:
> memcg_list_lru_alloc() is called every time an object that may end up
> on the list_lru is created. It needs to quickly check if the list_lru
> heads for the memcg already exist, and allocate them when they don't.
> 
> Doing this with folio objects is tricky: folio_memcg() is not stable
> and requires either RCU protection or pinning the cgroup. But it's
> desirable to make the existence check lightweight under RCU, and only
> pin the memcg when we need to allocate list_lru heads and may block.
> 
> In preparation for switching the THP shrinker to list_lru, add a
> helper function for allocating list_lru heads coming from a folio.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  include/linux/list_lru.h | 12 ++++++++++++
>  mm/list_lru.c            | 39 ++++++++++++++++++++++++++++++++++-----
>  2 files changed, 46 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
> index 4afc02deb44d..df6bd3c64b06 100644
> --- a/include/linux/list_lru.h
> +++ b/include/linux/list_lru.h
> @@ -81,6 +81,18 @@ static inline int list_lru_init_memcg_key(struct list_lru *lru, struct shrinker
>  
>  int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
>  			 gfp_t gfp);
> +
> +#ifdef CONFIG_MEMCG
> +int memcg_list_lru_alloc_folio(struct folio *folio, struct list_lru *lru,
> +			       gfp_t gfp);
> +#else
> +static inline int memcg_list_lru_alloc_folio(struct folio *folio,
> +					     struct list_lru *lru, gfp_t gfp)
> +{
> +	return 0;
> +}
> +#endif
> +
>  void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent);
>  
>  /**
> diff --git a/mm/list_lru.c b/mm/list_lru.c
> index 779cb26cec84..562b2b1f8c41 100644
> --- a/mm/list_lru.c
> +++ b/mm/list_lru.c
> @@ -534,17 +534,14 @@ static inline bool memcg_list_lru_allocated(struct mem_cgroup *memcg,
>  	return idx < 0 || xa_load(&lru->xa, idx);
>  }
>  
> -int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
> -			 gfp_t gfp)
> +static int __memcg_list_lru_alloc(struct mem_cgroup *memcg,
> +				  struct list_lru *lru, gfp_t gfp)
>  {
>  	unsigned long flags;
>  	struct list_lru_memcg *mlru = NULL;
>  	struct mem_cgroup *pos, *parent;
>  	XA_STATE(xas, &lru->xa, 0);
>  
> -	if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
> -		return 0;
> -
>  	gfp &= GFP_RECLAIM_MASK;
>  	/*
>  	 * Because the list_lru can be reparented to the parent cgroup's
> @@ -585,6 +582,38 @@ int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
>  
>  	return xas_error(&xas);
>  }
> +
> +int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru,
> +			 gfp_t gfp)
> +{
> +	if (!list_lru_memcg_aware(lru) || memcg_list_lru_allocated(memcg, lru))
> +		return 0;
> +	return __memcg_list_lru_alloc(memcg, lru, gfp);
> +}
> +
> +int memcg_list_lru_alloc_folio(struct folio *folio, struct list_lru *lru,
> +			       gfp_t gfp)

The function reads as if we would be allocating a folio ...

folio_memcg_list_lru_alloc() ?

Or memcg_list_lru_alloc_for_folio() ?


LGTM, with my limited understanding of memcg lifetimes :)

Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David



* Re: [PATCH v2 5/7] mm: list_lru: introduce caller locking for additions and deletions
  2026-03-17 10:00   ` David Hildenbrand (Arm)
@ 2026-03-17 14:03     ` Johannes Weiner
  2026-03-17 14:34       ` Johannes Weiner
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Weiner @ 2026-03-17 14:03 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Andrew Morton, Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett,
	Usama Arif, Kiryl Shutsemau, Dave Chinner, Roman Gushchin,
	linux-mm, linux-kernel

On Tue, Mar 17, 2026 at 11:00:59AM +0100, David Hildenbrand (Arm) wrote:
> On 3/12/26 21:51, Johannes Weiner wrote:
> > -/* The caller must ensure the memcg lifetime. */
> > -bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
> > -		  struct mem_cgroup *memcg)
> > +struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid,
> > +				   struct mem_cgroup *memcg)
> >  {
> > -	struct list_lru_node *nlru = &lru->node[nid];
> > -	struct list_lru_one *l;
> > +	return lock_list_lru_of_memcg(lru, nid, memcg, false, NULL, false);
> 
> The two "bool" parameters really are ugly. Fortunately this is only an
> internal function.

Yeah, I absolutely hate this too. I only didn't look further because
it's internal, but...

> The callers are still a bit hard to read; we could add /* skip_empty= */true).
> 
> like
> 
> return lock_list_lru_of_memcg(lru, nid, memcg, /* irq= */false, NULL,
> 			      /* skip_empty= */false);
> 
> Like we do in other code. But I guess we should do it consistently then
> (or better add some proper flags).
> 
> Anyhow, something that could be cleaned up later.

This is a great idea.

I have to send a v3 for the fix in __folio_freeze_and_split_unmapped()
and the lockdep key, so I'll make this change along with it.

> > +void list_lru_unlock(struct list_lru_one *l)
> > +{
> > +	unlock_list_lru(l, false, NULL);
> > +}
> > +
> > +struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
> > +					   struct mem_cgroup *memcg,
> > +					   unsigned long *flags)
> > +{
> > +	return lock_list_lru_of_memcg(lru, nid, memcg, true, flags, false);
> 
> And here it gets really confusing. true false false ... am I reading
> binary code?
> 
> I guess the second "false" should actually be "NULL" :)

Good catch, I'll fix that.

> > +/* The caller must ensure the memcg lifetime. */
> > +bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
> > +		  struct mem_cgroup *memcg)
> > +{
> > +	struct list_lru_one *l;
> > +	bool ret;
> > +
> > +	l = list_lru_lock(lru, nid, memcg);
> > +	ret = __list_lru_add(lru, l, item, nid, memcg);
> > +	list_lru_unlock(l);
> > +	return ret;
> > +}
> 
> Nice.
> 
> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>

Thanks for your review!



* Re: [PATCH v2 5/7] mm: list_lru: introduce caller locking for additions and deletions
  2026-03-17 14:03     ` Johannes Weiner
@ 2026-03-17 14:34       ` Johannes Weiner
  2026-03-17 16:35         ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Weiner @ 2026-03-17 14:34 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Andrew Morton, Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett,
	Usama Arif, Kiryl Shutsemau, Dave Chinner, Roman Gushchin,
	linux-mm, linux-kernel

On Tue, Mar 17, 2026 at 10:03:08AM -0400, Johannes Weiner wrote:
> On Tue, Mar 17, 2026 at 11:00:59AM +0100, David Hildenbrand (Arm) wrote:
> > On 3/12/26 21:51, Johannes Weiner wrote:
> > > +void list_lru_unlock(struct list_lru_one *l)
> > > +{
> > > +	unlock_list_lru(l, false, NULL);
> > > +}
> > > +
> > > +struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
> > > +					   struct mem_cgroup *memcg,
> > > +					   unsigned long *flags)
> > > +{
> > > +	return lock_list_lru_of_memcg(lru, nid, memcg, true, flags, false);
> > 
> > And here it gets really confusing. true false false ... am I reading
> > binary code?
> > 
> > I guess the second "false" should actually be "NULL" :)
> 
> Good catch, I'll fix that.

That's actually "flags" haha. But it supports your point.



* Re: [PATCH v2 5/7] mm: list_lru: introduce caller locking for additions and deletions
  2026-03-17 14:34       ` Johannes Weiner
@ 2026-03-17 16:35         ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-17 16:35 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett,
	Usama Arif, Kiryl Shutsemau, Dave Chinner, Roman Gushchin,
	linux-mm, linux-kernel

On 3/17/26 15:34, Johannes Weiner wrote:
> On Tue, Mar 17, 2026 at 10:03:08AM -0400, Johannes Weiner wrote:
>> On Tue, Mar 17, 2026 at 11:00:59AM +0100, David Hildenbrand (Arm) wrote:
>>>
>>> And here it gets really confusing. true false false ... am I reading
>>> binary code?
>>>
>>> I guess the second "false" should actually be "NULL" :)
>>
>> Good catch, I'll fix that.
> 
> That's actually "flags" haha. But it supports your point.

Ohhh, haha. Well, at least nothing to fix then :)

-- 
Cheers,

David



* Re: [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty
  2026-03-12 20:51 ` [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
  2026-03-17  9:43   ` David Hildenbrand (Arm)
@ 2026-03-18 17:56   ` Shakeel Butt
  2026-03-18 19:25     ` Johannes Weiner
  1 sibling, 1 reply; 30+ messages in thread
From: Shakeel Butt @ 2026-03-18 17:56 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Thu, Mar 12, 2026 at 04:51:49PM -0400, Johannes Weiner wrote:
> skip_empty is only for the shrinker to abort and skip a list that's
> empty or whose cgroup is being deleted.
> 
> For list additions and deletions, the cgroup hierarchy is walked
> upwards until a valid list_lru head is found, or it will fall back to
> the node list. Acquiring the lock won't fail. Remove the NULL checks
> in those callers.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---

What do you think about squashing the following into this patch?

From bd56ea4505f792e00079b1a8dd98cb6f7a5e7215 Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeel.butt@linux.dev>
Date: Wed, 18 Mar 2026 10:43:53 -0700
Subject: [PATCH] list_lru: cleanup

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/list_lru.c | 53 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 26463ae29c64..062394c598d4 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -77,27 +77,30 @@ static inline bool lock_list_lru(struct list_lru_one *l, bool irq)
 }
 
 static inline struct list_lru_one *
-lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
-		       bool irq, bool skip_empty)
+__lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
+		       bool irq)
 {
 	struct list_lru_one *l;
 
 	rcu_read_lock();
-again:
 	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
-	if (likely(l) && lock_list_lru(l, irq)) {
-		rcu_read_unlock();
+	if (likely(l) && !lock_list_lru(l, irq))
+		l = NULL;
+	rcu_read_unlock();
+
+	return l;
+}
+
+static inline struct list_lru_one *
+lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg)
+{
+	struct list_lru_one *l;
+again:
+	l = __lock_list_lru_of_memcg(lru, nid, memcg, false);
+	if (likely(l))
 		return l;
-	}
-	/*
-	 * Caller may simply bail out if raced with reparenting or
-	 * may iterate through the list_lru and expect empty slots.
-	 */
-	if (skip_empty) {
-		rcu_read_unlock();
-		return NULL;
-	}
-	VM_WARN_ON(!css_is_dying(&memcg->css));
+
+	VM_WARN_ON_ONCE(!css_is_dying(&memcg->css));
 	memcg = parent_mem_cgroup(memcg);
 	goto again;
 }
@@ -135,8 +138,8 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 }
 
 static inline struct list_lru_one *
-lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
-		       bool irq, bool skip_empty)
+__lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
+		       bool irq)
 {
 	struct list_lru_one *l = &lru->node[nid].lru;
 
@@ -148,6 +151,12 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 	return l;
 }
 
+static inline struct list_lru_one *
+lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg)
+{
+	return __lock_list_lru_of_memcg(lru, nid, memcg, false);
+}
+
 static inline void unlock_list_lru(struct list_lru_one *l, bool irq_off)
 {
 	if (irq_off)
@@ -164,9 +173,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
 	struct list_lru_node *nlru = &lru->node[nid];
 	struct list_lru_one *l;
 
-	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
-	if (!l)
-		return false;
+	l = lock_list_lru_of_memcg(lru, nid, memcg);
 	if (list_empty(item)) {
 		list_add_tail(item, &l->list);
 		/* Set shrinker bit if the first element was added */
@@ -203,9 +210,7 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid,
 {
 	struct list_lru_node *nlru = &lru->node[nid];
 	struct list_lru_one *l;
-	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
-	if (!l)
-		return false;
+	l = lock_list_lru_of_memcg(lru, nid, memcg);
 	if (!list_empty(item)) {
 		list_del_init(item);
 		l->nr_items--;
@@ -287,7 +292,7 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 	unsigned long isolated = 0;
 
 restart:
-	l = lock_list_lru_of_memcg(lru, nid, memcg, irq_off, true);
+	l = __lock_list_lru_of_memcg(lru, nid, memcg, irq_off);
 	if (!l)
 		return isolated;
 	list_for_each_safe(item, n, &l->list) {
-- 
2.52.0




* Re: [PATCH v2 2/7] mm: list_lru: deduplicate unlock_list_lru()
  2026-03-12 20:51 ` [PATCH v2 2/7] mm: list_lru: deduplicate unlock_list_lru() Johannes Weiner
  2026-03-17  9:44   ` David Hildenbrand (Arm)
@ 2026-03-18 17:57   ` Shakeel Butt
  1 sibling, 0 replies; 30+ messages in thread
From: Shakeel Butt @ 2026-03-18 17:57 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Thu, Mar 12, 2026 at 04:51:50PM -0400, Johannes Weiner wrote:
> The MEMCG and !MEMCG variants are the same. lock_list_lru() has the
> same pattern when bailing. Consolidate into a common implementation.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>



* Re: [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty
  2026-03-18 17:56   ` Shakeel Butt
@ 2026-03-18 19:25     ` Johannes Weiner
  2026-03-18 19:34       ` Shakeel Butt
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Weiner @ 2026-03-18 19:25 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 10:56:55AM -0700, Shakeel Butt wrote:
> On Thu, Mar 12, 2026 at 04:51:49PM -0400, Johannes Weiner wrote:
> > skip_empty is only for the shrinker to abort and skip a list that's
> > empty or whose cgroup is being deleted.
> > 
> > For list additions and deletions, the cgroup hierarchy is walked
> > upwards until a valid list_lru head is found, or it will fall back to
> > the node list. Acquiring the lock won't fail. Remove the NULL checks
> > in those callers.
> > 
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> 
> What do you think about squashing the following into this patch?
> 
> From bd56ea4505f792e00079b1a8dd98cb6f7a5e7215 Mon Sep 17 00:00:00 2001
> From: Shakeel Butt <shakeel.butt@linux.dev>
> Date: Wed, 18 Mar 2026 10:43:53 -0700
> Subject: [PATCH] list_lru: cleanup
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Thanks for taking a look!

There is some overlap and conflict between your delta and what later
patches in the series do.

AFAICS, the main thing left over would be to have
__lock_list_lru_of_memcg() for the reclaimer (which does not walk the
parents during a cgroup deletion race) and lock_list_lru_of_memcg()
which does. Thereby eliminating the @skip_empty bool. The downside
would be to have another level in the lock function stack which is
duplicated for CONFIG_MEMCG and !CONFIG_MEMCG, and the !CONFIG_MEMCG
versions are identical.

I'm not sure that's worth it?

---
 mm/list_lru.c | 50 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/mm/list_lru.c b/mm/list_lru.c
index 1ccdd45b1d14..cab716d94ac5 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -83,13 +83,12 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 }
 
 static inline struct list_lru_one *
-lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
-		       bool irq, unsigned long *irq_flags, bool skip_empty)
+__lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
+			 bool irq, unsigned long *irq_flags)
 {
 	struct list_lru_one *l;
 
 	rcu_read_lock();
-again:
 	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
 	if (likely(l)) {
 		lock_list_lru(l, irq, irq_flags);
@@ -99,18 +98,24 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 		}
 		unlock_list_lru(l, irq, irq_flags);
 	}
-	/*
-	 * Caller may simply bail out if raced with reparenting or
-	 * may iterate through the list_lru and expect empty slots.
-	 */
-	if (skip_empty) {
-		rcu_read_unlock();
-		return NULL;
+	return NULL;
+}
+
+static inline struct list_lru_one *
+lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
+		       bool irq, unsigned long *irq_flags)
+{
+	struct list_lru_one *l;
+
+	for (;;) {
+		l = __lock_list_lru_of_memcg(lru, nid, memcg, irq, irq_flags);
+		if (likely(l))
+			return l;
+		VM_WARN_ON(!css_is_dying(&memcg->css));
+		memcg = parent_mem_cgroup(memcg);
 	}
-	VM_WARN_ON(!css_is_dying(&memcg->css));
-	memcg = parent_mem_cgroup(memcg);
-	goto again;
 }
+
 #else
 static void list_lru_register(struct list_lru *lru)
 {
@@ -137,8 +142,8 @@ list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
 }
 
 static inline struct list_lru_one *
-lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
-		       bool irq, unsigned long *irq_flags, bool skip_empty)
+__lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
+			 bool irq, unsigned long *irq_flags)
 {
 	struct list_lru_one *l = &lru->node[nid].lru;
 
@@ -146,13 +151,20 @@ lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 
 	return l;
 }
+
+static inline struct list_lru_one *
+lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
+		       bool irq, unsigned long *irq_flags)
+{
+	return __lock_list_lru_of_memcg(lru, nid, memcg, irq, irq_flags);
+}
 #endif /* CONFIG_MEMCG */
 
 struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid,
 				   struct mem_cgroup *memcg)
 {
 	return lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/false,
-				      /*irq_flags=*/NULL, /*skip_empty=*/false);
+				      /*irq_flags=*/NULL);
 }
 
 void list_lru_unlock(struct list_lru_one *l)
@@ -165,7 +177,7 @@ struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
 					   unsigned long *flags)
 {
 	return lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/true,
-				      /*irq_flags=*/flags, /*skip_empty=*/false);
+				      /*irq_flags=*/flags);
 }
 
 void list_lru_unlock_irqrestore(struct list_lru_one *l, unsigned long *flags)
@@ -313,8 +325,8 @@ __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
 	unsigned long isolated = 0;
 
 restart:
-	l = lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/irq_off,
-				   /*irq_flags=*/NULL, /*skip_empty=*/true);
+	l = __lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/irq_off,
+				     /*irq_flags=*/NULL);
 	if (!l)
 		return isolated;
 	list_for_each_safe(item, n, &l->list) {
-- 
2.53.0




* Re: [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty
  2026-03-18 19:25     ` Johannes Weiner
@ 2026-03-18 19:34       ` Shakeel Butt
  0 siblings, 0 replies; 30+ messages in thread
From: Shakeel Butt @ 2026-03-18 19:34 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Hildenbrand, Yosry Ahmed, Zi Yan,
	Liam R. Howlett, Usama Arif, Kiryl Shutsemau, Dave Chinner,
	Roman Gushchin, linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 03:25:29PM -0400, Johannes Weiner wrote:
> On Wed, Mar 18, 2026 at 10:56:55AM -0700, Shakeel Butt wrote:
> > On Thu, Mar 12, 2026 at 04:51:49PM -0400, Johannes Weiner wrote:
> > > skip_empty is only for the shrinker to abort and skip a list that's
> > > empty or whose cgroup is being deleted.
> > > 
> > > For list additions and deletions, the cgroup hierarchy is walked
> > > upwards until a valid list_lru head is found, or it will fall back to
> > > the node list. Acquiring the lock won't fail. Remove the NULL checks
> > > in those callers.
> > > 
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > > ---
> > 
> > What do you think about squashing the following into this patch?
> > 
> > From bd56ea4505f792e00079b1a8dd98cb6f7a5e7215 Mon Sep 17 00:00:00 2001
> > From: Shakeel Butt <shakeel.butt@linux.dev>
> > Date: Wed, 18 Mar 2026 10:43:53 -0700
> > Subject: [PATCH] list_lru: cleanup
> > 
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> 
> Thanks for taking a look!
> 
> There is some overlap and conflict between your delta and what later
> patches in the series do.
> 
> AFAICS, the main thing left over would be: to have
> __lock_list_lru_of_memcg() for the reclaimer (which does not walk the
> parents during a cgroup deletion race) and lock_list_lru_of_memcg()
> which does. Thereby eliminating the @skip_empty bool.

Yeah, I saw your discussion with David and thought about how we can further
reduce the params.

> The downside
> would be to have another level in the lock function stack which is
> duplicated for CONFIG_MEMCG and !CONFIG_MEMCG, and the !CONFIG_MEMCG
> versions are identical.
> 
> I'm not sure that's worth it?

I am fine with whatever route you take. I know you have the next version ready
to send, so I will review the remaining patches in the next version (though I
have taken a look at the current series, I will add my tags for the next one :P).



* Re: [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-12 20:51 ` [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru Johannes Weiner
@ 2026-03-18 20:25   ` David Hildenbrand (Arm)
  2026-03-18 22:48     ` Johannes Weiner
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-18 20:25 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett, Usama Arif,
	Kiryl Shutsemau, Dave Chinner, Roman Gushchin, linux-mm,
	linux-kernel

On 3/12/26 21:51, Johannes Weiner wrote:
> The deferred split queue handles cgroups in a suboptimal fashion. The
> queue is per-NUMA node or per-cgroup, not the intersection. That means
> on a cgrouped system, a node-restricted allocation entering reclaim
> can end up splitting large pages on other nodes:
> 
> 	alloc/unmap
> 	  deferred_split_folio()
> 	    list_add_tail(memcg->split_queue)
> 	    set_shrinker_bit(memcg, node, deferred_shrinker_id)
> 
> 	for_each_zone_zonelist_nodemask(restricted_nodes)
> 	  mem_cgroup_iter()
> 	    shrink_slab(node, memcg)
> 	      shrink_slab_memcg(node, memcg)
> 	        if test_shrinker_bit(memcg, node, deferred_shrinker_id)
> 	          deferred_split_scan()
> 	            walks memcg->split_queue
> 
> The shrinker bit adds an imperfect guard rail. As soon as the cgroup
> has a single large page on the node of interest, all large pages owned
> by that memcg, including those on other nodes, will be split.
> 
> list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
> streamlines a lot of the list operations and reclaim walks. It's used
> widely by other major shrinkers already. Convert the deferred split
> queue as well.
> 
> The list_lru per-memcg heads are instantiated on demand when the first
> object of interest is allocated for a cgroup, by calling
> memcg_list_lru_alloc_folio(). Add calls to where splittable pages are
> created: anon faults, swapin faults, khugepaged collapse.
> 
> These calls create all possible node heads for the cgroup at once, so
> the migration code (between nodes) doesn't need any special care.


[...]

> -
>  static inline bool is_transparent_hugepage(const struct folio *folio)
>  {
>  	if (!folio_test_large(folio))
> @@ -1293,6 +1189,14 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
>  		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>  		return NULL;
>  	}
> +
> +	if (memcg_list_lru_alloc_folio(folio, &deferred_split_lru, gfp)) {
> +		folio_put(folio);
> +		count_vm_event(THP_FAULT_FALLBACK);
> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
> +		return NULL;
> +	}

So, in all anon alloc paths, we essentially have

1) vma_alloc_folio / __folio_alloc (khugepaged being odd)
2) mem_cgroup_charge / mem_cgroup_swapin_charge_folio
3) memcg_list_lru_alloc_folio

I wonder if we could do better in most cases and have something like a

	vma_alloc_anon_folio()

That wraps the vma_alloc_folio() + memcg_list_lru_alloc_folio(), but
still leaves the charging to the caller?

That would at least combine 1) and 3) into a single API (except for the
odd cases without a VMA).

I guess we would want to skip the memcg_list_lru_alloc_folio() for
order-0 folios, correct?

> +
>  	folio_throttle_swaprate(folio, gfp);
>  
>         /*
> @@ -3802,33 +3706,28 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>  	struct folio *new_folio, *next;
>  	int old_order = folio_order(folio);
>  	int ret = 0;
> -	struct deferred_split *ds_queue;
> +	struct list_lru_one *l;
>  
>  	VM_WARN_ON_ONCE(!mapping && end);
>  	/* Prevent deferred_split_scan() touching ->_refcount */
> -	ds_queue = folio_split_queue_lock(folio);
> +	rcu_read_lock();

The RCU lock is for the folio_memcg(), right?

I recall I raised in the past that some get/put-like logic (that wraps
the rcu_read_lock() + folio_memcg()) might make this a lot easier to get right.


memcg = folio_memcg_lookup(folio)

... do stuff

folio_memcg_putback(folio, memcg);

Or sth like that.


Alternatively, you could have some helpers that do the
list_lru_lock+unlock etc.

folio_memcg_list_lru_lock()
...
folio_memcg_list_lru_unlock(l);

Just some thoughts as inspiration :)

> +	l = list_lru_lock(&deferred_split_lru, folio_nid(folio), folio_memcg(folio));
>  	if (folio_ref_freeze(folio, folio_cache_ref_count(folio) + 1)) {
>  		struct swap_cluster_info *ci = NULL;
>  		struct lruvec *lruvec;
>  
>  		if (old_order > 1) {
> -			if (!list_empty(&folio->_deferred_list)) {
> -				ds_queue->split_queue_len--;
> -				/*
> -				 * Reinitialize page_deferred_list after removing the
> -				 * page from the split_queue, otherwise a subsequent
> -				 * split will see list corruption when checking the
> -				 * page_deferred_list.
> -				 */
> -				list_del_init(&folio->_deferred_list);
> -			}
> +			__list_lru_del(&deferred_split_lru, l,
> +				       &folio->_deferred_list, folio_nid(folio));
>  			if (folio_test_partially_mapped(folio)) {
>  				folio_clear_partially_mapped(folio);
>  				mod_mthp_stat(old_order,
>  					MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
>  			}
>  		}
> -		split_queue_unlock(ds_queue);
> +		list_lru_unlock(l);
> +		rcu_read_unlock();
> +
>  		if (mapping) {

[...]

Most changes here look mostly mechanical, quite nice. I'll probably
have to go over some bits once again with a fresh mind :)

-- 
Cheers,

David



* Re: [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-18 20:25   ` David Hildenbrand (Arm)
@ 2026-03-18 22:48     ` Johannes Weiner
  2026-03-19  7:21       ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Weiner @ 2026-03-18 22:48 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Andrew Morton, Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett,
	Usama Arif, Kiryl Shutsemau, Dave Chinner, Roman Gushchin,
	linux-mm, linux-kernel

On Wed, Mar 18, 2026 at 09:25:17PM +0100, David Hildenbrand (Arm) wrote:
> On 3/12/26 21:51, Johannes Weiner wrote:
> > The deferred split queue handles cgroups in a suboptimal fashion. The
> > queue is per-NUMA node or per-cgroup, not the intersection. That means
> > on a cgrouped system, a node-restricted allocation entering reclaim
> > can end up splitting large pages on other nodes:
> > 
> > 	alloc/unmap
> > 	  deferred_split_folio()
> > 	    list_add_tail(memcg->split_queue)
> > 	    set_shrinker_bit(memcg, node, deferred_shrinker_id)
> > 
> > 	for_each_zone_zonelist_nodemask(restricted_nodes)
> > 	  mem_cgroup_iter()
> > 	    shrink_slab(node, memcg)
> > 	      shrink_slab_memcg(node, memcg)
> > 	        if test_shrinker_bit(memcg, node, deferred_shrinker_id)
> > 	          deferred_split_scan()
> > 	            walks memcg->split_queue
> > 
> > The shrinker bit adds an imperfect guard rail. As soon as the cgroup
> > has a single large page on the node of interest, all large pages owned
> > by that memcg, including those on other nodes, will be split.
> > 
> > list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
> > streamlines a lot of the list operations and reclaim walks. It's used
> > widely by other major shrinkers already. Convert the deferred split
> > queue as well.
> > 
> > The list_lru per-memcg heads are instantiated on demand when the first
> > object of interest is allocated for a cgroup, by calling
> > memcg_list_lru_alloc_folio(). Add calls to where splittable pages are
> > created: anon faults, swapin faults, khugepaged collapse.
> > 
> > These calls create all possible node heads for the cgroup at once, so
> > the migration code (between nodes) doesn't need any special care.
> 
> 
> [...]
> 
> > -
> >  static inline bool is_transparent_hugepage(const struct folio *folio)
> >  {
> >  	if (!folio_test_large(folio))
> > @@ -1293,6 +1189,14 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
> >  		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> >  		return NULL;
> >  	}
> > +
> > +	if (memcg_list_lru_alloc_folio(folio, &deferred_split_lru, gfp)) {
> > +		folio_put(folio);
> > +		count_vm_event(THP_FAULT_FALLBACK);
> > +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
> > +		return NULL;
> > +	}
> 
> So, in all anon alloc paths, we essentially have
> 
> 1) vma_alloc_folio / __folio_alloc (khugepaged being odd)
> 2) mem_cgroup_charge / mem_cgroup_swapin_charge_folio
> 3) memcg_list_lru_alloc_folio
> 
> I wonder if we could do better in most cases and have something like a
> 
> 	vma_alloc_anon_folio()
> 
> That wraps the vma_alloc_folio() + memcg_list_lru_alloc_folio(), but
> still leaves the charging to the caller?

Hm, but it's the charging that figures out the memcg and sets
folio_memcg() :(

> That would at least combine 1) and 3) in a single API. (except for the
> odd cases without a VMA).
> 
> I guess we would want to skip the memcg_list_lru_alloc_folio() for
> order-0 folios, correct?

Yeah, we don't use the queue for order <= 1. In deferred_split_folio():

	/*
	 * Order 1 folios have no space for a deferred list, but we also
	 * won't waste much memory by not adding them to the deferred list.
	 */
	if (folio_order(folio) <= 1)
		return;

> > @@ -3802,33 +3706,28 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
> >  	struct folio *new_folio, *next;
> >  	int old_order = folio_order(folio);
> >  	int ret = 0;
> > -	struct deferred_split *ds_queue;
> > +	struct list_lru_one *l;
> >  
> >  	VM_WARN_ON_ONCE(!mapping && end);
> >  	/* Prevent deferred_split_scan() touching ->_refcount */
> > -	ds_queue = folio_split_queue_lock(folio);
> > +	rcu_read_lock();
> 
> The RCU lock is for the folio_memcg(), right?
> 
> I recall I raised in the past that some get/put-like logic (that wraps
> the rcu_read_lock() + folio_memcg()) might make this a lot easier to get right.
> 
> 
> memcg = folio_memcg_lookup(folio)
> 
> ... do stuff
> 
> folio_memcg_putback(folio, memcg);
> 
> Or sth like that.
> 
> 
> Alternatively, you could have some helpers that do the
> list_lru_lock+unlock etc.
> 
> folio_memcg_list_lru_lock()
> ...
> folio_memcg_list_lru_unlock(l);
> 
> Just some thoughts as inspiration :)

I remember you raising this in the objcg + reparenting patches. There
are a few more instances of

	rcu_read_lock()
	foo = folio_memcg()
	...
	rcu_read_unlock()

in other parts of the code not touched by these patches here, so the
first pattern is a more universal encapsulation.

Let me look into this. Would you be okay with a follow-up that covers
the others as well?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-18 22:48     ` Johannes Weiner
@ 2026-03-19  7:21       ` David Hildenbrand (Arm)
  2026-03-20 16:02         ` Johannes Weiner
  2026-03-20 16:07         ` Johannes Weiner
  0 siblings, 2 replies; 30+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-19  7:21 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett,
	Usama Arif, Kiryl Shutsemau, Dave Chinner, Roman Gushchin,
	linux-mm, linux-kernel


>>
>> So, in all anon alloc paths, we essentially have
>>
>> 1) vma_alloc_folio / __folio_alloc (khugepaged being odd)
>> 2) mem_cgroup_charge / mem_cgroup_swapin_charge_folio
>> 3) memcg_list_lru_alloc_folio
>>
>> I wonder if we could do better in most cases and have something like a
>>
>> 	vma_alloc_anon_folio()
>>
>> That wraps the vma_alloc_folio() + memcg_list_lru_alloc_folio(), but
>> still leaves the charging to the caller?
> 
> Hm, but it's the charging that figures out the memcg and sets
> folio_memcg() :(

Oh ... right. I guess we would then have to do all 3 things at the same
time, which makes the helper a bit more involved.

I'll note that collapse_file() also calls alloc_charge_folio(), but not
for allocating an anonymous folio that would have to be placed on the
deferred split queue.

> 
>> That would at least combine 1) and 3) in a single API. (except for the
>> odd cases without a VMA).
>>
>> I guess we would want to skip the memcg_list_lru_alloc_folio() for
>> order-0 folios, correct?
> 
> Yeah, we don't use the queue for order <= 1. In deferred_split_folio():
> 
> 	/*
> 	 * Order 1 folios have no space for a deferred list, but we also
> 	 * won't waste much memory by not adding them to the deferred list.
> 	 */
> 	if (folio_order(folio) <= 1)
> 		return;
> 
>>> @@ -3802,33 +3706,28 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
>>>  	struct folio *new_folio, *next;
>>>  	int old_order = folio_order(folio);
>>>  	int ret = 0;
>>> -	struct deferred_split *ds_queue;
>>> +	struct list_lru_one *l;
>>>  
>>>  	VM_WARN_ON_ONCE(!mapping && end);
>>>  	/* Prevent deferred_split_scan() touching ->_refcount */
>>> -	ds_queue = folio_split_queue_lock(folio);
>>> +	rcu_read_lock();
>>
>> The RCU lock is for the folio_memcg(), right?
>>
>> I recall I raised in the past that some get/put-like logic (that wraps
>> the rcu_read_lock() + folio_memcg()) might make this a lot easier to get right.
>>
>>
>> memcg = folio_memcg_lookup(folio)
>>
>> ... do stuff
>>
>> folio_memcg_putback(folio, memcg);
>>
>> Or sth like that.
>>
>>
>> Alternatively, you could have some helpers that do the
>> list_lru_lock+unlock etc.
>>
>> folio_memcg_list_lru_lock()
>> ...
>> folio_memcg_list_lru_unlock(l);
>>
>> Just some thoughts as inspiration :)
> 
> I remember you raising this in the objcg + reparenting patches. There
> are a few more instances of
> 
> 	rcu_read_lock()
> 	foo = folio_memcg()
> 	...
> 	rcu_read_unlock()
> 
> in other parts of the code not touched by these patches here, so the
> first pattern is a more universal encapsulation.
> 
> Let me look into this. Would you be okay with a follow-up that covers
> the others as well?

Of course :) If list_lru lock helpers would be the right thing to do, it
might be better placed in this series.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-19  7:21       ` David Hildenbrand (Arm)
@ 2026-03-20 16:02         ` Johannes Weiner
  2026-03-23 19:39           ` David Hildenbrand (Arm)
  2026-03-20 16:07         ` Johannes Weiner
  1 sibling, 1 reply; 30+ messages in thread
From: Johannes Weiner @ 2026-03-20 16:02 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Andrew Morton, Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett,
	Usama Arif, Kiryl Shutsemau, Dave Chinner, Roman Gushchin,
	linux-mm, linux-kernel

On Thu, Mar 19, 2026 at 08:21:21AM +0100, David Hildenbrand (Arm) wrote:
> >>> @@ -3802,33 +3706,28 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
> >>>  	struct folio *new_folio, *next;
> >>>  	int old_order = folio_order(folio);
> >>>  	int ret = 0;
> >>> -	struct deferred_split *ds_queue;
> >>> +	struct list_lru_one *l;
> >>>  
> >>>  	VM_WARN_ON_ONCE(!mapping && end);
> >>>  	/* Prevent deferred_split_scan() touching ->_refcount */
> >>> -	ds_queue = folio_split_queue_lock(folio);
> >>> +	rcu_read_lock();
> >>
> >> The RCU lock is for the folio_memcg(), right?
> >>
> >> I recall I raised in the past that some get/put-like logic (that wraps
> >> the rcu_read_lock() + folio_memcg()) might make this a lot easier to get right.
> >>
> >>
> >> memcg = folio_memcg_lookup(folio)
> >>
> >> ... do stuff
> >>
> >> folio_memcg_putback(folio, memcg);
> >>
> >> Or sth like that.
> >>
> >>
> >> Alternatively, you could have some helpers that do the
> >> list_lru_lock+unlock etc.
> >>
> >> folio_memcg_list_lru_lock()
> >> ...
> >> folio_memcg_list_lru_unlock(l);
> >>
> >> Just some thoughts as inspiration :)
> > 
> > I remember you raising this in the objcg + reparenting patches. There
> > are a few more instances of
> > 
> > 	rcu_read_lock()
> > 	foo = folio_memcg()
> > 	...
> > 	rcu_read_unlock()
> > 
> > in other parts of the code not touched by these patches here, so the
> > first pattern is a more universal encapsulation.
> > 
> > Let me look into this. Would you be okay with a follow-up that covers
> > the others as well?
> 
> Of course :) If list_lru lock helpers would be the right thing to do, it
> might be better placed in this series.

I'm playing around with the below. But there are a few things that
seem suboptimal:

- We need a local @memcg, which makes sites that just pass
  folio_memcg() somewhere else fatter. More site LOC on average.
- Despite being more verbose, it communicates less. rcu_read_lock()
  is universally understood, folio_memcg_foo() is cryptic.
- It doesn't cover similar accessors with the same lifetime rules,
  like folio_lruvec(), folio_memcg_check()

 include/linux/memcontrol.h | 35 ++++++++++++++++++++++++++---------
 mm/huge_memory.c           | 34 ++++++++++++++++++----------------
 mm/list_lru.c              |  5 ++---
 mm/memcontrol.c            | 17 +++++++----------
 mm/migrate.c               |  5 ++---
 mm/page_io.c               | 12 ++++++------
 mm/vmscan.c                |  7 ++++---
 mm/workingset.c            |  5 ++---
 mm/zswap.c                 | 11 ++++++-----
 9 files changed, 73 insertions(+), 58 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 0782c72a1997..5162145b9322 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -430,6 +430,17 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio)
 	return objcg ? obj_cgroup_memcg(objcg) : NULL;
 }
 
+static inline struct mem_cgroup *folio_memcg_begin(struct folio *folio)
+{
+	rcu_read_lock();
+	return folio_memcg(folio);
+}
+
+static inline void folio_memcg_end(void)
+{
+	rcu_read_unlock();
+}
+
 /*
  * folio_memcg_charged - If a folio is charged to a memory cgroup.
  * @folio: Pointer to the folio.
@@ -917,11 +928,10 @@ static inline void mod_memcg_page_state(struct page *page,
 	if (mem_cgroup_disabled())
 		return;
 
-	rcu_read_lock();
-	memcg = folio_memcg(page_folio(page));
+	memcg = folio_memcg_begin(page_folio(page));
 	if (memcg)
 		mod_memcg_state(memcg, idx, val);
-	rcu_read_unlock();
+	folio_memcg_end();
 }
 
 unsigned long memcg_events(struct mem_cgroup *memcg, int event);
@@ -949,10 +959,9 @@ static inline void count_memcg_folio_events(struct folio *folio,
 	if (!folio_memcg_charged(folio))
 		return;
 
-	rcu_read_lock();
-	memcg = folio_memcg(folio);
+	memcg = folio_memcg_begin(folio);
 	count_memcg_events(memcg, idx, nr);
-	rcu_read_unlock();
+	folio_memcg_end();
 }
 
 static inline void count_memcg_events_mm(struct mm_struct *mm,
@@ -1035,6 +1044,15 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio)
 	return NULL;
 }
 
+static inline struct mem_cgroup *folio_memcg_begin(struct folio *folio)
+{
+	return NULL;
+}
+
+static inline void folio_memcg_end(void)
+{
+}
+
 static inline bool folio_memcg_charged(struct folio *folio)
 {
 	return false;
@@ -1546,11 +1564,10 @@ static inline void mem_cgroup_track_foreign_dirty(struct folio *folio,
 	if (!folio_memcg_charged(folio))
 		return;
 
-	rcu_read_lock();
-	memcg = folio_memcg(folio);
+	memcg = folio_memcg_begin(folio);
 	if (unlikely(&memcg->css != wb->memcg_css))
 		mem_cgroup_track_foreign_dirty_slowpath(folio, wb);
-	rcu_read_unlock();
+	folio_memcg_end();
 }
 
 void mem_cgroup_flush_foreign(struct bdi_writeback *wb);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e90d08db219d..1aa20c1dd0c1 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3769,9 +3769,10 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 	/* Prevent deferred_split_scan() touching ->_refcount */
 	dequeue_deferred = folio_test_anon(folio) && old_order > 1;
 	if (dequeue_deferred) {
-		rcu_read_lock();
-		l = list_lru_lock(&deferred_split_lru,
-				  folio_nid(folio), folio_memcg(folio));
+		struct mem_cgroup *memcg;
+
+		memcg = folio_memcg_begin(folio);
+		l = list_lru_lock(&deferred_split_lru, folio_nid(folio), memcg);
 	}
 	if (folio_ref_freeze(folio, folio_cache_ref_count(folio) + 1)) {
 		struct swap_cluster_info *ci = NULL;
@@ -3786,7 +3787,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 					MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 			}
 			list_lru_unlock(l);
-			rcu_read_unlock();
+			folio_memcg_end();
 		}
 
 		if (mapping) {
@@ -3891,7 +3892,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 	} else {
 		if (dequeue_deferred) {
 			list_lru_unlock(l);
-			rcu_read_unlock();
+			folio_memcg_end();
 		}
 		return -EAGAIN;
 	}
@@ -4272,12 +4273,13 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 	int nid = folio_nid(folio);
 	unsigned long flags;
 	bool unqueued = false;
+	struct mem_cgroup *memcg;
 
 	WARN_ON_ONCE(folio_ref_count(folio));
 	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));
 
-	rcu_read_lock();
-	l = list_lru_lock_irqsave(&deferred_split_lru, nid, folio_memcg(folio), &flags);
+	memcg = folio_memcg_begin(folio);
+	l = list_lru_lock_irqsave(&deferred_split_lru, nid, memcg, &flags);
 	if (__list_lru_del(&deferred_split_lru, l, &folio->_deferred_list, nid)) {
 		if (folio_test_partially_mapped(folio)) {
 			folio_clear_partially_mapped(folio);
@@ -4287,7 +4289,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 		unqueued = true;
 	}
 	list_lru_unlock_irqrestore(l, &flags);
-	rcu_read_unlock();
+	folio_memcg_end();
 
 	return unqueued;	/* useful for debug warnings */
 }
@@ -4322,8 +4324,7 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 
 	nid = folio_nid(folio);
 
-	rcu_read_lock();
-	memcg = folio_memcg(folio);
+	memcg = folio_memcg_begin(folio);
 	l = list_lru_lock_irqsave(&deferred_split_lru, nid, memcg, &flags);
 	if (partially_mapped) {
 		if (!folio_test_partially_mapped(folio)) {
@@ -4339,7 +4340,7 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 	}
 	__list_lru_add(&deferred_split_lru, l, &folio->_deferred_list, nid, memcg);
 	list_lru_unlock_irqrestore(l, &flags);
-	rcu_read_unlock();
+	folio_memcg_end();
 }
 
 static unsigned long deferred_split_count(struct shrinker *shrink,
@@ -4445,16 +4446,17 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		 * don't add it back to split_queue.
 		 */
 		if (!did_split && folio_test_partially_mapped(folio)) {
-			rcu_read_lock();
+			struct mem_cgroup *memcg;
+
+			memcg = folio_memcg_begin(folio);
 			l = list_lru_lock_irqsave(&deferred_split_lru,
-						  folio_nid(folio),
-						  folio_memcg(folio),
+						  folio_nid(folio), memcg,
 						  &flags);
 			__list_lru_add(&deferred_split_lru, l,
 				       &folio->_deferred_list,
-				       folio_nid(folio), folio_memcg(folio));
+				       folio_nid(folio), memcg);
 			list_lru_unlock_irqrestore(l, &flags);
-			rcu_read_unlock();
+			folio_memcg_end();
 		}
 		folio_put(folio);
 	}
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 1ccdd45b1d14..638d084bb0f5 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -604,10 +604,9 @@ int folio_memcg_list_lru_alloc(struct folio *folio, struct list_lru *lru,
 		return 0;
 
 	/* Fast path when list_lru heads already exist */
-	rcu_read_lock();
-	memcg = folio_memcg(folio);
+	memcg = folio_memcg_begin(folio);
 	res = memcg_list_lru_allocated(memcg, lru);
-	rcu_read_unlock();
+	folio_memcg_end();
 	if (likely(res))
 		return 0;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f381cb6bdff1..14732f1542f2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -965,18 +965,17 @@ void lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
 	pg_data_t *pgdat = folio_pgdat(folio);
 	struct lruvec *lruvec;
 
-	rcu_read_lock();
-	memcg = folio_memcg(folio);
+	memcg = folio_memcg_begin(folio);
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
-		rcu_read_unlock();
+		folio_memcg_end();
 		mod_node_page_state(pgdat, idx, val);
 		return;
 	}
 
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	mod_lruvec_state(lruvec, idx, val);
-	rcu_read_unlock();
+	folio_memcg_end();
 }
 EXPORT_SYMBOL(lruvec_stat_mod_folio);
 
@@ -1170,11 +1169,12 @@ struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio)
 	if (!folio_memcg_charged(folio))
 		return root_mem_cgroup;
 
-	rcu_read_lock();
 	do {
-		memcg = folio_memcg(folio);
-	} while (unlikely(!css_tryget(&memcg->css)));
-	rcu_read_unlock();
+		memcg = folio_memcg_begin(folio);
+		if (unlikely(!css_tryget(&memcg->css)))
+			memcg = NULL;
+		folio_memcg_end();
+	} while (!memcg);
 	return memcg;
 }
 
@@ -5535,8 +5533,7 @@ bool mem_cgroup_swap_full(struct folio *folio)
 	if (do_memsw_account() || !folio_memcg_charged(folio))
 		return ret;
 
-	rcu_read_lock();
-	memcg = folio_memcg(folio);
+	memcg = folio_memcg_begin(folio);
 	for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) {
 		unsigned long usage = page_counter_read(&memcg->swap);
 
@@ -5546,7 +5543,7 @@ bool mem_cgroup_swap_full(struct folio *folio)
 			break;
 		}
 	}
-	rcu_read_unlock();
+	folio_memcg_end();
 
 	return ret;
 }
diff --git a/mm/migrate.c b/mm/migrate.c
index fdbb20163f66..a2d542ebf3ed 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -672,8 +672,7 @@ static int __folio_migrate_mapping(struct address_space *mapping,
 		struct lruvec *old_lruvec, *new_lruvec;
 		struct mem_cgroup *memcg;
 
-		rcu_read_lock();
-		memcg = folio_memcg(folio);
+		memcg = folio_memcg_begin(folio);
 		old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
 		new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
 
@@ -700,7 +699,7 @@ static int __folio_migrate_mapping(struct address_space *mapping,
 			mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
 			__mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr);
 		}
-		rcu_read_unlock();
+		folio_memcg_end();
 	}
 	local_irq_enable();
 
diff --git a/mm/page_io.c b/mm/page_io.c
index 63b262f4c5a9..862135a65848 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -239,6 +239,7 @@ static void swap_zeromap_folio_clear(struct folio *folio)
  */
 int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug)
 {
+	struct mem_cgroup *memcg;
 	int ret = 0;
 
 	if (folio_free_swap(folio))
@@ -277,13 +278,13 @@ int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug)
 		goto out_unlock;
 	}
 
-	rcu_read_lock();
-	if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio))) {
+	memcg = folio_memcg_begin(folio);
+	if (!mem_cgroup_zswap_writeback_enabled(memcg)) {
-		rcu_read_unlock();
+		folio_memcg_end();
 		folio_mark_dirty(folio);
 		return AOP_WRITEPAGE_ACTIVATE;
 	}
-	rcu_read_unlock();
+	folio_memcg_end();
 
 	__swap_writepage(folio, swap_plug);
 	return 0;
@@ -314,11 +315,10 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct folio *folio)
 	if (!folio_memcg_charged(folio))
 		return;
 
-	rcu_read_lock();
-	memcg = folio_memcg(folio);
+	memcg = folio_memcg_begin(folio);
 	css = cgroup_e_css(memcg->css.cgroup, &io_cgrp_subsys);
 	bio_associate_blkg_from_css(bio, css);
-	rcu_read_unlock();
+	folio_memcg_end();
 }
 #else
 #define bio_associate_blkg_from_page(bio, folio)		do { } while (0)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33287ba4a500..12ad40fa7d60 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3407,6 +3407,7 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
 				   struct pglist_data *pgdat)
 {
 	struct folio *folio = pfn_folio(pfn);
+	struct mem_cgroup *this_memcg;
 
 	if (folio_lru_gen(folio) < 0)
 		return NULL;
@@ -3414,10 +3415,10 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
 	if (folio_nid(folio) != pgdat->node_id)
 		return NULL;
 
-	rcu_read_lock();
-	if (folio_memcg(folio) != memcg)
+	this_memcg = folio_memcg_begin(folio);
+	if (this_memcg != memcg)
 		folio = NULL;
-	rcu_read_unlock();
+	folio_memcg_end();
 
 	return folio;
 }
diff --git a/mm/workingset.c b/mm/workingset.c
index 07e6836d0502..77bfec58b797 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -251,8 +251,7 @@ static void *lru_gen_eviction(struct folio *folio)
 	BUILD_BUG_ON(LRU_GEN_WIDTH + LRU_REFS_WIDTH >
 		     BITS_PER_LONG - max(EVICTION_SHIFT, EVICTION_SHIFT_ANON));
 
-	rcu_read_lock();
-	memcg = folio_memcg(folio);
+	memcg = folio_memcg_begin(folio);
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 	lrugen = &lruvec->lrugen;
 	min_seq = READ_ONCE(lrugen->min_seq[type]);
@@ -261,7 +260,7 @@ static void *lru_gen_eviction(struct folio *folio)
 	hist = lru_hist_from_seq(min_seq);
 	atomic_long_add(delta, &lrugen->evicted[hist][type][tier]);
 	memcg_id = mem_cgroup_private_id(memcg);
-	rcu_read_unlock();
+	folio_memcg_end();
 
 	return pack_shadow(memcg_id, pgdat, token, workingset, type);
 }
diff --git a/mm/zswap.c b/mm/zswap.c
index 4f2e652e8ad3..fb035dd70d8b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -895,14 +895,15 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	 * to the active LRU list in the case.
 	 */
 	if (comp_ret || !dlen || dlen >= PAGE_SIZE) {
-		rcu_read_lock();
-		if (!mem_cgroup_zswap_writeback_enabled(
-					folio_memcg(page_folio(page)))) {
-			rcu_read_unlock();
+		struct mem_cgroup *memcg;
+
+		memcg = folio_memcg_begin(page_folio(page));
+		if (!mem_cgroup_zswap_writeback_enabled(memcg)) {
+			folio_memcg_end();
 			comp_ret = comp_ret ? comp_ret : -EINVAL;
 			goto unlock;
 		}
-		rcu_read_unlock();
+		folio_memcg_end();
 		comp_ret = 0;
 		dlen = PAGE_SIZE;
 		dst = kmap_local_page(page);


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-19  7:21       ` David Hildenbrand (Arm)
  2026-03-20 16:02         ` Johannes Weiner
@ 2026-03-20 16:07         ` Johannes Weiner
  2026-03-23 19:32           ` David Hildenbrand (Arm)
  1 sibling, 1 reply; 30+ messages in thread
From: Johannes Weiner @ 2026-03-20 16:07 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Andrew Morton, Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett,
	Usama Arif, Kiryl Shutsemau, Dave Chinner, Roman Gushchin,
	linux-mm, linux-kernel

On Thu, Mar 19, 2026 at 08:21:21AM +0100, David Hildenbrand (Arm) wrote:
> Of course :) If list_lru lock helpers would be the right thing to do, it
> might be better placed in this series.

I think this is slightly more promising. See below. The callsites in
huge_memory.c look nicer. But the double folio_nid() and folio_memcg()
lookups (when the caller needs them too) are kind of unfortunate; and
it feels like a lot of API for 4 callsites. Thoughts?

 include/linux/list_lru.h |  8 ++++++++
 mm/huge_memory.c         | 43 +++++++++++++++----------------------------
 mm/list_lru.c            | 29 +++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 28 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 4bd29b61c59a..6b734d08fa1b 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -123,6 +123,14 @@ struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
 void list_lru_unlock_irqrestore(struct list_lru_one *l,
 		unsigned long *irq_flags);
 
+struct list_lru_one *folio_list_lru_lock(struct folio *folio,
+		struct list_lru *lru);
+void folio_list_lru_unlock(struct folio *folio, struct list_lru_one *l);
+struct list_lru_one *folio_list_lru_lock_irqsave(struct folio *folio,
+		struct list_lru *lru, unsigned long *flags);
+void folio_list_lru_unlock_irqrestore(struct folio *folio,
+		struct list_lru_one *l, unsigned long *flags);
+
 /* Caller-locked variants, see list_lru_add() etc for documentation */
 bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l,
 		struct list_head *item, int nid, struct mem_cgroup *memcg);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e90d08db219d..6996ef224e24 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3768,11 +3768,8 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 	VM_WARN_ON_ONCE(!mapping && end);
 	/* Prevent deferred_split_scan() touching ->_refcount */
 	dequeue_deferred = folio_test_anon(folio) && old_order > 1;
-	if (dequeue_deferred) {
-		rcu_read_lock();
-		l = list_lru_lock(&deferred_split_lru,
-				  folio_nid(folio), folio_memcg(folio));
-	}
+	if (dequeue_deferred)
+		l = folio_list_lru_lock(folio, &deferred_split_lru);
 	if (folio_ref_freeze(folio, folio_cache_ref_count(folio) + 1)) {
 		struct swap_cluster_info *ci = NULL;
 		struct lruvec *lruvec;
@@ -3785,8 +3782,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 				mod_mthp_stat(old_order,
 					MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
 			}
-			list_lru_unlock(l);
-			rcu_read_unlock();
+			folio_list_lru_unlock(folio, l);
 		}
 
 		if (mapping) {
@@ -3889,10 +3885,8 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
 		if (ci)
 			swap_cluster_unlock(ci);
 	} else {
-		if (dequeue_deferred) {
-			list_lru_unlock(l);
-			rcu_read_unlock();
-		}
+		if (dequeue_deferred)
+			folio_list_lru_unlock(folio, l);
 		return -EAGAIN;
 	}
 
@@ -4276,8 +4270,7 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 	WARN_ON_ONCE(folio_ref_count(folio));
 	WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg_charged(folio));
 
-	rcu_read_lock();
-	l = list_lru_lock_irqsave(&deferred_split_lru, nid, folio_memcg(folio), &flags);
+	l = folio_list_lru_lock_irqsave(folio, &deferred_split_lru, &flags);
 	if (__list_lru_del(&deferred_split_lru, l, &folio->_deferred_list, nid)) {
 		if (folio_test_partially_mapped(folio)) {
 			folio_clear_partially_mapped(folio);
@@ -4286,7 +4279,6 @@ bool __folio_unqueue_deferred_split(struct folio *folio)
 		}
 		unqueued = true;
 	}
-	list_lru_unlock_irqrestore(l, &flags);
-	rcu_read_unlock();
+	folio_list_lru_unlock_irqrestore(folio, l, &flags);
 
 	return unqueued;	/* useful for debug warnings */
@@ -4297,7 +4290,6 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 {
 	struct list_lru_one *l;
 	int nid;
-	struct mem_cgroup *memcg;
 	unsigned long flags;
 
 	/*
@@ -4322,9 +4314,7 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 
 	nid = folio_nid(folio);
 
-	rcu_read_lock();
-	memcg = folio_memcg(folio);
-	l = list_lru_lock_irqsave(&deferred_split_lru, nid, memcg, &flags);
+	l = folio_list_lru_lock_irqsave(folio, &deferred_split_lru, &flags);
 	if (partially_mapped) {
 		if (!folio_test_partially_mapped(folio)) {
 			folio_set_partially_mapped(folio);
@@ -4337,9 +4327,9 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 		/* partially mapped folios cannot become non-partially mapped */
 		VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio);
 	}
-	__list_lru_add(&deferred_split_lru, l, &folio->_deferred_list, nid, memcg);
-	list_lru_unlock_irqrestore(l, &flags);
-	rcu_read_unlock();
+	__list_lru_add(&deferred_split_lru, l, &folio->_deferred_list, nid,
+		       folio_memcg(folio));
+	folio_list_lru_unlock_irqrestore(folio, l, &flags);
 }
 
 static unsigned long deferred_split_count(struct shrinker *shrink,
@@ -4445,16 +4435,13 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		 * don't add it back to split_queue.
 		 */
 		if (!did_split && folio_test_partially_mapped(folio)) {
-			rcu_read_lock();
-			l = list_lru_lock_irqsave(&deferred_split_lru,
-						  folio_nid(folio),
-						  folio_memcg(folio),
-						  &flags);
+			l = folio_list_lru_lock_irqsave(folio,
+							&deferred_split_lru,
+							&flags);
 			__list_lru_add(&deferred_split_lru, l,
 				       &folio->_deferred_list,
 				       folio_nid(folio), folio_memcg(folio));
-			list_lru_unlock_irqrestore(l, &flags);
-			rcu_read_unlock();
+			folio_list_lru_unlock_irqrestore(folio, l, &flags);
 		}
 		folio_put(folio);
 	}
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 1ccdd45b1d14..8d50741ef18d 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -173,6 +173,35 @@ void list_lru_unlock_irqrestore(struct list_lru_one *l, unsigned long *flags)
 	unlock_list_lru(l, /*irq_off=*/true, /*irq_flags=*/flags);
 }
 
+struct list_lru_one *folio_list_lru_lock(struct folio *folio, struct list_lru *lru)
+{
+	rcu_read_lock();
+	return list_lru_lock(lru, folio_nid(folio), folio_memcg(folio));
+}
+
+void folio_list_lru_unlock(struct folio *folio, struct list_lru_one *l)
+{
+	list_lru_unlock(l);
+	rcu_read_unlock();
+}
+
+struct list_lru_one *folio_list_lru_lock_irqsave(struct folio *folio,
+						 struct list_lru *lru,
+						 unsigned long *flags)
+{
+	rcu_read_lock();
+	return list_lru_lock_irqsave(lru, folio_nid(folio),
+				     folio_memcg(folio), flags);
+}
+
+void folio_list_lru_unlock_irqrestore(struct folio *folio,
+				      struct list_lru_one *l,
+				      unsigned long *flags)
+{
+	list_lru_unlock_irqrestore(l, flags);
+	rcu_read_unlock();
+}
+
 bool __list_lru_add(struct list_lru *lru, struct list_lru_one *l,
 		    struct list_head *item, int nid,
 		    struct mem_cgroup *memcg)


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-20 16:07         ` Johannes Weiner
@ 2026-03-23 19:32           ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-23 19:32 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett,
	Usama Arif, Kiryl Shutsemau, Dave Chinner, Roman Gushchin,
	linux-mm, linux-kernel

On 3/20/26 17:07, Johannes Weiner wrote:
> On Thu, Mar 19, 2026 at 08:21:21AM +0100, David Hildenbrand (Arm) wrote:
>> Of course :) If list_lru lock helpers would be the right thing to do, it
>> might be better placed in this series.
> 
> I think this is slightly more promising. See below. The callsites in
> huge_memory.c look nicer. But the double folio_nid() and folio_memcg()
> lookups (when the caller needs them too) are kind of unfortunate; and
> it feels like a lot of API for 4 callsites. Thoughts?

I like that. Could we just put the implementation inline into
list_lru.h, such that the compiler could try reusing nid+memcg?

[...]

>  			__list_lru_add(&deferred_split_lru, l,
>  				       &folio->_deferred_list,
>  				       folio_nid(folio), folio_memcg(folio));
> -			list_lru_unlock_irqrestore(l, &flags);
> -			rcu_read_unlock();
> +			folio_list_lru_unlock_irqrestore(folio, l, &flags);

I guess it would look even cleaner if we would wrap the __list_lru_add()
that needs the memcg + nid in an own helper.

These are the only remaining users of memcg+nid, right?

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru
  2026-03-20 16:02         ` Johannes Weiner
@ 2026-03-23 19:39           ` David Hildenbrand (Arm)
  0 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-23 19:39 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Shakeel Butt, Yosry Ahmed, Zi Yan, Liam R. Howlett,
	Usama Arif, Kiryl Shutsemau, Dave Chinner, Roman Gushchin,
	linux-mm, linux-kernel

On 3/20/26 17:02, Johannes Weiner wrote:
> On Thu, Mar 19, 2026 at 08:21:21AM +0100, David Hildenbrand (Arm) wrote:
>>>
>>> I remember you raising this in the objcg + reparenting patches. There
>>> are a few more instances of
>>>
>>> 	rcu_read_lock()
>>> 	foo = folio_memcg()
>>> 	...
>>> 	rcu_read_unlock()
>>>
>>> in other parts of the code not touched by these patches here, so the
>>> first pattern is a more universal encapsulation.
>>>
>>> Let me look into this. Would you be okay with a follow-up that covers
>>> the others as well?
>>
>> Of course :) If list_lru lock helpers would be the right thing to do, it
>> might be better placed in this series.
> 
> I'm playing around with the below. But there are a few things that
> seem suboptimal:

I like that as well (and could even be applied on top of the other
proposal later).

> 
> - We need a local @memcg, which makes sites that just pass
>   folio_memcg() somewhere else fatter. More site LOC on average.

The LOC is really mostly just from the helper functions IIUC.

> - Despite being more verbose, it communicates less. rcu_read_lock()
>   is universally understood, folio_memcg_foo() is cryptic.

begin/end is pretty clear IMHO. Not sure about the "cryptic" part. Taste
differs I guess. :)

> - It doesn't cover similar accessors with the same lifetime rules,
>   like folio_lruvec(), folio_memcg_check()


I think it gets interesting once the RCU would implicitly protect other
stuff as well. And that would be my point: from the code alone it's not
quite clear what the RCU actually protects and what new code should be
using.

But I won't push for that if you/others prefer to spell out the RCU
stuff :) Thanks for playing with the code!


-- 
Cheers,

David



end of thread, other threads:[~2026-03-23 19:39 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-12 20:51 [PATCH v2 0/7] mm: switch THP shrinker to list_lru Johannes Weiner
2026-03-12 20:51 ` [PATCH v2 1/7] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
2026-03-17  9:43   ` David Hildenbrand (Arm)
2026-03-18 17:56   ` Shakeel Butt
2026-03-18 19:25     ` Johannes Weiner
2026-03-18 19:34       ` Shakeel Butt
2026-03-12 20:51 ` [PATCH v2 2/7] mm: list_lru: deduplicate unlock_list_lru() Johannes Weiner
2026-03-17  9:44   ` David Hildenbrand (Arm)
2026-03-18 17:57   ` Shakeel Butt
2026-03-12 20:51 ` [PATCH v2 3/7] mm: list_lru: move list dead check to lock_list_lru_of_memcg() Johannes Weiner
2026-03-17  9:47   ` David Hildenbrand (Arm)
2026-03-12 20:51 ` [PATCH v2 4/7] mm: list_lru: deduplicate lock_list_lru() Johannes Weiner
2026-03-17  9:51   ` David Hildenbrand (Arm)
2026-03-12 20:51 ` [PATCH v2 5/7] mm: list_lru: introduce caller locking for additions and deletions Johannes Weiner
2026-03-17 10:00   ` David Hildenbrand (Arm)
2026-03-17 14:03     ` Johannes Weiner
2026-03-17 14:34       ` Johannes Weiner
2026-03-17 16:35         ` David Hildenbrand (Arm)
2026-03-12 20:51 ` [PATCH v2 6/7] mm: list_lru: introduce memcg_list_lru_alloc_folio() Johannes Weiner
2026-03-17 10:09   ` David Hildenbrand (Arm)
2026-03-12 20:51 ` [PATCH v2 7/7] mm: switch deferred split shrinker to list_lru Johannes Weiner
2026-03-18 20:25   ` David Hildenbrand (Arm)
2026-03-18 22:48     ` Johannes Weiner
2026-03-19  7:21       ` David Hildenbrand (Arm)
2026-03-20 16:02         ` Johannes Weiner
2026-03-23 19:39           ` David Hildenbrand (Arm)
2026-03-20 16:07         ` Johannes Weiner
2026-03-23 19:32           ` David Hildenbrand (Arm)
2026-03-13 17:39 ` [syzbot ci] Re: mm: switch THP " syzbot ci
2026-03-13 23:08   ` Johannes Weiner
