Linux-mm Archive on lore.kernel.org
* [PATCH v5 0/9] mm: switch THP shrinker to list_lru
From: Johannes Weiner @ 2026-05-27 20:45 UTC
  To: Andrew Morton
  Cc: David Hildenbrand, Lorenzo Stoakes, Shakeel Butt, Michal Hocko,
	Dave Chinner, Roman Gushchin, Muchun Song, Qi Zheng, Yosry Ahmed,
	Zi Yan, Liam R . Howlett, Usama Arif, Kiryl Shutsemau,
	Vlastimil Babka, Kairui Song, Mikhail Zaslonko, Vasily Gorbik,
	Baolin Wang, Barry Song, Dev Jain, Lance Yang, Nico Pache,
	Ryan Roberts, cgroups, linux-mm, linux-kernel

This is version 5 of switching the THP shrinker to list_lru.

The core of the new version is the list_lru set_shrinker_bit() fix up
front, which minimally affects later patches, and a rebase onto the
latest mm-unstable, where alloc_swap_folio() has been folded into the
new __swap_cache_alloc().

The changes seemed small enough that *I chose to keep the review tags
from v4*. Please shout if you object to this!

Changes in v5:
- patch 1 is a new fix for a very old, pre-existing set_shrinker_bit()
  problem in list_lru, where the bit can be set on a dying child memcg
  instead of the ancestor that actually received the item. Pointed out
  by Usama Arif and Sashiko; fix it first to make it minimally
  backportable and so the conversion is safe.
- patches 6 and 9 adapt to that fix's new memcg-by-reference
  lock_list_lru_of_memcg() signature
- collapse_huge_page(): propagate folio_memcg_alloc_deferred() failure
  as SCAN_ALLOC_HUGE_PAGE_FAIL instead of leaking SCAN_SUCCEED and
  falsely reporting a successful MADV_COLLAPSE (Usama Arif, Sashiko)
- deferred_split_isolate(): fix a UAF by reading folio state before
  list_lru_isolate(); once removed, a racing folio_put() frees the
  folio via the lockless list_empty() check while we still touch its
  flags and stats (Sashiko)
- rebased to mm-unstable of 2026-05-27, which simplifies the flatten
  prep patch (now anon-only, as alloc_swap_folio() was folded into the
  new __swap_cache_alloc()) and moves the swap-side
  folio_memcg_alloc_deferred() hook into __swap_cache_alloc(). Kairui,
  I would appreciate an eyeball on that.

Changes in v4:
- guard folio_memcg_alloc_deferred() with mem_cgroup_disabled() to fix
  NULL deref in __memcg_list_lru_alloc() when booting with
  cgroup_disable=memory (e.g., kdump capture kernel) -- reported and
  tested by Mikhail Zaslonko on s390 and x86
- flatten if (folio) branches in alloc_swap_folio() and alloc_anon_folio()
  in a prep patch so the list_lru allocation additions are a clean minimal
  diff (Lorenzo)
- folio_memcg_alloc_deferred() moved out of alloc_charge_folio() into the
  anon-only collapse_huge_page() path; collapse_file() shares that helper
  but its pages don't go on the THP shrinker queue (David)
- guard folio_memcg_alloc_deferred() with order > 1; mTHPs below order-2
  can't be queued on the deferred split list (David)
- make deferred_split_lru static, hide behind folio_memcg_alloc_deferred()
  wrapper with GFP_KERNEL (Lorenzo)
- rename l -> lru throughout huge_memory.c (Lorenzo)
- kdoc for folio_memcg_list_lru_alloc() (Lorenzo)
- list_lru_lock_irq()/unlock_irq()/add_irq() irq-disabling variants;
  use list_lru_add_irq() in deferred_split_scan() (Lorenzo)
- reorder shrinker_free() before list_lru_destroy() (Lorenzo)

Changes in v3:
- dedicated lockdep_key for irqsafe deferred_split_lru.lock (syzbot)
- conditional list_lru ops in __folio_freeze_and_split_unmapped() (syzbot)
- annotate runs of inscrutable false, NULL, false function arguments (David)
- rename to folio_memcg_list_lru_alloc() (David)

Changes in v2:
- explicit rcu_read_lock() in __folio_freeze_and_split_unmapped() (Usama)
- split out list_lru prep bits (Dave)

The open-coded deferred split queue has issues. It's not NUMA-aware
(when cgroup is enabled), and it complicates the callsites that
interact with it. Switching to list_lru fixes the NUMA problem and
streamlines things. It also simplifies planned shrinker work.

Patch 1 fixes a pre-existing list_lru bug where the shrinker bit is
set on the caller's memcg rather than the ancestor whose sublist the
item actually lands on after a walk-up. Standalone, backportable; the
rest of the series depends on it.

Patches 2-5 are cleanups and small refactors in list_lru code. They're
basically independent, but make the THP shrinker conversion easier.

Patch 6 extends the list_lru API to allow the caller to control the
locking scope. The THP shrinker has private state it needs to keep
synchronized with the LRU state.
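
A rough userspace sketch of why caller-controlled locking helps,
assuming a single sublist and a plain flag in place of a real
spinlock. The toy_* names are hypothetical and only loosely follow
the list_lru_lock() / __list_lru_add() / list_lru_unlock() split:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sublist; "locked" stands in for the sublist spinlock. */
struct toy_list {
	bool locked;
	long nr_items;
};

/* Private state the caller must keep in step with the list. */
struct toy_stats {
	long nr_partially_mapped;
};

struct toy_folio {
	bool queued;
	bool partially_mapped;
};

static void toy_lock(struct toy_list *l)
{
	assert(!l->locked);
	l->locked = true;
}

static void toy_unlock(struct toy_list *l)
{
	assert(l->locked);
	l->locked = false;
}

/* Add under a lock the caller already holds; false if already queued. */
static bool toy_add_locked(struct toy_list *l, struct toy_folio *f)
{
	assert(l->locked);
	if (f->queued)
		return false;
	f->queued = true;
	l->nr_items++;
	return true;
}

/* Flag update and list insertion happen in one critical section. */
static bool toy_queue(struct toy_list *l, struct toy_stats *s,
		      struct toy_folio *f, bool partially_mapped)
{
	bool added;

	toy_lock(l);
	if (partially_mapped && !f->partially_mapped) {
		f->partially_mapped = true;
		s->nr_partially_mapped++;
	}
	added = toy_add_locked(l, f);
	toy_unlock(l);
	return added;
}
```

If the add took and dropped the lock internally, the flag and counter
update would have to happen outside the critical section, or under a
second lock.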

Patch 7 extends the list_lru API with a convenience helper to do
list_lru head allocation (memcg_list_lru_alloc) when coming from a
folio. Anon THPs are instantiated in several places, and with the
folio reparenting patches pending, folio_memcg() access is now a more
delicate dance. This avoids having to replicate that dance everywhere.

Patch 8 flattens the alloc_anon_folio() retry loop so the next patch's
list_lru hook lands as a clean addition rather than nested deep inside
an if (folio) block.
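
For readers unfamiliar with the pattern, here is a toy userspace
sketch of a flattened retry loop. The allocator and charge helpers
are made-up stand-ins that succeed only up to a given order; they are
not the kernel functions:

```c
#include <stdbool.h>

/* Hypothetical allocator: succeeds only for orders <= max_ok. */
static bool toy_alloc(int order, int max_ok)
{
	return order <= max_ok;
}

/* Hypothetical charge step: succeeds only for orders <= max_ok. */
static bool toy_charge(int order, int max_ok)
{
	return order <= max_ok;
}

/*
 * Try orders from highest down to 1. Each failure jumps to the shared
 * "next" label, so the success path stays at one indentation level and
 * a further check can be added as one more flat "if (...) goto next;".
 * Returns the order that succeeded, or 0 if none did.
 */
static int toy_pick_order(int highest, int max_alloc, int max_charge,
			  int *fallbacks)
{
	int order;

	for (order = highest; order > 0; order--) {
		if (!toy_alloc(order, max_alloc))
			goto next;
		if (!toy_charge(order, max_charge))
			goto next;	/* real code would also drop the folio */
		return order;
next:
		(*fallbacks)++;
	}
	return 0;
}
```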

Patch 9 finally switches the deferred_split_queue to list_lru.

Based on mm-unstable.

 include/linux/huge_mm.h    |   7 +-
 include/linux/list_lru.h   |  70 +++++++++
 include/linux/memcontrol.h |   4 -
 include/linux/mmzone.h     |  12 --
 mm/huge_memory.c           | 364 +++++++++++++++------------------------------
 mm/internal.h              |   2 +-
 mm/khugepaged.c            |   5 +
 mm/list_lru.c              | 238 +++++++++++++++++++----------
 mm/memcontrol.c            |  12 +-
 mm/memory.c                |  38 ++---
 mm/mm_init.c               |  15 --
 mm/swap_state.c            |  10 ++
 12 files changed, 399 insertions(+), 378 deletions(-)

The base moved substantially since v4 (the swap allocation rework in
particular reshuffled the alloc_swap_folio() landing spot), so the
patch-level diff between v4 and v5 is non-obvious from a tree diff
alone. For ease of review, here is the range-diff:

 -:  ------------ >  1:  f4f3933599b9 mm: list_lru: set shrinker bit on the memcg that owns the locked sublist
 1:  846dafe02e8b !  2:  e7b8f8bce2ec mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty
    @@ mm/list_lru.c
     @@ mm/list_lru.c: bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
      	struct list_lru_one *l;
      
    - 	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
    + 	l = lock_list_lru_of_memcg(lru, nid, &memcg, false, false);
     -	if (!l)
     -		return false;
      	if (list_empty(item)) {
      		list_add_tail(item, &l->list);
    - 		/* Set shrinker bit if the first element was added */
    + 		/*
     @@ mm/list_lru.c: bool list_lru_del(struct list_lru *lru, struct list_head *item, int nid,
      {
      	struct list_lru_node *nlru = &lru->node[nid];
      	struct list_lru_one *l;
     +
    - 	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
    + 	l = lock_list_lru_of_memcg(lru, nid, &memcg, false, false);
     -	if (!l)
     -		return false;
      	if (!list_empty(item)) {
 2:  afe28e645aff !  3:  f1e34640dff9 mm: list_lru: deduplicate unlock_list_lru()
    @@ mm/list_lru.c: static inline bool lock_list_lru(struct list_lru_one *l, bool irq
      		return false;
      	}
      	return true;
    -@@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
    - 	memcg = parent_mem_cgroup(memcg);
    +@@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid,
    + 	*memcg = parent_mem_cgroup(*memcg);
      	goto again;
      }
     -
    @@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_
      #else
      static void list_lru_register(struct list_lru *lru)
      {
    -@@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
    +@@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid,
      
      	return l;
      }
 3:  9e5499facfb1 !  4:  2612b71187ea mm: list_lru: move list dead check to lock_list_lru_of_memcg()
    @@ mm/list_lru.c: list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
      }
      
      static inline struct list_lru_one *
    -@@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
    +@@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid,
      	rcu_read_lock();
      again:
    - 	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
    + 	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(*memcg));
     -	if (likely(l) && lock_list_lru(l, irq)) {
     -		rcu_read_unlock();
     -		return l;
 4:  855b908bfb82 !  5:  cc2819362f07 mm: list_lru: deduplicate lock_list_lru()
    @@ mm/list_lru.c: list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
     -}
     -
      static inline struct list_lru_one *
    - lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
    - 		       bool irq, bool skip_empty)
    -@@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
    + lock_list_lru_of_memcg(struct list_lru *lru, int nid,
    + 		       struct mem_cgroup **memcg, bool irq, bool skip_empty)
    +@@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid,
      {
      	struct list_lru_one *l = &lru->node[nid].lru;
      
 5:  b8a70f1016f3 !  6:  08c4561616df mm: list_lru: introduce caller locking for additions and deletions
    @@ include/linux/list_lru.h: int memcg_list_lru_alloc(struct mem_cgroup *memcg, str
     + * list_lru_lock: lock the sublist for the given node and memcg
     + * @lru: the lru pointer
     + * @nid: the node id of the sublist to lock.
    -+ * @memcg: the cgroup of the sublist to lock.
    ++ * @memcg: pointer to the cgroup of the sublist to lock. On return,
    ++ *         updated to the cgroup whose sublist was actually locked,
    ++ *         which may be an ancestor if the original memcg was dying.
     + *
     + * Returns the locked list_lru_one sublist. The caller must call
     + * list_lru_unlock() when done.
    @@ include/linux/list_lru.h: int memcg_list_lru_alloc(struct mem_cgroup *memcg, str
     + * Return: the locked list_lru_one, or NULL on failure
     + */
     +struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid,
    -+		struct mem_cgroup *memcg);
    ++		struct mem_cgroup **memcg);
     +
     +/**
     + * list_lru_unlock: unlock a sublist locked by list_lru_lock()
    @@ include/linux/list_lru.h: int memcg_list_lru_alloc(struct mem_cgroup *memcg, str
     +void list_lru_unlock(struct list_lru_one *l);
     +
     +struct list_lru_one *list_lru_lock_irq(struct list_lru *lru, int nid,
    -+		struct mem_cgroup *memcg);
    ++		struct mem_cgroup **memcg);
     +void list_lru_unlock_irq(struct list_lru_one *l);
     +
     +struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
    -+		struct mem_cgroup *memcg, unsigned long *irq_flags);
    ++		struct mem_cgroup **memcg, unsigned long *irq_flags);
     +void list_lru_unlock_irqrestore(struct list_lru_one *l,
     +		unsigned long *irq_flags);
     +
    @@ mm/list_lru.c
     @@ mm/list_lru.c: list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
      
      static inline struct list_lru_one *
    - lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
    --		       bool irq, bool skip_empty)
    -+		       bool irq, unsigned long *irq_flags, bool skip_empty)
    + lock_list_lru_of_memcg(struct list_lru *lru, int nid,
    +-		       struct mem_cgroup **memcg, bool irq, bool skip_empty)
    ++		       struct mem_cgroup **memcg, bool irq,
    ++		       unsigned long *irq_flags, bool skip_empty)
      {
      	struct list_lru_one *l;
      
    -@@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
    +@@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid,
      again:
    - 	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
    + 	l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(*memcg));
      	if (likely(l)) {
     -		lock_list_lru(l, irq);
     +		lock_list_lru(l, irq, irq_flags);
    @@ mm/list_lru.c: lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_
     @@ mm/list_lru.c: list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
      
      static inline struct list_lru_one *
    - lock_list_lru_of_memcg(struct list_lru *lru, int nid, struct mem_cgroup *memcg,
    --		       bool irq, bool skip_empty)
    -+		       bool irq, unsigned long *irq_flags, bool skip_empty)
    + lock_list_lru_of_memcg(struct list_lru *lru, int nid,
    +-		       struct mem_cgroup **memcg, bool irq, bool skip_empty)
    ++		       struct mem_cgroup **memcg, bool irq,
    ++		       unsigned long *irq_flags, bool skip_empty)
      {
      	struct list_lru_one *l = &lru->node[nid].lru;
      
    @@ mm/list_lru.c: list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
     -bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
     -		  struct mem_cgroup *memcg)
     +struct list_lru_one *list_lru_lock(struct list_lru *lru, int nid,
    -+				   struct mem_cgroup *memcg)
    ++				   struct mem_cgroup **memcg)
      {
     -	struct list_lru_node *nlru = &lru->node[nid];
     -	struct list_lru_one *l;
    @@ mm/list_lru.c: list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
     +}
     +
     +struct list_lru_one *list_lru_lock_irq(struct list_lru *lru, int nid,
    -+				       struct mem_cgroup *memcg)
    ++				       struct mem_cgroup **memcg)
     +{
     +	return lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/true,
     +				      /*irq_flags=*/NULL, /*skip_empty=*/false);
    @@ mm/list_lru.c: list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
     +	unlock_list_lru(l, /*irq_off=*/true, /*irq_flags=*/NULL);
     +}
      
    --	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
    +-	l = lock_list_lru_of_memcg(lru, nid, &memcg, false, false);
     +struct list_lru_one *list_lru_lock_irqsave(struct list_lru *lru, int nid,
    -+					   struct mem_cgroup *memcg,
    ++					   struct mem_cgroup **memcg,
     +					   unsigned long *flags)
     +{
     +	return lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/true,
    @@ mm/list_lru.c: list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
     +{
      	if (list_empty(item)) {
      		list_add_tail(item, &l->list);
    - 		/* Set shrinker bit if the first element was added */
    + 		/*
    +@@ mm/list_lru.c: bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
    + 		 */
      		if (!l->nr_items++)
      			set_shrinker_bit(memcg, nid, lru_shrinker_id(lru));
     -		unlock_list_lru(l, false);
    @@ mm/list_lru.c: list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
     +	struct list_lru_one *l;
     +	bool ret;
     +
    -+	l = list_lru_lock(lru, nid, memcg);
    ++	l = list_lru_lock(lru, nid, &memcg);
     +	ret = __list_lru_add(lru, l, item, nid, memcg);
     +	list_lru_unlock(l);
     +	return ret;
    @@ mm/list_lru.c: list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
     +	struct list_lru_one *l;
     +	bool ret;
     +
    -+	l = list_lru_lock_irq(lru, nid, memcg);
    ++	l = list_lru_lock_irq(lru, nid, &memcg);
     +	ret = __list_lru_add(lru, l, item, nid, memcg);
     +	list_lru_unlock_irq(l);
     +	return ret;
    @@ mm/list_lru.c: EXPORT_SYMBOL_GPL(list_lru_add_obj);
      	struct list_lru_one *l;
     +	bool ret;
      
    --	l = lock_list_lru_of_memcg(lru, nid, memcg, false, false);
    +-	l = lock_list_lru_of_memcg(lru, nid, &memcg, false, false);
     -	if (!list_empty(item)) {
     -		list_del_init(item);
     -		l->nr_items--;
    @@ mm/list_lru.c: EXPORT_SYMBOL_GPL(list_lru_add_obj);
     -	}
     -	unlock_list_lru(l, false);
     -	return false;
    -+	l = list_lru_lock(lru, nid, memcg);
    ++	l = list_lru_lock(lru, nid, &memcg);
     +	ret = __list_lru_del(lru, l, item, nid);
     +	list_lru_unlock(l);
     +	return ret;
    @@ mm/list_lru.c: __list_lru_walk_one(struct list_lru *lru, int nid, struct mem_cgr
      	unsigned long isolated = 0;
      
      restart:
    --	l = lock_list_lru_of_memcg(lru, nid, memcg, irq_off, true);
    -+	l = lock_list_lru_of_memcg(lru, nid, memcg, /*irq=*/irq_off,
    +-	l = lock_list_lru_of_memcg(lru, nid, &memcg, irq_off, true);
    ++	l = lock_list_lru_of_memcg(lru, nid, &memcg, /*irq=*/irq_off,
     +				   /*irq_flags=*/NULL, /*skip_empty=*/true);
      	if (!l)
      		return isolated;
 6:  0bf8cd5bc205 =  7:  9b1b9ab5e749 mm: list_lru: introduce folio_memcg_list_lru_alloc()
 7:  a26656c1c0a5 !  8:  fd4e1d364dc2 mm: memory: flatten folio allocation retry loops
    @@ Metadata
     Author: Johannes Weiner <hannes@cmpxchg.org>
     
      ## Commit message ##
    -    mm: memory: flatten folio allocation retry loops
    +    mm: memory: flatten alloc_anon_folio() retry loop
     
    -    alloc_swap_folio() and alloc_anon_folio() use a top-level if (folio)
    -    that buries the success path four levels deep. This makes for awkward
    -    long lines and wrapping. The next patch will add more code here, so
    -    flatten this now to keep things clean and simple.
    +    alloc_anon_folio() uses a top-level if (folio) that buries the success
    +    path four levels deep. This makes for awkward long lines and wrapping.
    +    The next patch will add more code here, so flatten this now to keep
    +    things clean and simple.
     
    -    alloc_anon_folio() already has a next label, use it for !folio. Add
    -    the equivalent to alloc_swap_folio().
    +    The next label is already there, use it for !folio.
     
         No functional change intended.
     
    @@ Commit message
         Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
     
      ## mm/memory.c ##
    -@@ mm/memory.c: static struct folio *alloc_swap_folio(struct vm_fault *vmf)
    - 	while (orders) {
    - 		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
    - 		folio = vma_alloc_folio(gfp, order, vma, addr);
    --		if (folio) {
    --			if (!mem_cgroup_swapin_charge_folio(folio, vma->vm_mm,
    --							    gfp, entry))
    --				return folio;
    -+		if (!folio)
    -+			goto next;
    -+		if (mem_cgroup_swapin_charge_folio(folio, vma->vm_mm, gfp, entry)) {
    - 			count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK_CHARGE);
    - 			folio_put(folio);
    -+			goto next;
    - 		}
    -+		return folio;
    -+next:
    - 		count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
    - 		order = next_order(&orders, order);
    - 	}
     @@ mm/memory.c: static struct folio *alloc_anon_folio(struct vm_fault *vmf)
      	while (orders) {
      		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
 8:  e454696ab1b7 !  9:  70fe768450de mm: switch deferred split shrinker to list_lru
    @@ mm/huge_memory.c: static int __folio_freeze_and_split_unmapped(struct folio *fol
     +	 */
     +	dequeue_deferred = folio_test_anon(folio) && old_order > 1;
     +	if (dequeue_deferred) {
    ++		struct mem_cgroup *memcg;
    ++
     +		rcu_read_lock();
    ++		memcg = folio_memcg(folio);
     +		lru = list_lru_lock(&deferred_split_lru,
    -+				    folio_nid(folio), folio_memcg(folio));
    ++				    folio_nid(folio), &memcg);
     +	}
      	if (folio_ref_freeze(folio, folio_cache_ref_count(folio) + 1)) {
      		struct swap_cluster_info *ci = NULL;
    @@ mm/huge_memory.c: int split_folio_to_list(struct folio *folio, struct list_head
      bool __folio_unqueue_deferred_split(struct folio *folio)
      {
     -	struct deferred_split *ds_queue;
    ++	struct mem_cgroup *memcg;
     +	struct list_lru_one *lru;
     +	int nid = folio_nid(folio);
      	unsigned long flags;
    @@ mm/huge_memory.c: int split_folio_to_list(struct folio *folio, struct list_head
     -	if (!list_empty(&folio->_deferred_list)) {
     -		ds_queue->split_queue_len--;
     +	rcu_read_lock();
    -+	lru = list_lru_lock_irqsave(&deferred_split_lru, nid, folio_memcg(folio), &flags);
    ++	memcg = folio_memcg(folio);
    ++	lru = list_lru_lock_irqsave(&deferred_split_lru, nid, &memcg, &flags);
     +	if (__list_lru_del(&deferred_split_lru, lru, &folio->_deferred_list, nid)) {
      		if (folio_test_partially_mapped(folio)) {
      			folio_clear_partially_mapped(folio);
    @@ mm/huge_memory.c: void deferred_split_folio(struct folio *folio, bool partially_
     +
     +	rcu_read_lock();
     +	memcg = folio_memcg(folio);
    -+	lru = list_lru_lock_irqsave(&deferred_split_lru, nid, memcg, &flags);
    ++	lru = list_lru_lock_irqsave(&deferred_split_lru, nid, &memcg, &flags);
      	if (partially_mapped) {
      		if (!folio_test_partially_mapped(folio)) {
      			folio_set_partially_mapped(folio);
    @@ mm/huge_memory.c: static bool thp_underused(struct folio *folio)
     +		return LRU_REMOVED;
     +	}
     +
    -+	/* We lost race with folio_put() */
    -+	list_lru_isolate(lru, item);
    ++	/*
    ++	 * We lost race with folio_put(). Read folio state before the
    ++	 * isolate: folio_unqueue_deferred_split() checks list_empty()
    ++	 * locklessly, so once removed the folio can be freed any time.
    ++	 */
     +	if (folio_test_partially_mapped(folio)) {
     +		folio_clear_partially_mapped(folio);
     +		mod_mthp_stat(folio_order(folio),
     +			      MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
     +	}
    ++	list_lru_isolate(lru, item);
     +	return LRU_REMOVED;
     +}
     +
    @@ mm/huge_memory.c: static bool thp_underused(struct folio *folio)
      	struct folio *folio, *next;
     -	int split = 0, i;
     -	struct folio_batch fbatch;
    +-
    +-	folio_batch_init(&fbatch);
     +	int split = 0;
     +	unsigned long isolated;
      
    --	folio_batch_init(&fbatch);
    -+	isolated = list_lru_shrink_walk_irq(&deferred_split_lru, sc,
    -+					    deferred_split_isolate, &dispose);
    - 
     -retry:
     -	ds_queue = split_queue_lock_irqsave(sc->nid, sc->memcg, &flags);
     -	/* Take pin on all head pages to avoid freeing them under us */
    @@ mm/huge_memory.c: static bool thp_underused(struct folio *folio)
     -			break;
     -	}
     -	split_queue_unlock_irqrestore(ds_queue, flags);
    --
    ++	isolated = list_lru_shrink_walk_irq(&deferred_split_lru, sc,
    ++					    deferred_split_isolate, &dispose);
    + 
     -	for (i = 0; i < folio_batch_count(&fbatch); i++) {
     +	list_for_each_entry_safe(folio, next, &dispose, _deferred_list) {
      		bool did_split = false;
    @@ mm/khugepaged.c: static enum scan_result collapse_huge_page(struct mm_struct *mm
      	if (result != SCAN_SUCCEED)
      		goto out_nolock;
      
    -+	if (folio_memcg_alloc_deferred(folio))
    ++	if (folio_memcg_alloc_deferred(folio)) {
    ++		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
     +		goto out_nolock;
    ++	}
     +
      	mmap_read_lock(mm);
      	result = hugepage_vma_revalidate(mm, pmd_addr, /*expect_anon=*/ true,
    @@ mm/memcontrol.c: static void mem_cgroup_css_offline(struct cgroup_subsys_state *
      	reparent_shrinker_deferred(memcg);
     
      ## mm/memory.c ##
    -@@ mm/memory.c: static struct folio *alloc_swap_folio(struct vm_fault *vmf)
    - 			folio_put(folio);
    - 			goto next;
    - 		}
    -+		if (order > 1 && folio_memcg_alloc_deferred(folio)) {
    -+			folio_put(folio);
    -+			goto fallback;
    -+		}
    - 		return folio;
    - next:
    - 		count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
     @@ mm/memory.c: static struct folio *alloc_anon_folio(struct vm_fault *vmf)
      			folio_put(folio);
      			goto next;
    @@ mm/mm_init.c: static void __meminit pgdat_init_internals(struct pglist_data *pgd
      	pgdat_init_kcompactd(pgdat);
      
      	init_waitqueue_head(&pgdat->kswapd_wait);
    +
    + ## mm/swap_state.c ##
    +@@ mm/swap_state.c: static struct folio *__swap_cache_alloc(struct swap_cluster_info *ci,
    + 		return ERR_PTR(-ENOMEM);
    + 	}
    + 
    ++	if (order > 1 && folio_memcg_alloc_deferred(folio)) {
    ++		spin_lock(&ci->lock);
    ++		__swap_cache_do_del_folio(ci, folio, entry, shadow);
    ++		spin_unlock(&ci->lock);
    ++		folio_unlock(folio);
    ++		/* nr_pages refs from swap cache, 1 from allocation */
    ++		folio_put_refs(folio, nr_pages + 1);
    ++		return ERR_PTR(-ENOMEM);
    ++	}
    ++
    + 	/* memsw uncharges swap when folio is added to swap cache */
    + 	memcg1_swapin(folio);
    + 	if (shadow)
