From: Usama Arif <usama.arif@linux.dev>
To: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>, Zi Yan <ziy@nvidia.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Kiryl Shutsemau <kas@kernel.org>,
Dave Chinner <david@fromorbit.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: switch deferred split shrinker to list_lru
Date: Wed, 11 Mar 2026 20:00:29 +0300 [thread overview]
Message-ID: <050ce5bd-4725-468e-acaf-7fca72b84d06@linux.dev> (raw)
In-Reply-To: <20260311154358.150977-1-hannes@cmpxchg.org>
On 11/03/2026 18:43, Johannes Weiner wrote:
> The deferred split queue handles cgroups in a suboptimal fashion. The
> queue is per-NUMA node or per-cgroup, not the intersection. That means
> on a cgrouped system, a node-restricted allocation entering reclaim
> can end up splitting large pages on other nodes:
>
> alloc/unmap
> deferred_split_folio()
> list_add_tail(memcg->split_queue)
> set_shrinker_bit(memcg, node, deferred_shrinker_id)
>
> for_each_zone_zonelist_nodemask(restricted_nodes)
> mem_cgroup_iter()
> shrink_slab(node, memcg)
> shrink_slab_memcg(node, memcg)
> if test_shrinker_bit(memcg, node, deferred_shrinker_id)
> deferred_split_scan()
> walks memcg->split_queue
>
> The shrinker bit adds an imperfect guard rail. As soon as the cgroup
> has a single large page on the node of interest, all large pages owned
> by that memcg, including those on other nodes, will be split.
>
> list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
> streamlines a lot of the list operations and reclaim walks. It's used
> widely by other major shrinkers already. Convert the deferred split
> queue as well.
>
> The list_lru per-memcg heads are instantiated on demand when the first
> object of interest is allocated for a cgroup, by calling
> memcg_list_lru_alloc(). Add calls to where splittable pages are
> created: anon faults, swapin faults, khugepaged collapse.
>
> These calls create all possible node heads for the cgroup at once, so
> the migration code (between nodes) doesn't need any special care.
>
> The folio_test_partially_mapped() state is currently protected and
> serialized wrt LRU state by the deferred split queue lock. To
> facilitate the transition, add helpers to the list_lru API to allow
> caller-side locking.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> include/linux/huge_mm.h | 6 +-
> include/linux/list_lru.h | 48 ++++++
> include/linux/memcontrol.h | 4 -
> include/linux/mmzone.h | 12 --
> mm/huge_memory.c | 326 +++++++++++--------------------------
> mm/internal.h | 2 +-
> mm/khugepaged.c | 7 +
> mm/list_lru.c | 197 ++++++++++++++--------
> mm/memcontrol.c | 12 +-
> mm/memory.c | 52 +++---
> mm/mm_init.c | 14 --
> 11 files changed, 310 insertions(+), 370 deletions(-)
>
[..]
> @@ -3802,33 +3706,25 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
> struct folio *new_folio, *next;
> int old_order = folio_order(folio);
> int ret = 0;
> - struct deferred_split *ds_queue;
> + struct list_lru_one *l;
>
> VM_WARN_ON_ONCE(!mapping && end);
> /* Prevent deferred_split_scan() touching ->_refcount */
> - ds_queue = folio_split_queue_lock(folio);
> + l = list_lru_lock(&deferred_split_lru, folio_nid(folio), folio_memcg(folio));
Hello Johannes!

I think folio_memcg() needs to be called under rcu_read_lock() here?
folio_memcg() calls obj_cgroup_memcg(), which has
lockdep_assert_once(rcu_read_lock_held()). The old
folio_split_queue_lock() wrapped split_queue_lock() in
rcu_read_lock(), so this wasn't an issue before.
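Something like the below (completely untested, just to illustrate the
suggestion) would keep the folio_memcg() lookup under the RCU read lock:

```c
	struct mem_cgroup *memcg;

	VM_WARN_ON_ONCE(!mapping && end);
	/* Prevent deferred_split_scan() touching ->_refcount */
	rcu_read_lock();
	memcg = folio_memcg(folio);
	l = list_lru_lock(&deferred_split_lru, folio_nid(folio), memcg);
	rcu_read_unlock();
```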
> if (folio_ref_freeze(folio, folio_cache_ref_count(folio) + 1)) {
> struct swap_cluster_info *ci = NULL;
> struct lruvec *lruvec;
>
> if (old_order > 1) {
> - if (!list_empty(&folio->_deferred_list)) {
> - ds_queue->split_queue_len--;
> - /*
> - * Reinitialize page_deferred_list after removing the
> - * page from the split_queue, otherwise a subsequent
> - * split will see list corruption when checking the
> - * page_deferred_list.
> - */
> - list_del_init(&folio->_deferred_list);
> - }
> + __list_lru_del(&deferred_split_lru, l,
> + &folio->_deferred_list, folio_nid(folio));
> if (folio_test_partially_mapped(folio)) {
> folio_clear_partially_mapped(folio);
> mod_mthp_stat(old_order,
> MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1);
> }
> }
> - split_queue_unlock(ds_queue);
> + list_lru_unlock(l);
> if (mapping) {
> int nr = folio_nr_pages(folio);
>
> @@ -3929,7 +3825,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
> if (ci)
> swap_cluster_unlock(ci);
> } else {
> - split_queue_unlock(ds_queue);
> + list_lru_unlock(l);
> return -EAGAIN;
> }
>
[..]