From: SeongJae Park <sj@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: SeongJae Park <sj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@kernel.org>,
Dave Chinner <david@fromorbit.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Qi Zheng <qi.zheng@linux.dev>,
Yosry Ahmed <yosry.ahmed@linux.dev>, Zi Yan <ziy@nvidia.com>,
"Liam R . Howlett" <liam@infradead.org>,
Usama Arif <usama.arif@linux.dev>,
Kiryl Shutsemau <kas@kernel.org>,
Vlastimil Babka <vbabka@kernel.org>,
Kairui Song <ryncsn@gmail.com>,
Mikhail Zaslonko <zaslonko@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Barry Song <baohua@kernel.org>, Dev Jain <dev.jain@arm.com>,
Lance Yang <lance.yang@linux.dev>, Nico Pache <npache@redhat.com>,
Ryan Roberts <ryan.roberts@arm.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 9/9] mm: switch deferred split shrinker to list_lru
Date: Thu, 28 May 2026 00:08:05 -0700
Message-ID: <20260528070807.144064-1-sj@kernel.org>
In-Reply-To: <20260527204757.2544958-10-hannes@cmpxchg.org>
Hi Johannes,
On Wed, 27 May 2026 16:45:16 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> The deferred split queue handles cgroups in a suboptimal fashion. The
> queue is per-NUMA node or per-cgroup, not the intersection. That means
> on a cgrouped system, a node-restricted allocation entering reclaim
> can end up splitting large pages on other nodes:
>
>   alloc/unmap
>     deferred_split_folio()
>       list_add_tail(memcg->split_queue)
>       set_shrinker_bit(memcg, node, deferred_shrinker_id)
>
>   for_each_zone_zonelist_nodemask(restricted_nodes)
>     mem_cgroup_iter()
>       shrink_slab(node, memcg)
>         shrink_slab_memcg(node, memcg)
>           if test_shrinker_bit(memcg, node, deferred_shrinker_id)
>             deferred_split_scan()
>               walks memcg->split_queue
>
> The shrinker bit adds an imperfect guard rail. As soon as the cgroup
> has a single large page on the node of interest, all large pages owned
> by that memcg, including those on other nodes, will be split.
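To make this failure mode concrete, here is a minimal user-space model of the old per-memcg queue. It is a sketch under simplifying assumptions, not kernel code: every queued folio is treated as split when scanned, the shrinker bit is never cleared, and all `model_*` names and types are hypothetical. A reclaim pass restricted to node 0 still splits the folios that live on node 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the pre-patch layout: one deferred split
 * queue per memcg, holding large folios from any node. None of
 * these names are kernel APIs.
 */
#define NR_NODES 2
#define QUEUE_MAX 8

struct model_folio {
	int nid;	/* node the folio lives on */
	bool split;	/* set once the model shrinker splits it */
};

struct model_memcg {
	struct model_folio *queue[QUEUE_MAX];	/* memcg->split_queue */
	size_t nr_queued;
	bool shrinker_bit[NR_NODES];		/* per-(memcg, node) bit */
};

static void model_deferred_split(struct model_memcg *memcg,
				 struct model_folio *folio)
{
	assert(memcg->nr_queued < QUEUE_MAX);
	memcg->queue[memcg->nr_queued++] = folio;
	memcg->shrinker_bit[folio->nid] = true;
}

/* Reclaim restricted to @nid; returns the number of folios split. */
static size_t model_shrink_node(struct model_memcg *memcg, int nid)
{
	size_t i, nr_split = 0;

	/* The bit only gates entry into the scan... */
	if (!memcg->shrinker_bit[nid])
		return 0;
	/* ...but the queue is not per-node, so every folio is split. */
	for (i = 0; i < memcg->nr_queued; i++) {
		memcg->queue[i]->split = true;
		nr_split++;
	}
	memcg->nr_queued = 0;
	return nr_split;
}
```

With one folio on node 0 and two on node 1, a node-0 scan splits all three.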
>
> list_lru properly sets up per-node, per-cgroup lists. As a bonus, it
> streamlines a lot of the list operations and reclaim walks. It's used
> widely by other major shrinkers already. Convert the deferred split
> queue as well.
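The per-(memcg, node) layout described above can be sketched the same way. This is a hypothetical model, not the kernel's `struct list_lru`: there is one list per (memcg, node) pair, and a walk only touches the list at that intersection, so node-restricted reclaim leaves folios on other nodes alone.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a list_lru-style layout: one list per
 * (memcg, node) pair. Not the kernel's struct list_lru.
 */
#define NR_NODES 2
#define NR_MEMCGS 2
#define LIST_MAX 8

struct model_lru_list {
	int items[LIST_MAX];	/* folio ids */
	size_t nr_items;
};

struct model_list_lru {
	struct model_lru_list lists[NR_MEMCGS][NR_NODES];
};

static void model_lru_add(struct model_list_lru *lru, int memcg, int nid,
			  int folio_id)
{
	struct model_lru_list *l = &lru->lists[memcg][nid];

	assert(l->nr_items < LIST_MAX);
	l->items[l->nr_items++] = folio_id;
}

/* Walks only the (memcg, nid) list; returns how many were isolated. */
static size_t model_lru_walk(struct model_list_lru *lru, int memcg, int nid)
{
	struct model_lru_list *l = &lru->lists[memcg][nid];
	size_t nr = l->nr_items;

	l->nr_items = 0;	/* isolate everything on this list */
	return nr;
}
```

Scanning memcg 0 on node 0 now isolates only the node-0 folio; the two node-1 folios stay queued.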
>
> The list_lru per-memcg heads are instantiated on demand when the first
> object of interest is allocated for a cgroup, by calling
> folio_memcg_alloc_deferred(). Add calls to where splittable pages are
> created: anon faults, swapin faults, khugepaged collapse.
>
> These calls create all possible node heads for the cgroup at once, so
> the migration code (between nodes) doesn't need any special care.
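A minimal sketch of the on-demand allocation described above, assuming a hypothetical `model_memcg` whose per-node heads start out NULL. The helper plays the role that folio_memcg_alloc_deferred() plays in the patch but is not its implementation. Because it allocates the heads for every node at once, moving a queued folio to another node needs no further allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical model of on-demand per-memcg list heads; a per-node
 * counter stands in for each list head.
 */
#define NR_NODES 4

struct model_memcg {
	size_t *node_counts;	/* NULL until the first large folio */
};

/* Returns 0 on success, -ENOMEM if the heads cannot be allocated. */
static int model_memcg_alloc_deferred(struct model_memcg *memcg)
{
	if (memcg->node_counts)
		return 0;	/* already set up by an earlier folio */
	/* Allocate the heads for every node in one go. */
	memcg->node_counts = calloc(NR_NODES, sizeof(*memcg->node_counts));
	return memcg->node_counts ? 0 : -ENOMEM;
}

/* Migrating a queued folio between nodes never allocates. */
static void model_migrate(struct model_memcg *memcg, int from, int to)
{
	assert(memcg->node_counts && memcg->node_counts[from] > 0);
	memcg->node_counts[from]--;
	memcg->node_counts[to]++;
}
```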
>
> Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> include/linux/huge_mm.h | 7 +-
> include/linux/memcontrol.h | 4 -
> include/linux/mmzone.h | 12 --
> mm/huge_memory.c | 364 +++++++++++++------------------------
> mm/internal.h | 2 +-
> mm/khugepaged.c | 5 +
> mm/memcontrol.c | 12 +-
> mm/memory.c | 4 +
> mm/mm_init.c | 15 --
> mm/swap_state.c | 10 +
> 10 files changed, 150 insertions(+), 285 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index edece3e26985..f6c2531a27a3 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -423,10 +423,10 @@ static inline int split_huge_page(struct page *page)
> {
> return split_huge_page_to_list_to_order(page, NULL, 0);
> }
> +
> +int folio_memcg_alloc_deferred(struct folio *folio);
> +
> void deferred_split_folio(struct folio *folio, bool partially_mapped);
> -#ifdef CONFIG_MEMCG
> -void reparent_deferred_split_queue(struct mem_cgroup *memcg);
> -#endif
>
> void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> unsigned long address, bool freeze);
> @@ -664,7 +664,6 @@ static inline int folio_split(struct folio *folio, unsigned int new_order,
> }
>
> static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
> -static inline void reparent_deferred_split_queue(struct mem_cgroup *memcg) {}
> #define split_huge_pmd(__vma, __pmd, __address) \
> do { } while (0)
I found that this patch is now in mm-new, and it breaks the UML kunit build
as shown below.
$ ./tools/testing/kunit/kunit.py run --kunitconfig mm/damon/tests/
[00:00:02] Configuring KUnit Kernel ...
[00:00:02] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=8
ERROR:root:../mm/swap_state.c: In function ‘__swap_cache_alloc’:
../mm/swap_state.c:468:26: error: implicit declaration of function ‘folio_memcg_alloc_deferred’ [-Wimplicit-function-declaration]
468 | if (order > 1 && folio_memcg_alloc_deferred(folio)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
make[4]: *** [../scripts/Makefile.build:289: mm/swap_state.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [../scripts/Makefile.build:548: mm] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [/home/lkhack/linux/Makefile:2143: .] Error 2
make[1]: *** [/home/lkhack/linux/Makefile:248: __sub-make] Error 2
make: *** [Makefile:248: __sub-make] Error 2
Maybe we can define a stub of the function for the !CONFIG_TRANSPARENT_HUGEPAGE
case? I confirmed that the temporary fix attached below works, at least for kunit.
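For reference, the usual kernel header pattern behind this suggestion is a real declaration when the feature is configured in and a static inline stub otherwise, so callers compile either way. The sketch below is a user-space rendering of that pattern under stated assumptions: `struct folio` is an opaque stand-in, CONFIG_TRANSPARENT_HUGEPAGE is left undefined so the stub is what gets compiled, and `model_swapin_setup()` is a hypothetical caller modelled on the failing line in `__swap_cache_alloc()`.

```c
#include <assert.h>
#include <stddef.h>

struct folio;	/* opaque stand-in for the kernel's struct folio */

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
int folio_memcg_alloc_deferred(struct folio *folio);
#else
/* Nothing to set up without THP, so report success. */
static inline int folio_memcg_alloc_deferred(struct folio *folio)
{
	return 0;
}
#endif

/*
 * Hypothetical caller modelled on __swap_cache_alloc(): a nonzero
 * return from folio_memcg_alloc_deferred() means "bail out".
 */
static int model_swapin_setup(struct folio *folio, int order)
{
	if (order > 1 && folio_memcg_alloc_deferred(folio))
		return -1;
	return 0;
}
```

Returning 0 from the stub keeps the caller on its normal path, which matches the attached fix.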
Thanks,
SJ
[...]
=== >8 ===
From 23b5800dd49085707baee5774b74782c3e424f24 Mon Sep 17 00:00:00 2001
From: SeongJae Park <sj@kernel.org>
Date: Wed, 27 May 2026 23:58:07 -0700
Subject: [PATCH] mm/huge_mm: define folio_memcg_alloc_deferred() for
 !CONFIG_TRANSPARENT_HUGEPAGE
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Without this, the UML kunit build fails as shown below.
$ ./tools/testing/kunit/kunit.py run --kunitconfig mm/damon/tests/
[00:00:02] Configuring KUnit Kernel ...
[00:00:02] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=8
ERROR:root:../mm/swap_state.c: In function ‘__swap_cache_alloc’:
../mm/swap_state.c:468:26: error: implicit declaration of function ‘folio_memcg_alloc_deferred’ [-Wimplicit-function-declaration]
468 | if (order > 1 && folio_memcg_alloc_deferred(folio)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
make[4]: *** [../scripts/Makefile.build:289: mm/swap_state.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [../scripts/Makefile.build:548: mm] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [/home/lkhack/linux/Makefile:2143: .] Error 2
make[1]: *** [/home/lkhack/linux/Makefile:248: __sub-make] Error 2
make: *** [Makefile:248: __sub-make] Error 2
Fix it by adding a stub of the function for the !CONFIG_TRANSPARENT_HUGEPAGE
case.
Fixes: https://lore.kernel.org/20260527204757.2544958-10-hannes@cmpxchg.org
Signed-off-by: SeongJae Park <sj@kernel.org>
---
include/linux/huge_mm.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f6c2531a27a35..055de7b8ed487 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -663,6 +663,11 @@ static inline int folio_split(struct folio *folio, unsigned int new_order,
 	return -EINVAL;
 }
 
+static inline int folio_memcg_alloc_deferred(struct folio *folio)
+{
+	return 0;
+}
+
 static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
 #define split_huge_pmd(__vma, __pmd, __address) \
 	do { } while (0)
--
2.47.3
Thread overview: 32+ messages
2026-05-27 20:45 [PATCH v5 0/9] mm: switch THP shrinker to list_lru Johannes Weiner
2026-05-27 20:45 ` [PATCH v5 1/9] mm: list_lru: fix set_shrinker_bit() call during race with cgroup deletion Johannes Weiner
2026-05-28 13:25 ` Usama Arif
2026-05-30 2:38 ` Wei Yang
2026-05-27 20:45 ` [PATCH v5 2/9] mm: list_lru: lock_list_lru_of_memcg() cannot return NULL if !skip_empty Johannes Weiner
2026-05-27 20:45 ` [PATCH v5 3/9] mm: list_lru: deduplicate unlock_list_lru() Johannes Weiner
2026-05-27 20:45 ` [PATCH v5 4/9] mm: list_lru: move list dead check to lock_list_lru_of_memcg() Johannes Weiner
2026-05-27 20:45 ` [PATCH v5 5/9] mm: list_lru: deduplicate lock_list_lru() Johannes Weiner
2026-05-29 9:56 ` Wei Yang
2026-05-29 13:42 ` Johannes Weiner
2026-05-30 1:25 ` Wei Yang
2026-05-27 20:45 ` [PATCH v5 6/9] mm: list_lru: introduce caller locking for additions and deletions Johannes Weiner
2026-05-27 20:45 ` [PATCH v5 7/9] mm: list_lru: introduce folio_memcg_list_lru_alloc() Johannes Weiner
2026-05-27 20:45 ` [PATCH v5 8/9] mm: memory: flatten alloc_anon_folio() retry loop Johannes Weiner
2026-05-30 9:06 ` Dev Jain
2026-05-27 20:45 ` [PATCH v5 9/9] mm: switch deferred split shrinker to list_lru Johannes Weiner
2026-05-28 7:08 ` SeongJae Park [this message]
2026-05-28 14:03 ` Johannes Weiner
2026-05-28 13:32 ` Usama Arif
2026-05-28 14:02 ` Johannes Weiner
2026-05-28 15:31 ` Usama Arif
2026-05-29 17:33 ` Kairui Song
2026-05-31 8:00 ` Wei Yang
2026-06-01 10:39 ` Lance Yang
2026-06-01 11:09 ` Lance Yang
2026-06-01 13:21 ` Lance Yang
2026-06-01 18:17 ` Johannes Weiner
2026-06-01 8:36 ` [PATCH v5 0/9] mm: switch THP " Lance Yang
2026-06-02 21:46 ` Johannes Weiner
2026-06-03 4:44 ` Lance Yang
2026-06-03 11:41 ` Johannes Weiner
2026-06-03 11:53 ` Lance Yang