* [RFC PATCH] mm: Avoiding split large folios if swap has no space
@ 2026-06-18 22:17 Barry Song (Xiaomi)
2026-06-18 23:46 ` Nico Pache
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Barry Song (Xiaomi) @ 2026-06-18 22:17 UTC (permalink / raw)
To: akpm, linux-mm
Cc: linux-kernel, Barry Song (Xiaomi), Nanzhe Zhao, David Hildenbrand,
Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Lance Yang, Kairui Song,
Qi Zheng, Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu
When swap is disabled or exhausted, swap slot allocation
may fail during swapout, causing large folios to be split
into small folios. The splitting is reasonable when we
truly fail to obtain contiguous swap slots, but it is
pointless in the no-space case.
A simple way to reproduce this is to invoke MADV_PAGEOUT on
a system with mTHP enabled but without swap configured.
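#include <string.h>
#include <sys/mman.h>

#define SIZE (16 * 1024 * 1024)

int main(void)
{
	char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 1, SIZE);			/* fault in every page */
	madvise(buf, SIZE, MADV_PAGEOUT);	/* request reclaim of the range */
	munmap(buf, SIZE);
	return 0;
}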
With 16KB mTHP enabled, we observe:
~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
1024
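As a rough sanity check on that number, the sketch below computes how many 16kB folios fit in the reproducer's buffer. It assumes the whole buffer ends up backed by 16kB folios and that each one is split exactly once; MTHP_SIZE and expected_folios() are names made up for this illustration.

```c
#include <stddef.h>

#define SIZE      (16 * 1024 * 1024)	/* buffer size from the reproducer */
#define MTHP_SIZE (16 * 1024)		/* assumed mTHP folio size: 16kB */

/* Number of folios of folio_size bytes needed to back buf_size bytes. */
static size_t expected_folios(size_t buf_size, size_t folio_size)
{
	return buf_size / folio_size;
}
```

Under those assumptions, expected_folios(SIZE, MTHP_SIZE) is 1024, which lines up with the split counter above.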
This patch checks swap space before splitting. If there is
no available space, it skips splitting. After the patch, we
observe:
~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
0
Reported-by: Nanzhe Zhao <zhaonanzhe@xiaomi.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Kairui Song <kasong@tencent.com>
Cc: Qi Zheng <qi.zheng@linux.dev>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Wei Xu <weixugc@google.com>
Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
---
mm/vmscan.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 299b5d9e8836..33f84a5fe7ee 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -339,8 +339,7 @@ static bool can_demote(int nid, struct scan_control *sc,
return !nodes_empty(allowed_mask);
}
-static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
- int nid,
+static inline bool __can_reclaim_anon_pages(struct mem_cgroup *memcg,
struct scan_control *sc)
{
if (memcg == NULL) {
@@ -356,6 +355,16 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
return true;
}
+ return false;
+}
+
+static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
+ int nid,
+ struct scan_control *sc)
+{
+ if (__can_reclaim_anon_pages(memcg, sc))
+ return true;
+
/*
* The page can not be swapped.
*
@@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
if (!folio_test_large(folio))
goto activate_locked_split;
+ if (!__can_reclaim_anon_pages(memcg, sc))
+ goto activate_locked_split;
/* Fallback to swap normal pages */
if (split_folio_to_list(folio, folio_list))
goto activate_locked;
--
2.39.3 (Apple Git-146)
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
2026-06-18 22:17 [RFC PATCH] mm: Avoiding split large folios if swap has no space Barry Song (Xiaomi)
@ 2026-06-18 23:46 ` Nico Pache
2026-06-19 0:59 ` Barry Song
2026-06-19 14:01 ` David Hildenbrand (Arm)
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Nico Pache @ 2026-06-18 23:46 UTC (permalink / raw)
To: Barry Song (Xiaomi)
Cc: akpm, linux-mm, linux-kernel, Nanzhe Zhao, David Hildenbrand,
Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R . Howlett,
Ryan Roberts, Dev Jain, Lance Yang, Kairui Song, Qi Zheng,
Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu
On Thu, Jun 18, 2026 at 4:17 PM Barry Song (Xiaomi) <baohua@kernel.org> wrote:
>
> When swap is disabled or exhausted, swap slot allocation
> may fail during swapout, causing large folios to be split
> into small folios. The splitting is reasonable when we
> truly fail to obtain contiguous swap slots, but it is
> pointless in the no-space case.
Hi Barry,
Are we sure splitting is pointless? Since b1f202060afe ("mm: remap
unused subpages to shared zeropage when splitting isolated thp") a
split can lead to memory reclaim.
I've been far removed from the reclaim code for quite some time but I'm
assuming in this case we can actually reclaim some unused memory, which
might be beneficial if there is memory pressure. However there are
some discussions on whether we should further guard this zeropage
remapping behavior [1].
[1] - https://lore.kernel.org/lkml/20260609114619.144416-1-npache@redhat.com/
Cheers,
-- Nico
>
> A simple way to reproduce this is to invoke MADV_PAGEOUT on
> a system with mTHP enabled but without swap configured.
>
> #define SIZE (16 * 1024 * 1024)
> int main(void)
> {
> char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> memset(buf, 1, SIZE);
> madvise(buf, SIZE, MADV_PAGEOUT);
> munmap(buf, SIZE);
> return 0;
> }
>
> With 16KB mTHP enabled, we observe:
> ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> 1024
>
> This patch checks swap space before splitting. If there is
> no available space, it skips splitting. After the patch, we
> observe:
> ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> 0
>
> Reported-by: Nanzhe Zhao <zhaonanzhe@xiaomi.com>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Liam R. Howlett <liam@infradead.org>
> Cc: Nico Pache <npache@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Kairui Song <kasong@tencent.com>
> Cc: Qi Zheng <qi.zheng@linux.dev>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: Yuanchu Xie <yuanchu@google.com>
> Cc: Wei Xu <weixugc@google.com>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
> mm/vmscan.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 299b5d9e8836..33f84a5fe7ee 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -339,8 +339,7 @@ static bool can_demote(int nid, struct scan_control *sc,
> return !nodes_empty(allowed_mask);
> }
>
> -static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> - int nid,
> +static inline bool __can_reclaim_anon_pages(struct mem_cgroup *memcg,
> struct scan_control *sc)
> {
> if (memcg == NULL) {
> @@ -356,6 +355,16 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> return true;
> }
>
> + return false;
> +}
> +
> +static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> + int nid,
> + struct scan_control *sc)
> +{
> + if (__can_reclaim_anon_pages(memcg, sc))
> + return true;
> +
> /*
> * The page can not be swapped.
> *
> @@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>
> if (!folio_test_large(folio))
> goto activate_locked_split;
> + if (!__can_reclaim_anon_pages(memcg, sc))
> + goto activate_locked_split;
> /* Fallback to swap normal pages */
> if (split_folio_to_list(folio, folio_list))
> goto activate_locked;
> --
> 2.39.3 (Apple Git-146)
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
2026-06-18 23:46 ` Nico Pache
@ 2026-06-19 0:59 ` Barry Song
0 siblings, 0 replies; 10+ messages in thread
From: Barry Song @ 2026-06-19 0:59 UTC (permalink / raw)
To: Nico Pache
Cc: akpm, linux-mm, linux-kernel, Nanzhe Zhao, David Hildenbrand,
Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R . Howlett,
Ryan Roberts, Dev Jain, Lance Yang, Kairui Song, Qi Zheng,
Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu
On Fri, Jun 19, 2026 at 7:45 AM Nico Pache <npache@redhat.com> wrote:
>
> On Thu, Jun 18, 2026 at 4:17 PM Barry Song (Xiaomi) <baohua@kernel.org> wrote:
> >
> > When swap is disabled or exhausted, swap slot allocation
> > may fail during swapout, causing large folios to be split
> > into small folios. The splitting is reasonable when we
> > truly fail to obtain contiguous swap slots, but it is
> > pointless in the no-space case.
>
> Hi Barry,
>
> Are we sure splitting is pointless? Since b1f202060afe ("mm: remap
> unused subpages to shared zeropage when splitting isolated thp") a
> split can lead to memory reclaim.
>
> I've been far removed from the reclaim code for quite some time but im
> assuming in this case we can actually reclaim some unused memory which
> might be beneficial if there is memory pressure. However there are
> some discussions on whether we should further guard this zeropage
> remapping behavior [1].
Hi Nico,
Nice to hear your feedback.
We already have code to split partially unmapped subpages
to reclaim memory before folio_alloc_swap(folio):
/*
* Split partially mapped folios right away.
* We can free the unmapped pages without IO.
*/
if (data_race(!list_empty(&folio->_deferred_list) &&
    folio_test_partially_mapped(folio)) &&
    split_folio_to_list(folio, folio_list))
	goto activate_locked;
But it seems you are suggesting that we could split even
fully mapped folios, possibly to look for zero-filled
subpages within them?
I feel this only makes sense when there is an actual split
requirement. In cases where we are running out of swap or
swap is unavailable, forcing a split seems incorrect. Even in
a normal mTHP case without swapping involved, an mTHP can
still contain zero-filled subpages.
Recently, we did a number of experiments to relocate
zero-filled pages to zero_pfn by forcing SWAPOUT, and then
mapping them back to zero_pfn during swapin via my previous
RFC, "mm: map zero-filled pages to zero_pfn while doing
swap-in" [1]. However, we found no improvement in reducing
memory usage, as these zero-filled pages are quite volatile
and are quickly overwritten with non-zero data. So I gave
up efforts to detect zero-filled pages and perform
zero_pfn remapping.
In conclusion, I feel it is not useful to detect zero-filled
subpages, since they are quickly replaced by non-zero data.
In our experiments, we triggered forced SWAPOUT every minute
after detecting zero-filled pages, but found no reduction in
memory usage.
[1] https://lore.kernel.org/linux-mm/20241212073711.82300-1-21cnbao@gmail.com/
>
> [1] - https://lore.kernel.org/lkml/20260609114619.144416-1-npache@redhat.com/
Best Regards
Barry
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
2026-06-18 22:17 [RFC PATCH] mm: Avoiding split large folios if swap has no space Barry Song (Xiaomi)
2026-06-18 23:46 ` Nico Pache
@ 2026-06-19 14:01 ` David Hildenbrand (Arm)
2026-06-19 23:01 ` Barry Song
2026-06-19 14:04 ` David Hildenbrand (Arm)
2026-06-19 19:17 ` Kairui Song
3 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-19 14:01 UTC (permalink / raw)
To: Barry Song (Xiaomi), akpm, linux-mm
Cc: linux-kernel, Nanzhe Zhao, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Lance Yang,
Kairui Song, Qi Zheng, Shakeel Butt, Axel Rasmussen, Yuanchu Xie,
Wei Xu
On 6/19/26 00:17, Barry Song (Xiaomi) wrote:
> When swap is disabled or exhausted, swap slot allocation
> may fail during swapout, causing large folios to be split
> into small folios. The splitting is reasonable when we
> truly fail to obtain contiguous swap slots, but it is
> pointless in the no-space case.
>
> A simple way to reproduce this is to invoke MADV_PAGEOUT on
> a system with mTHP enabled but without swap configured.
Would we ever run into that code without MADV_PAGEOUT, or is that a
MADV_PAGEOUT-special thing?
I recall that can_reclaim_anon_pages() would make vmscan skip such code, but
it's tricky.
--
Cheers,
David
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
2026-06-18 22:17 [RFC PATCH] mm: Avoiding split large folios if swap has no space Barry Song (Xiaomi)
2026-06-18 23:46 ` Nico Pache
2026-06-19 14:01 ` David Hildenbrand (Arm)
@ 2026-06-19 14:04 ` David Hildenbrand (Arm)
2026-06-20 8:10 ` Barry Song (Xiaomi)
2026-06-19 19:17 ` Kairui Song
3 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-19 14:04 UTC (permalink / raw)
To: Barry Song (Xiaomi), akpm, linux-mm
Cc: linux-kernel, Nanzhe Zhao, Lorenzo Stoakes, Zi Yan, Baolin Wang,
Liam R . Howlett, Nico Pache, Ryan Roberts, Dev Jain, Lance Yang,
Kairui Song, Qi Zheng, Shakeel Butt, Axel Rasmussen, Yuanchu Xie,
Wei Xu
On 6/19/26 00:17, Barry Song (Xiaomi) wrote:
> When swap is disabled or exhausted, swap slot allocation
> may fail during swapout, causing large folios to be split
> into small folios. The splitting is reasonable when we
> truly fail to obtain contiguous swap slots, but it is
> pointless in the no-space case.
>
> A simple way to reproduce this is to invoke MADV_PAGEOUT on
> a system with mTHP enabled but without swap configured.
>
> #define SIZE (16 * 1024 * 1024)
> int main(void)
> {
> char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> memset(buf, 1, SIZE);
> madvise(buf, SIZE, MADV_PAGEOUT);
> munmap(buf, SIZE);
> return 0;
> }
>
> With 16KB mTHP enabled, we observe:
> ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> 1024
>
> This patch checks swap space before splitting. If there is
> no available space, it skips splitting. After the patch, we
> observe:
> ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> 0
>
> Reported-by: Nanzhe Zhao <zhaonanzhe@xiaomi.com>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Liam R. Howlett <liam@infradead.org>
> Cc: Nico Pache <npache@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Kairui Song <kasong@tencent.com>
> Cc: Qi Zheng <qi.zheng@linux.dev>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: Yuanchu Xie <yuanchu@google.com>
> Cc: Wei Xu <weixugc@google.com>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
> mm/vmscan.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 299b5d9e8836..33f84a5fe7ee 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -339,8 +339,7 @@ static bool can_demote(int nid, struct scan_control *sc,
> return !nodes_empty(allowed_mask);
> }
>
> -static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> - int nid,
> +static inline bool __can_reclaim_anon_pages(struct mem_cgroup *memcg,
> struct scan_control *sc)
> {
> if (memcg == NULL) {
> @@ -356,6 +355,16 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> return true;
> }
>
> + return false;
> +}
> +
> +static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> + int nid,
> + struct scan_control *sc)
> +{
> + if (__can_reclaim_anon_pages(memcg, sc))
> + return true;
> +
> /*
> * The page can not be swapped.
> *
> @@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>
> if (!folio_test_large(folio))
> goto activate_locked_split;
> + if (!__can_reclaim_anon_pages(memcg, sc))
> + goto activate_locked_split;
Why are we even trying to allocate swap space if we cannot reclaim such pages?
Makes me wonder whether we would want to have that check earlier, before the
folio_alloc_swap().
Any downsides?
--
Cheers,
David
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
2026-06-18 22:17 [RFC PATCH] mm: Avoiding split large folios if swap has no space Barry Song (Xiaomi)
` (2 preceding siblings ...)
2026-06-19 14:04 ` David Hildenbrand (Arm)
@ 2026-06-19 19:17 ` Kairui Song
2026-06-19 22:42 ` Barry Song (Xiaomi)
3 siblings, 1 reply; 10+ messages in thread
From: Kairui Song @ 2026-06-19 19:17 UTC (permalink / raw)
To: Barry Song (Xiaomi)
Cc: akpm, linux-mm, linux-kernel, Nanzhe Zhao, David Hildenbrand,
Lorenzo Stoakes, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Lance Yang, Qi Zheng,
Shakeel Butt, Axel Rasmussen, Yuanchu Xie, Wei Xu
On Fri, Jun 19, 2026 at 6:17 AM Barry Song (Xiaomi) <baohua@kernel.org> wrote:
>
> When swap is disabled or exhausted, swap slot allocation
> may fail during swapout, causing large folios to be split
> into small folios. The splitting is reasonable when we
> truly fail to obtain contiguous swap slots, but it is
> pointless in the no-space case.
>
> A simple way to reproduce this is to invoke MADV_PAGEOUT on
> a system with mTHP enabled but without swap configured.
>
> #define SIZE (16 * 1024 * 1024)
> int main(void)
> {
> char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> memset(buf, 1, SIZE);
> madvise(buf, SIZE, MADV_PAGEOUT);
> munmap(buf, SIZE);
> return 0;
> }
>
> With 16KB mTHP enabled, we observe:
> ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> 1024
>
> This patch checks swap space before splitting. If there is
> no available space, it skips splitting. After the patch, we
> observe:
> ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> 0
>
> Reported-by: Nanzhe Zhao <zhaonanzhe@xiaomi.com>
> Cc: David Hildenbrand <david@kernel.org>
> Cc: Lorenzo Stoakes <ljs@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Liam R. Howlett <liam@infradead.org>
> Cc: Nico Pache <npache@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Dev Jain <dev.jain@arm.com>
> Cc: Lance Yang <lance.yang@linux.dev>
> Cc: Kairui Song <kasong@tencent.com>
> Cc: Qi Zheng <qi.zheng@linux.dev>
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> Cc: Axel Rasmussen <axelrasmussen@google.com>
> Cc: Yuanchu Xie <yuanchu@google.com>
> Cc: Wei Xu <weixugc@google.com>
> Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> ---
> mm/vmscan.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 299b5d9e8836..33f84a5fe7ee 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -339,8 +339,7 @@ static bool can_demote(int nid, struct scan_control *sc,
> return !nodes_empty(allowed_mask);
> }
>
> -static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> - int nid,
> +static inline bool __can_reclaim_anon_pages(struct mem_cgroup *memcg,
> struct scan_control *sc)
> {
> if (memcg == NULL) {
> @@ -356,6 +355,16 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> return true;
> }
>
> + return false;
> +}
> +
> +static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> + int nid,
> + struct scan_control *sc)
> +{
> + if (__can_reclaim_anon_pages(memcg, sc))
> + return true;
> +
> /*
> * The page can not be swapped.
> *
> @@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>
> if (!folio_test_large(folio))
> goto activate_locked_split;
> + if (!__can_reclaim_anon_pages(memcg, sc))
> + goto activate_locked_split;
> /* Fallback to swap normal pages */
> if (split_folio_to_list(folio, folio_list))
> goto activate_locked;
Hello Barry,
Thanks for raising this issue. I saw a similar issue report on the
mailing list before and was thinking that perhaps another approach is to
let folio_alloc_swap return a more detailed error code, for example:
- 1. the mem_cgroup_try_charge_swap in it failed
- 2. allocation failed but nr_swap_pages > folio size
- 3. allocation failed because all devices are full or unusable
(roughly nr_swap_pages < folio size)
Only case 2 requires splitting. __can_reclaim_anon_pages also checks
demote which is not related to swap.
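
The three cases above could be modelled roughly as follows. This is only a userspace sketch of the idea, not kernel code: the enum, classify_swap_alloc() and should_split_and_retry() are hypothetical names, the inputs are plain values standing in for the real allocator state, and treating nr_swap_pages == folio size as "fragmented" is an assumption, since the list above leaves that boundary open.

```c
#include <stdbool.h>

/* Hypothetical result codes; none of these names exist in the kernel. */
enum swap_alloc_result {
	SWAP_ALLOC_OK,
	SWAP_ALLOC_CHARGE_FAILED,	/* case 1: memcg swap charge failed */
	SWAP_ALLOC_FRAGMENTED,		/* case 2: enough free swap, none contiguous */
	SWAP_ALLOC_NO_SPACE,		/* case 3: devices full or unusable */
};

/*
 * Classify the outcome of a swap allocation attempt for a folio of the
 * given order. 'allocated' and 'charge_ok' stand in for the results of
 * slot allocation and the memcg swap charge.
 */
static enum swap_alloc_result classify_swap_alloc(bool allocated, bool charge_ok,
						  long nr_swap_pages,
						  unsigned int order)
{
	long nr_pages = 1L << order;

	if (allocated)
		return charge_ok ? SWAP_ALLOC_OK : SWAP_ALLOC_CHARGE_FAILED;
	if (nr_swap_pages < nr_pages)
		return SWAP_ALLOC_NO_SPACE;
	return SWAP_ALLOC_FRAGMENTED;
}

/* Only a large folio that failed for lack of contiguous slots is worth splitting. */
static bool should_split_and_retry(enum swap_alloc_result res, unsigned int order)
{
	return order > 0 && res == SWAP_ALLOC_FRAGMENTED;
}
```

With this shape, the caller would split and retry only for SWAP_ALLOC_FRAGMENTED, and would leave the large folio intact in the other failure cases.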
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
2026-06-19 19:17 ` Kairui Song
@ 2026-06-19 22:42 ` Barry Song (Xiaomi)
2026-06-20 7:19 ` Kairui Song
0 siblings, 1 reply; 10+ messages in thread
From: Barry Song (Xiaomi) @ 2026-06-19 22:42 UTC (permalink / raw)
To: ryncsn
Cc: akpm, axelrasmussen, baohua, baolin.wang, david, dev.jain,
lance.yang, liam, linux-kernel, linux-mm, ljs, npache, qi.zheng,
ryan.roberts, shakeel.butt, weixugc, yuanchu, zhaonanzhe, ziy
On Sat, Jun 20, 2026 at 3:18 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Fri, Jun 19, 2026 at 6:17 AM Barry Song (Xiaomi) <baohua@kernel.org> wrote:
> >
> > When swap is disabled or exhausted, swap slot allocation
> > may fail during swapout, causing large folios to be split
> > into small folios. The splitting is reasonable when we
> > truly fail to obtain contiguous swap slots, but it is
> > pointless in the no-space case.
> >
> > A simple way to reproduce this is to invoke MADV_PAGEOUT on
> > a system with mTHP enabled but without swap configured.
> >
> > #define SIZE (16 * 1024 * 1024)
> > int main(void)
> > {
> > char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > memset(buf, 1, SIZE);
> > madvise(buf, SIZE, MADV_PAGEOUT);
> > munmap(buf, SIZE);
> > return 0;
> > }
> >
> > With 16KB mTHP enabled, we observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 1024
> >
> > This patch checks swap space before splitting. If there is
> > no available space, it skips splitting. After the patch, we
> > observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 0
> >
> > Reported-by: Nanzhe Zhao <zhaonanzhe@xiaomi.com>
> > Cc: David Hildenbrand <david@kernel.org>
> > Cc: Lorenzo Stoakes <ljs@kernel.org>
> > Cc: Zi Yan <ziy@nvidia.com>
> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Cc: Liam R. Howlett <liam@infradead.org>
> > Cc: Nico Pache <npache@redhat.com>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: Dev Jain <dev.jain@arm.com>
> > Cc: Lance Yang <lance.yang@linux.dev>
> > Cc: Kairui Song <kasong@tencent.com>
> > Cc: Qi Zheng <qi.zheng@linux.dev>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: Axel Rasmussen <axelrasmussen@google.com>
> > Cc: Yuanchu Xie <yuanchu@google.com>
> > Cc: Wei Xu <weixugc@google.com>
> > Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> > ---
> > mm/vmscan.c | 15 +++++++++++++--
> > 1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 299b5d9e8836..33f84a5fe7ee 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -339,8 +339,7 @@ static bool can_demote(int nid, struct scan_control *sc,
> > return !nodes_empty(allowed_mask);
> > }
> >
> > -static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > - int nid,
> > +static inline bool __can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > struct scan_control *sc)
> > {
> > if (memcg == NULL) {
> > @@ -356,6 +355,16 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > return true;
> > }
> >
> > + return false;
> > +}
> > +
> > +static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > + int nid,
> > + struct scan_control *sc)
> > +{
> > + if (__can_reclaim_anon_pages(memcg, sc))
> > + return true;
> > +
> > /*
> > * The page can not be swapped.
> > *
> > @@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >
> > if (!folio_test_large(folio))
> > goto activate_locked_split;
> > + if (!__can_reclaim_anon_pages(memcg, sc))
> > + goto activate_locked_split;
> > /* Fallback to swap normal pages */
> > if (split_folio_to_list(folio, folio_list))
> > goto activate_locked;
>
> Hello Barry,
>
> Thanks for raising this issue. I saw a similar issue report in the
> mail list before and was thinking that, perhaps another approach is to
Hi Kairui,
Could you please post the link to your report? I'd like to add
your Reported-by and Closes tags as well.
> let folio_alloc_swap return a more detailed error code, for example:
>
> - 1. the mem_cgroup_try_charge_swap in it failed
> - 2. allocation failed but nr_swap_pages > folio size
> - 3. allocation failed because all devices are full or unusable
> (roughly nr_swap_pages < folio size)
>
folio_alloc_swap() returns error codes such as -EAGAIN,
-EINVAL, and -ENOMEM. For cases 1, 2, and 3, I assume it
would return -ENOMEM?
I assume you mean that we might want folio_alloc_swap() to
return an enum instead?
Another approach is that I can return -EAGAIN for those cases
we want to retry swapping-out after splitting folios:
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 78b49b0658ad..62e2c506ccae 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1755,6 +1755,9 @@ int folio_alloc_swap(struct folio *folio)
VM_WARN_ON_ONCE(1);
return -EINVAL;
}
+
+ if (get_nr_swap_pages() < (1 << order))
+ return -ENOMEM;
}
again:
@@ -1769,11 +1772,13 @@ int folio_alloc_swap(struct folio *folio)
}
/* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
- if (unlikely(mem_cgroup_try_charge_swap(folio)))
+ if (unlikely(mem_cgroup_try_charge_swap(folio))) {
swap_cache_del_folio(folio);
+ return -ENOMEM;
+ }
if (unlikely(!folio_test_swapcache(folio)))
- return -ENOMEM;
+ return -EAGAIN;
return 0;
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 299b5d9e8836..4c4cbd72c013 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1257,6 +1257,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
*/
if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
!folio_test_swapcache(folio)) {
+ int ret;
+
if (!(sc->gfp_mask & __GFP_IO))
goto keep_locked;
if (folio_maybe_dma_pinned(folio))
@@ -1275,25 +1277,24 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
split_folio_to_list(folio, folio_list))
goto activate_locked;
}
- if (folio_alloc_swap(folio)) {
- int __maybe_unused order = folio_order(folio);
+ ret = folio_alloc_swap(folio);
+ if (!folio_test_large(folio) || ret != -EAGAIN)
+ goto activate_locked_split;
- if (!folio_test_large(folio))
- goto activate_locked_split;
- /* Fallback to swap normal pages */
- if (split_folio_to_list(folio, folio_list))
- goto activate_locked;
+ /* Fallback to swap normal pages */
+ if (split_folio_to_list(folio, folio_list))
+ goto activate_locked;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (nr_pages >= HPAGE_PMD_NR) {
- count_memcg_folio_events(folio,
+ if (nr_pages >= HPAGE_PMD_NR) {
+ count_memcg_folio_events(folio,
THP_SWPOUT_FALLBACK, 1);
- count_vm_event(THP_SWPOUT_FALLBACK);
- }
-#endif
- count_mthp_stat(order, MTHP_STAT_SWPOUT_FALLBACK);
- if (folio_alloc_swap(folio))
- goto activate_locked_split;
+ count_vm_event(THP_SWPOUT_FALLBACK);
}
+#endif
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SWPOUT_FALLBACK);
+ if (folio_alloc_swap(folio))
+ goto activate_locked_split;
+
/*
* Normally the folio will be dirtied in unmap because
* its pte should be dirty. A special case is MADV_FREE
> Only case 2 requires splitting. __can_reclaim_anon_pages also checks
> demote which is not related to swap.
I actually extracted __can_reclaim_anon_pages(), which only
checks swap, whereas can_reclaim_anon_pages() checks both
swap and demotion. :-)
Best Regards
Barry
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
2026-06-19 14:01 ` David Hildenbrand (Arm)
@ 2026-06-19 23:01 ` Barry Song
0 siblings, 0 replies; 10+ messages in thread
From: Barry Song @ 2026-06-19 23:01 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: akpm, linux-mm, linux-kernel, Nanzhe Zhao, Lorenzo Stoakes,
Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Lance Yang, Kairui Song, Qi Zheng, Shakeel Butt,
Axel Rasmussen, Yuanchu Xie, Wei Xu
On Fri, Jun 19, 2026 at 10:02 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 6/19/26 00:17, Barry Song (Xiaomi) wrote:
> > When swap is disabled or exhausted, swap slot allocation
> > may fail during swapout, causing large folios to be split
> > into small folios. The splitting is reasonable when we
> > truly fail to obtain contiguous swap slots, but it is
> > pointless in the no-space case.
> >
> > A simple way to reproduce this is to invoke MADV_PAGEOUT on
> > a system with mTHP enabled but without swap configured.
>
> Would we ever run into that code without MADV_PAGEOUT, or is that a
> MADV_PAGEOUT-special thing?
>
> I recall that can_reclaim_anon_pages() would make vmscan skip such code, but
> it's tricky.
I guess there could also be cases such as
get_nr_swap_pages() == 1 while folio_nr_pages() > 1,
for example, where can_reclaim_anon_pages() cannot help.
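
To make that corner case concrete, here is a small userspace model. It assumes the coarse check amounts to "is any swap left at all", which is how the swap part of can_reclaim_anon_pages() appears to behave; any_swap_left() and swap_fits_folio() are made-up names used only for this sketch.

```c
#include <stdbool.h>

/* Coarse check: is there any free swap at all? */
static bool any_swap_left(long nr_swap_pages)
{
	return nr_swap_pages > 0;
}

/* Per-folio check: is there room for every page of an order-N folio? */
static bool swap_fits_folio(long nr_swap_pages, unsigned int order)
{
	return nr_swap_pages >= (1L << order);
}
```

With one free swap page and an order-2 (16kB) folio, any_swap_left(1) is true while swap_fits_folio(1, 2) is false, so a coarse check alone would still let the folio reach the allocation path.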
Best Regards
Barry
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
2026-06-19 22:42 ` Barry Song (Xiaomi)
@ 2026-06-20 7:19 ` Kairui Song
0 siblings, 0 replies; 10+ messages in thread
From: Kairui Song @ 2026-06-20 7:19 UTC (permalink / raw)
To: Barry Song (Xiaomi)
Cc: akpm, axelrasmussen, baolin.wang, david, dev.jain, lance.yang,
liam, linux-kernel, linux-mm, ljs, npache, qi.zheng, ryan.roberts,
shakeel.butt, weixugc, yuanchu, zhaonanzhe, ziy
On Sat, Jun 20, 2026 at 6:42 AM Barry Song (Xiaomi) <baohua@kernel.org> wrote:
>
> On Sat, Jun 20, 2026 at 3:18 AM Kairui Song <ryncsn@gmail.com> wrote:
> >
> >
> > Hello Barry,
> >
> > Thanks for raising this issue. I saw a similar issue report in the
> > mail list before and was thinking that, perhaps another approach is to
>
> Hi Kairui,
>
> Could you please post the link to your report? I'd like to add
> your Reported-by and Closes tags as well.
Hello, here is my previous reply:
https://lore.kernel.org/linux-mm/CAMgjq7AD2ivdgD_kRx+0+w0M=Fvz5dXepw84_Ui_q7=97B4Ofw@mail.gmail.com/
It's the same issue, not reported by me though.
>
> > let folio_alloc_swap return a more detailed error code, for example:
> >
> > - 1. the mem_cgroup_try_charge_swap in it failed
> > - 2. allocation failed but nr_swap_pages > folio size
> > - 3. allocation failed because all devices are full or unusable
> > (roughly nr_swap_pages < folio size)
> >
>
> folio_alloc_swap() returns error codes such as -EAGAIN,
> -EINVAL, and -ENOMEM. For cases 1, 2, and 3, I assume it
> would return -ENOMEM?
>
> I assume you mean that we might want folio_alloc_swap() to
> return an enum instead?
>
> another approach is that I can return -EAGAIN for those cases
> we want to retry swapping-out after splitting folios:
Right, I don't have a strong preference on this.
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 78b49b0658ad..62e2c506ccae 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1755,6 +1755,9 @@ int folio_alloc_swap(struct folio *folio)
> VM_WARN_ON_ONCE(1);
> return -EINVAL;
> }
> +
> + if (get_nr_swap_pages() < (1 << order))
> + return -ENOMEM;
> }
>
> again:
> @@ -1769,11 +1772,13 @@ int folio_alloc_swap(struct folio *folio)
> }
>
> /* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
> - if (unlikely(mem_cgroup_try_charge_swap(folio)))
> + if (unlikely(mem_cgroup_try_charge_swap(folio))) {
> swap_cache_del_folio(folio);
> + return -ENOMEM;
> + }
>
> if (unlikely(!folio_test_swapcache(folio)))
> - return -ENOMEM;
> + return -EAGAIN;
>
> return 0;
> }
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 299b5d9e8836..4c4cbd72c013 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1257,6 +1257,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> */
> if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> !folio_test_swapcache(folio)) {
> + int ret;
> +
> if (!(sc->gfp_mask & __GFP_IO))
> goto keep_locked;
> if (folio_maybe_dma_pinned(folio))
> @@ -1275,25 +1277,24 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> split_folio_to_list(folio, folio_list))
> goto activate_locked;
> }
> - if (folio_alloc_swap(folio)) {
> - int __maybe_unused order = folio_order(folio);
> + ret = folio_alloc_swap(folio);
> + if (!folio_test_large(folio) || ret != -EAGAIN)
> + goto activate_locked_split;
>
> - if (!folio_test_large(folio))
> - goto activate_locked_split;
> - /* Fallback to swap normal pages */
> - if (split_folio_to_list(folio, folio_list))
> - goto activate_locked;
> + /* Fallback to swap normal pages */
> + if (split_folio_to_list(folio, folio_list))
> + goto activate_locked;
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - if (nr_pages >= HPAGE_PMD_NR) {
> - count_memcg_folio_events(folio,
> + if (nr_pages >= HPAGE_PMD_NR) {
> + count_memcg_folio_events(folio,
> THP_SWPOUT_FALLBACK, 1);
> - count_vm_event(THP_SWPOUT_FALLBACK);
> - }
> -#endif
> - count_mthp_stat(order, MTHP_STAT_SWPOUT_FALLBACK);
> - if (folio_alloc_swap(folio))
> - goto activate_locked_split;
> + count_vm_event(THP_SWPOUT_FALLBACK);
> }
> +#endif
> + count_mthp_stat(folio_order(folio), MTHP_STAT_SWPOUT_FALLBACK);
> + if (folio_alloc_swap(folio))
> + goto activate_locked_split;
> +
> /*
> * Normally the folio will be dirtied in unmap because
> * its pte should be dirty. A special case is MADV_FREE
>
> > Only case 2 requires splitting. __can_reclaim_anon_pages also checks
> > demote which is not related to swap.
>
> I actually extracted __can_reclaim_anon_pages(), which only
> checks swap, whereas can_reclaim_anon_pages() checks both
> swap and demotion. :-)
Ah yeah you are right, I didn't see that part.
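
For illustration only, the three failure cases listed earlier in this message could be expressed as an enum, as a rough userspace sketch. The names swap_alloc_fail, classify_swap_failure() and split_is_worthwhile() are hypothetical and do not exist in the kernel; free_swap_pages stands in for nr_swap_pages, and the threshold follows the (1 << order) comparison used in the quoted diff.

```c
#include <stdbool.h>

/* Hypothetical names, not kernel API: one value per case in the list above. */
enum swap_alloc_fail {
	SWAP_FAIL_MEMCG,	/* case 1: the memcg swap charge failed */
	SWAP_FAIL_FRAGMENTED,	/* case 2: enough free pages, no contiguous slots */
	SWAP_FAIL_NO_SPACE,	/* case 3: devices are full or unusable */
};

/*
 * Classify a failed swap allocation for a folio of the given order.
 * charge_failed models mem_cgroup_try_charge_swap() failing;
 * free_swap_pages models nr_swap_pages (an assumption of this sketch).
 */
static enum swap_alloc_fail classify_swap_failure(bool charge_failed,
						  long free_swap_pages,
						  unsigned int order)
{
	if (charge_failed)
		return SWAP_FAIL_MEMCG;
	if (free_swap_pages < (1L << order))
		return SWAP_FAIL_NO_SPACE;
	return SWAP_FAIL_FRAGMENTED;
}

/* Only case 2 can be helped by splitting the large folio. */
static bool split_is_worthwhile(enum swap_alloc_fail why)
{
	return why == SWAP_FAIL_FRAGMENTED;
}
```

With this shape the caller would split only when split_is_worthwhile() is true. The -EAGAIN variant in the quoted diff encodes the same decision in an errno value instead of a new type.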
* Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
2026-06-19 14:04 ` David Hildenbrand (Arm)
@ 2026-06-20 8:10 ` Barry Song (Xiaomi)
0 siblings, 0 replies; 10+ messages in thread
From: Barry Song (Xiaomi) @ 2026-06-20 8:10 UTC (permalink / raw)
To: david
Cc: akpm, axelrasmussen, baohua, baolin.wang, dev.jain, kasong,
lance.yang, liam, linux-kernel, linux-mm, ljs, npache, qi.zheng,
ryan.roberts, shakeel.butt, weixugc, yuanchu, zhaonanzhe, ziy
On Fri, Jun 19, 2026 at 10:04 PM David Hildenbrand (Arm) <david@kernel.org> wrote:
[...]
> > /*
> > * The page can not be swapped.
> > *
> > @@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >
> > if (!folio_test_large(folio))
> > goto activate_locked_split;
> > + if (!__can_reclaim_anon_pages(memcg, sc))
> > + goto activate_locked_split;
>
> Why are we even trying to allocate swap space if we cannot reclaim such pages?
> Makes me wonder whether we would want to have that check earlier, before the
> folio_alloc_swap().
>
> Any downsides?
I don't think there are any obvious downsides there. One issue is that
the memcg may not be passed from reclaim_pages(), so memcg would
always be NULL. However, the folio could still belong to a memcg
whose swap quota has been exhausted. In that case, my
__can_reclaim_anon_pages() will fail when checking whether we can
swap out. But switching to folio_memcg() also seems awkward.
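
To make the NULL-memcg concern concrete, here is a small userspace model. It is only a sketch under stated assumptions: struct memcg_model and model_can_swap() are hypothetical, and the rule "no memcg means only global swap space is checked" is my simplification of what the check does, not a copy of the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg's remaining swap quota. */
struct memcg_model {
	long swap_quota_left;
};

/*
 * Simplified model of a "can we swap?" check: with no memcg passed in,
 * only the global free swap space is considered.
 */
static bool model_can_swap(const struct memcg_model *memcg, long global_free)
{
	if (!memcg)
		return global_free > 0;
	return global_free > 0 && memcg->swap_quota_left > 0;
}
```

If a folio belongs to a memcg with swap_quota_left == 0 but the caller passes NULL, the model reports that swapping is possible whenever global space exists, which is the mismatch described above.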
So I feel Kairui’s suggestion [1] might be the best approach. In
folio_alloc_swap(), we return -EAGAIN to tell vmscan.c that
it can split the folio and retry the swap-out. We return -EAGAIN
only when there are sufficient swap slots and sufficient memcg swap
quota, allowing vmscan to perform a split.
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 78b49b0658ad..62e2c506ccae 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1755,6 +1755,9 @@ int folio_alloc_swap(struct folio *folio)
VM_WARN_ON_ONCE(1);
return -EINVAL;
}
+
+ if (get_nr_swap_pages() < (1 << order))
+ return -ENOMEM;
}
again:
@@ -1769,11 +1772,13 @@ int folio_alloc_swap(struct folio *folio)
}
/* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
- if (unlikely(mem_cgroup_try_charge_swap(folio)))
+ if (unlikely(mem_cgroup_try_charge_swap(folio))) {
swap_cache_del_folio(folio);
+ return -ENOMEM;
+ }
if (unlikely(!folio_test_swapcache(folio)))
- return -ENOMEM;
+ return -EAGAIN;
return 0;
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 299b5d9e8836..63e8578454ea 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1257,6 +1257,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
*/
if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
!folio_test_swapcache(folio)) {
+ int ret;
+
if (!(sc->gfp_mask & __GFP_IO))
goto keep_locked;
if (folio_maybe_dma_pinned(folio))
@@ -1275,10 +1277,10 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
split_folio_to_list(folio, folio_list))
goto activate_locked;
}
- if (folio_alloc_swap(folio)) {
+ if ((ret = folio_alloc_swap(folio))) {
int __maybe_unused order = folio_order(folio);
- if (!folio_test_large(folio))
+ if (!folio_test_large(folio) || ret != -EAGAIN)
goto activate_locked_split;
/* Fallback to swap normal pages */
if (split_folio_to_list(folio, folio_list))
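
The return-value contract of the diff above can be modelled in userspace roughly as follows. This is a simplified sketch, not the kernel implementation: struct swap_model, model_folio_alloc_swap() and should_split() are hypothetical names, and the model applies the free-space check to every order, whereas the hunk adds it on the large-folio path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the allocation path consults. */
struct swap_model {
	long free_pages;	/* models get_nr_swap_pages() */
	bool contiguous_ok;	/* could a slot range of this order be found? */
	bool memcg_charge_ok;	/* models the memcg swap charge succeeding */
};

/* Mirrors only the error-code contract of the patched function. */
static int model_folio_alloc_swap(const struct swap_model *s,
				  unsigned int order)
{
	if (s->free_pages < (1L << order))
		return -ENOMEM;	/* no space: splitting cannot help */
	if (!s->memcg_charge_ok)
		return -ENOMEM;	/* memcg quota exhausted: splitting cannot help */
	if (!s->contiguous_ok)
		return -EAGAIN;	/* space exists but is fragmented: split, retry */
	return 0;
}

/* Caller side, as in the vmscan.c hunk: split only large folios on -EAGAIN. */
static bool should_split(int ret, unsigned int order)
{
	return ret == -EAGAIN && order > 0;
}
```

In the reproducer from the original posting (no swap configured), free_pages is 0, so the model returns -ENOMEM and should_split() is false, which corresponds to the split counter staying at 0.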
What’s your view on this, David?
[1] https://lore.kernel.org/linux-mm/CAMgjq7Bmi2XYxPc4tVO8KTWSuj8jpt-c-JueqYRK1kLC_nmBqA@mail.gmail.com/
Best Regards
Barry
Thread overview: 10+ messages
2026-06-18 22:17 [RFC PATCH] mm: Avoiding split large folios if swap has no space Barry Song (Xiaomi)
2026-06-18 23:46 ` Nico Pache
2026-06-19 0:59 ` Barry Song
2026-06-19 14:01 ` David Hildenbrand (Arm)
2026-06-19 23:01 ` Barry Song
2026-06-19 14:04 ` David Hildenbrand (Arm)
2026-06-20 8:10 ` Barry Song (Xiaomi)
2026-06-19 19:17 ` Kairui Song
2026-06-19 22:42 ` Barry Song (Xiaomi)
2026-06-20 7:19 ` Kairui Song