From: "Barry Song (Xiaomi)" <baohua@kernel.org>
To: ryncsn@gmail.com
Cc: akpm@linux-foundation.org, axelrasmussen@google.com,
baohua@kernel.org, baolin.wang@linux.alibaba.com,
david@kernel.org, dev.jain@arm.com, lance.yang@linux.dev,
liam@infradead.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, ljs@kernel.org, npache@redhat.com,
qi.zheng@linux.dev, ryan.roberts@arm.com, shakeel.butt@linux.dev,
weixugc@google.com, yuanchu@google.com, zhaonanzhe@xiaomi.com,
ziy@nvidia.com
Subject: Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
Date: Sat, 20 Jun 2026 06:42:21 +0800
Message-ID: <20260619224221.85678-1-baohua@kernel.org>
In-Reply-To: <CAMgjq7Bmi2XYxPc4tVO8KTWSuj8jpt-c-JueqYRK1kLC_nmBqA@mail.gmail.com>
On Sat, Jun 20, 2026 at 3:18 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Fri, Jun 19, 2026 at 6:17 AM Barry Song (Xiaomi) <baohua@kernel.org> wrote:
> >
> > When swap is disabled or exhausted, swap slot allocation
> > may fail during swapout, causing large folios to be split
> > into small folios. The splitting is reasonable when we
> > truly fail to obtain contiguous swap slots, but it is
> > pointless in the no-space case.
> >
> > A simple way to reproduce this is to invoke MADV_PAGEOUT on
> > a system with mTHP enabled but without swap configured.
> >
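> > #include <string.h>
> > #include <sys/mman.h>
> >
> > #define SIZE (16 * 1024 * 1024)
> >
> > int main(void)
> > {
> > 	char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> > 			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >
> > 	if (buf == MAP_FAILED)
> > 		return 1;
> > 	memset(buf, 1, SIZE);			/* fault in the whole range */
> > 	madvise(buf, SIZE, MADV_PAGEOUT);	/* ask reclaim to page it out */
> > 	munmap(buf, SIZE);
> > 	return 0;
> > }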
> >
> > With 16KB mTHP enabled, we observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 1024
> >
> > This patch checks swap space before splitting. If there is
> > no available space, it skips splitting. After the patch, we
> > observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 0
> >
> > Reported-by: Nanzhe Zhao <zhaonanzhe@xiaomi.com>
> > Cc: David Hildenbrand <david@kernel.org>
> > Cc: Lorenzo Stoakes <ljs@kernel.org>
> > Cc: Zi Yan <ziy@nvidia.com>
> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Cc: Liam R. Howlett <liam@infradead.org>
> > Cc: Nico Pache <npache@redhat.com>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: Dev Jain <dev.jain@arm.com>
> > Cc: Lance Yang <lance.yang@linux.dev>
> > Cc: Kairui Song <kasong@tencent.com>
> > Cc: Qi Zheng <qi.zheng@linux.dev>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: Axel Rasmussen <axelrasmussen@google.com>
> > Cc: Yuanchu Xie <yuanchu@google.com>
> > Cc: Wei Xu <weixugc@google.com>
> > Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> > ---
> > mm/vmscan.c | 15 +++++++++++++--
> > 1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 299b5d9e8836..33f84a5fe7ee 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -339,8 +339,7 @@ static bool can_demote(int nid, struct scan_control *sc,
> > return !nodes_empty(allowed_mask);
> > }
> >
> > -static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > - int nid,
> > +static inline bool __can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > struct scan_control *sc)
> > {
> > if (memcg == NULL) {
> > @@ -356,6 +355,16 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > return true;
> > }
> >
> > + return false;
> > +}
> > +
> > +static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > + int nid,
> > + struct scan_control *sc)
> > +{
> > + if (__can_reclaim_anon_pages(memcg, sc))
> > + return true;
> > +
> > /*
> > * The page can not be swapped.
> > *
> > @@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >
> > if (!folio_test_large(folio))
> > goto activate_locked_split;
> > + if (!__can_reclaim_anon_pages(memcg, sc))
> > + goto activate_locked_split;
> > /* Fallback to swap normal pages */
> > if (split_folio_to_list(folio, folio_list))
> > goto activate_locked;
>
> Hello Barry,
>
> Thanks for raising this issue. I saw a similar issue report in the
> mail list before and was thinking that, perhaps another approach is to

Hi Kairui,

Could you please post the link to your report? I'd like to add
your Reported-by and Closes tags as well.

> let folio_alloc_swap return a more detailed error code, for example:
>
> - 1. the mem_cgroup_try_charge_swap in it failed
> - 2. allocation failed but nr_swap_pages > folio size
> - 3. allocation failed because all devices are full or unusable
> (roughly nr_swap_pages < folio size)
>

folio_alloc_swap() returns error codes such as -EAGAIN,
-EINVAL, and -ENOMEM. For cases 1, 2, and 3, I assume it
would return -ENOMEM?

Or do you mean that we might want folio_alloc_swap() to
return an enum instead?
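
To make the enum idea concrete, here is a rough userspace sketch of how
the three cases from your list could be told apart. Everything in it is
an assumption for illustration: the enum, its values, and both helpers
are hypothetical names, not existing kernel API. free_slots stands in
for nr_swap_pages, and the boundary case (free_slots equal to the folio
size) is treated as case 2.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical result type: none of these names exist in the kernel.
 * They mirror the three failure cases listed in the quoted mail.
 */
enum swap_alloc_result {
	SWAP_ALLOC_OK = 0,
	SWAP_ALLOC_CHARGE_FAILED,	/* case 1: memcg swap charge failed */
	SWAP_ALLOC_NO_CONTIG,		/* case 2: failed, but space remains */
	SWAP_ALLOC_NO_SPACE,		/* case 3: devices full or unusable */
};

/*
 * Classify a failed allocation. free_slots stands in for nr_swap_pages,
 * and order is the folio order, so the folio needs (1 << order) slots.
 */
static enum swap_alloc_result classify_failure(bool charge_failed,
					       long free_slots,
					       unsigned int order)
{
	assert(order < 31);	/* keep the shift below well defined */

	if (charge_failed)
		return SWAP_ALLOC_CHARGE_FAILED;
	if (free_slots < (1L << order))
		return SWAP_ALLOC_NO_SPACE;
	return SWAP_ALLOC_NO_CONTIG;
}

/* Only case 2 can be helped by splitting the large folio. */
static bool split_is_useful(enum swap_alloc_result res)
{
	return res == SWAP_ALLOC_NO_CONTIG;
}
```

With something like this, the caller in shrink_folio_list() would only
need to test split_is_useful() before calling split_folio_to_list().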

Another approach is to return -EAGAIN only in the cases where
we want to retry the swapout after splitting the folio:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 78b49b0658ad..62e2c506ccae 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1755,6 +1755,9 @@ int folio_alloc_swap(struct folio *folio)
VM_WARN_ON_ONCE(1);
return -EINVAL;
}
+
+ if (get_nr_swap_pages() < (1 << order))
+ return -ENOMEM;
}
again:
@@ -1769,11 +1772,13 @@ int folio_alloc_swap(struct folio *folio)
}
/* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
- if (unlikely(mem_cgroup_try_charge_swap(folio)))
+ if (unlikely(mem_cgroup_try_charge_swap(folio))) {
swap_cache_del_folio(folio);
+ return -ENOMEM;
+ }
if (unlikely(!folio_test_swapcache(folio)))
- return -ENOMEM;
+ return -EAGAIN;
return 0;
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 299b5d9e8836..4c4cbd72c013 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1257,6 +1257,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
*/
if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
!folio_test_swapcache(folio)) {
+ int ret;
+
if (!(sc->gfp_mask & __GFP_IO))
goto keep_locked;
if (folio_maybe_dma_pinned(folio))
@@ -1275,25 +1277,24 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
split_folio_to_list(folio, folio_list))
goto activate_locked;
}
- if (folio_alloc_swap(folio)) {
- int __maybe_unused order = folio_order(folio);
+ ret = folio_alloc_swap(folio);
+ if (!folio_test_large(folio) || ret != -EAGAIN)
+ goto activate_locked_split;
- if (!folio_test_large(folio))
- goto activate_locked_split;
- /* Fallback to swap normal pages */
- if (split_folio_to_list(folio, folio_list))
- goto activate_locked;
+ /* Fallback to swap normal pages */
+ if (split_folio_to_list(folio, folio_list))
+ goto activate_locked;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (nr_pages >= HPAGE_PMD_NR) {
- count_memcg_folio_events(folio,
+ if (nr_pages >= HPAGE_PMD_NR) {
+ count_memcg_folio_events(folio,
THP_SWPOUT_FALLBACK, 1);
- count_vm_event(THP_SWPOUT_FALLBACK);
- }
-#endif
- count_mthp_stat(order, MTHP_STAT_SWPOUT_FALLBACK);
- if (folio_alloc_swap(folio))
- goto activate_locked_split;
+ count_vm_event(THP_SWPOUT_FALLBACK);
}
+#endif
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SWPOUT_FALLBACK);
+ if (folio_alloc_swap(folio))
+ goto activate_locked_split;
+
/*
* Normally the folio will be dirtied in unmap because
* its pte should be dirty. A special case is MADV_FREE
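
The intended behaviour of the draft above can be modelled in userspace
roughly as follows. This is only a sketch under stated assumptions: the
enum and both model_*() helpers are hypothetical, the inputs are plain
values standing in for kernel state, the free-space check is assumed to
sit in a block that only runs for large folios, and a return value of 0
is assumed to skip the split fallback entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for what reclaim does next; not kernel identifiers. */
enum reclaim_action {
	ACTION_PROCEED,		/* swap slots allocated, continue reclaim */
	ACTION_ACTIVATE,	/* give up on this folio (activate_locked_split) */
	ACTION_SPLIT_RETRY,	/* split the large folio, then retry allocation */
};

/*
 * Model of the return value chosen by the modified folio_alloc_swap().
 * free_slots stands in for get_nr_swap_pages().
 */
static int model_alloc_ret(long free_slots, unsigned int order,
			   bool charge_failed, bool got_slots)
{
	assert(order < 31);	/* keep the shift below well defined */

	/* Assumption: this check only applies to large folios. */
	if (order > 0 && free_slots < (1L << order))
		return -ENOMEM;		/* no space: splitting cannot help */
	if (charge_failed)
		return -ENOMEM;		/* memcg limit: splitting cannot help */
	if (!got_slots)
		return -EAGAIN;		/* space exists, but not as one range */
	return 0;
}

/* Model of the decision taken in shrink_folio_list(). */
static enum reclaim_action model_reclaim(int ret, bool large)
{
	if (ret == 0)
		return ACTION_PROCEED;	/* assumption: success skips fallback */
	if (!large || ret != -EAGAIN)
		return ACTION_ACTIVATE;
	return ACTION_SPLIT_RETRY;
}
```

So a large folio is split only when the allocation fails with -EAGAIN;
-ENOMEM from either the space check or the memcg charge leaves it intact.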
> Only case 2 requires splitting. __can_reclaim_anon_pages also checks
> demote which is not related to swap.

I actually extracted __can_reclaim_anon_pages(), which only
checks swap, whereas can_reclaim_anon_pages() checks both
swap and demotion. :-)
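
In case the split between the two helpers is not obvious from the diff,
a simplified userspace model might look like the one below. The struct
and the model_*() names are hypothetical, and reducing the swap check to
"free slots > 0" is an assumption; the real helpers also take the memcg
and scan_control into account.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state both helpers look at. */
struct reclaim_state {
	long free_swap_slots;	/* stands in for the swap-space checks */
	bool can_demote;	/* stands in for can_demote(nid, sc, memcg) */
};

/* Swap only: models the extracted __can_reclaim_anon_pages(). */
static bool model_can_swap(const struct reclaim_state *st)
{
	assert(st);
	return st->free_swap_slots > 0;
}

/* Swap or demotion: models can_reclaim_anon_pages(). */
static bool model_can_reclaim_anon(const struct reclaim_state *st)
{
	if (model_can_swap(st))
		return true;
	/* The folio cannot be swapped; demotion may still reclaim it. */
	return st->can_demote;
}
```

The split check in shrink_folio_list() uses only the first helper, so a
node that can demote but has no swap does not trigger a split.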

Best Regards
Barry