Linux-mm Archive on lore.kernel.org
From: "Barry Song (Xiaomi)" <baohua@kernel.org>
To: ryncsn@gmail.com
Cc: akpm@linux-foundation.org, axelrasmussen@google.com,
	baohua@kernel.org, baolin.wang@linux.alibaba.com,
	david@kernel.org, dev.jain@arm.com, lance.yang@linux.dev,
	liam@infradead.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, ljs@kernel.org, npache@redhat.com,
	qi.zheng@linux.dev, ryan.roberts@arm.com, shakeel.butt@linux.dev,
	weixugc@google.com, yuanchu@google.com, zhaonanzhe@xiaomi.com,
	ziy@nvidia.com
Subject: Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
Date: Sat, 20 Jun 2026 06:42:21 +0800
Message-ID: <20260619224221.85678-1-baohua@kernel.org>
In-Reply-To: <CAMgjq7Bmi2XYxPc4tVO8KTWSuj8jpt-c-JueqYRK1kLC_nmBqA@mail.gmail.com>

On Sat, Jun 20, 2026 at 3:18 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Fri, Jun 19, 2026 at 6:17 AM Barry Song (Xiaomi) <baohua@kernel.org> wrote:
> >
> > When swap is disabled or exhausted, swap slot allocation
> > may fail during swapout, causing large folios to be split
> > into small folios. The splitting is reasonable when we
> > truly fail to obtain contiguous swap slots, but it is
> > pointless in the no-space case.
> >
> > A simple way to reproduce this is to invoke MADV_PAGEOUT on
> > a system with mTHP enabled but without swap configured.
> >
> >  #include <string.h>
> >  #include <sys/mman.h>
> >
> >  #define SIZE (16 * 1024 * 1024)
> >  int main(void)
> >  {
> >          char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> >                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >          if (buf == MAP_FAILED)
> >                  return 1;
> >          memset(buf, 1, SIZE);
> >          madvise(buf, SIZE, MADV_PAGEOUT);
> >          munmap(buf, SIZE);
> >          return 0;
> >  }
> >
> > With 16KB mTHP enabled, we observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 1024
> >
> > This patch checks swap space before splitting. If there is
> > no available space, it skips splitting. After the patch, we
> > observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 0
> >
> > Reported-by: Nanzhe Zhao <zhaonanzhe@xiaomi.com>
> > Cc: David Hildenbrand <david@kernel.org>
> > Cc: Lorenzo Stoakes <ljs@kernel.org>
> > Cc: Zi Yan <ziy@nvidia.com>
> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Cc: Liam R. Howlett <liam@infradead.org>
> > Cc: Nico Pache <npache@redhat.com>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: Dev Jain <dev.jain@arm.com>
> > Cc: Lance Yang <lance.yang@linux.dev>
> > Cc: Kairui Song <kasong@tencent.com>
> > Cc: Qi Zheng <qi.zheng@linux.dev>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: Axel Rasmussen <axelrasmussen@google.com>
> > Cc: Yuanchu Xie <yuanchu@google.com>
> > Cc: Wei Xu <weixugc@google.com>
> > Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> > ---
> >  mm/vmscan.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 299b5d9e8836..33f84a5fe7ee 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -339,8 +339,7 @@ static bool can_demote(int nid, struct scan_control *sc,
> >         return !nodes_empty(allowed_mask);
> >  }
> >
> > -static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > -                                         int nid,
> > +static inline bool __can_reclaim_anon_pages(struct mem_cgroup *memcg,
> >                                           struct scan_control *sc)
> >  {
> >         if (memcg == NULL) {
> > @@ -356,6 +355,16 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> >                         return true;
> >         }
> >
> > +       return false;
> > +}
> > +
> > +static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > +                                         int nid,
> > +                                         struct scan_control *sc)
> > +{
> > +       if (__can_reclaim_anon_pages(memcg, sc))
> > +               return true;
> > +
> >         /*
> >          * The page can not be swapped.
> >          *
> > @@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >
> >                                 if (!folio_test_large(folio))
> >                                         goto activate_locked_split;
> > +                               if (!__can_reclaim_anon_pages(memcg, sc))
> > +                                       goto activate_locked_split;
> >                                 /* Fallback to swap normal pages */
> >                                 if (split_folio_to_list(folio, folio_list))
> >                                         goto activate_locked;
>
> Hello Barry,
>
> Thanks for raising this issue. I saw a similar issue report in the
> mail list before and was thinking that, perhaps another approach is to

Hi Kairui,

Could you please post the link to that report? I'd like to add
the corresponding Reported-by and Closes tags as well.


> let folio_alloc_swap return a more detailed error code, for example:
>
> - 1. the mem_cgroup_try_charge_swap in it failed
> - 2. allocation failed but nr_swap_pages > folio size
> - 3. allocation failed because all devices are full or unusable
> (roughly nr_swap_pages < folio size)
>

folio_alloc_swap() returns error codes such as -EAGAIN,
-EINVAL, and -ENOMEM. For cases 1, 2, and 3, I assume it
would return -ENOMEM?

Or do you mean that folio_alloc_swap() should return an enum
instead?
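To check my understanding, here is a rough userspace model of the
three cases. The enum, classify_swap_alloc() and should_split() are
made-up names, not kernel API; charge_failed, alloc_failed and
nr_swap_pages merely stand in for mem_cgroup_try_charge_swap(), the
slot allocation and get_nr_swap_pages():

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical result codes for folio_alloc_swap(); these names do not
 * exist in the kernel and only model the three cases listed above.
 */
enum swap_alloc_result {
	SWAP_ALLOC_OK,
	SWAP_ALLOC_CHARGE_FAIL,	/* 1. mem_cgroup_try_charge_swap() failed */
	SWAP_ALLOC_FRAGMENTED,	/* 2. no contiguous slots, enough free pages */
	SWAP_ALLOC_NO_SPACE,	/* 3. devices full or unusable */
};

/* Classify one allocation attempt for a folio of nr_pages pages. */
static enum swap_alloc_result classify_swap_alloc(bool charge_failed,
						  bool alloc_failed,
						  long nr_swap_pages,
						  long nr_pages)
{
	assert(nr_pages > 0);
	if (charge_failed)
		return SWAP_ALLOC_CHARGE_FAIL;
	if (!alloc_failed)
		return SWAP_ALLOC_OK;
	if (nr_swap_pages >= nr_pages)
		return SWAP_ALLOC_FRAGMENTED;
	return SWAP_ALLOC_NO_SPACE;
}

/* Only case 2 is worth splitting the large folio for. */
static bool should_split(enum swap_alloc_result res)
{
	return res == SWAP_ALLOC_FRAGMENTED;
}
```

Only SWAP_ALLOC_FRAGMENTED leads to a split; the other two failures
would go straight to activation.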

Another approach would be to return -EAGAIN only for the cases
where we want to retry swap-out after splitting the folio:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 78b49b0658ad..62e2c506ccae 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1755,6 +1755,9 @@ int folio_alloc_swap(struct folio *folio)
 			VM_WARN_ON_ONCE(1);
 			return -EINVAL;
 		}
+
+		if (get_nr_swap_pages() < (1 << order))
+			return -ENOMEM;
 	}
 
 again:
@@ -1769,11 +1772,13 @@ int folio_alloc_swap(struct folio *folio)
 	}
 
 	/* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
-	if (unlikely(mem_cgroup_try_charge_swap(folio)))
+	if (unlikely(mem_cgroup_try_charge_swap(folio))) {
 		swap_cache_del_folio(folio);
+		return -ENOMEM;
+	}
 
 	if (unlikely(!folio_test_swapcache(folio)))
-		return -ENOMEM;
+		return -EAGAIN;
 
 	return 0;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 299b5d9e8836..4c4cbd72c013 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1257,6 +1257,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 */
 		if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
 				!folio_test_swapcache(folio)) {
+			int ret;
+
 			if (!(sc->gfp_mask & __GFP_IO))
 				goto keep_locked;
 			if (folio_maybe_dma_pinned(folio))
@@ -1275,10 +1277,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 				    split_folio_to_list(folio, folio_list))
 					goto activate_locked;
 			}
-			if (folio_alloc_swap(folio)) {
+			ret = folio_alloc_swap(folio);
+			if (ret) {
 				int __maybe_unused order = folio_order(folio);
 
-				if (!folio_test_large(folio))
+				/* Only -EAGAIN means splitting might let it succeed */
+				if (!folio_test_large(folio) || ret != -EAGAIN)
 					goto activate_locked_split;
 				/* Fallback to swap normal pages */
 				if (split_folio_to_list(folio, folio_list))
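In other words, the intended decision on the vmscan side is roughly
the following (a userspace sketch only; model_alloc_swap() and
next_action() are hypothetical, and it assumes the new
get_nr_swap_pages() check sits in the large-folio branch as in the
diff above):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* What shrink_folio_list() does next in this model. */
enum reclaim_action {
	ACTION_SWAPOUT,		/* slots allocated: continue towards pageout */
	ACTION_SPLIT_RETRY,	/* split to order 0 and try again */
	ACTION_ACTIVATE,	/* give up: activate_locked_split */
};

/*
 * Hypothetical stand-in for the proposed folio_alloc_swap() return codes.
 * nr_swap_pages plays get_nr_swap_pages(), charge_ok the memcg swap
 * charge, got_slots whether contiguous slots were found.
 */
static int model_alloc_swap(unsigned int order, long nr_swap_pages,
			    bool charge_ok, bool got_slots)
{
	if (order && nr_swap_pages < (1L << order))
		return -ENOMEM;	/* not enough free swap at all */
	if (!charge_ok)
		return -ENOMEM;	/* memcg swap limit reached */
	if (!got_slots)
		return -EAGAIN;	/* fragmented: order-0 folios may still fit */
	return 0;
}

/* Split only for a large folio whose allocation failed with -EAGAIN. */
static enum reclaim_action next_action(int ret, bool large)
{
	assert(ret <= 0);	/* 0 or a negative errno */
	if (ret == 0)
		return ACTION_SWAPOUT;
	if (large && ret == -EAGAIN)
		return ACTION_SPLIT_RETRY;
	return ACTION_ACTIVATE;
}
```

Note that a successful allocation (ret == 0) must still go on to
pageout, so the caller has to handle it before testing for -EAGAIN.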

> Only case 2 requires splitting. __can_reclaim_anon_pages also checks
> demote which is not related to swap.

Actually, the __can_reclaim_anon_pages() I extracted only checks
swap; it is can_reclaim_anon_pages() that checks both swap and
demotion. :-)
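Roughly, modelled in userspace (struct reclaim_env and both helpers
are made up; the fields stand in for get_nr_swap_pages(),
mem_cgroup_get_nr_swap_pages() and can_demote()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the inputs the two kernel helpers consult. */
struct reclaim_env {
	bool has_memcg;		/* memcg reclaim vs. global reclaim */
	long global_swap_pages;	/* stands in for get_nr_swap_pages() */
	long memcg_swap_pages;	/* stands in for mem_cgroup_get_nr_swap_pages() */
	bool can_demote;	/* stands in for can_demote() */
};

/* Swap-only check, like __can_reclaim_anon_pages() in the RFC. */
static bool swap_has_space(const struct reclaim_env *env)
{
	assert(env != NULL);
	if (!env->has_memcg)
		return env->global_swap_pages > 0;
	return env->memcg_swap_pages > 0;
}

/* Swap or demotion, like can_reclaim_anon_pages(). */
static bool anon_reclaimable(const struct reclaim_env *env)
{
	return swap_has_space(env) || env->can_demote;
}
```

So a node that can demote does not influence the split decision made
with the swap-only helper.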

Best Regards
Barry


Thread overview: 9+ messages
2026-06-18 22:17 [RFC PATCH] mm: Avoiding split large folios if swap has no space Barry Song (Xiaomi)
2026-06-18 23:46 ` Nico Pache
2026-06-19  0:59   ` Barry Song
2026-06-19 14:01 ` David Hildenbrand (Arm)
2026-06-19 23:01   ` Barry Song
2026-06-19 14:04 ` David Hildenbrand (Arm)
2026-06-20  8:10   ` Barry Song (Xiaomi)
2026-06-19 19:17 ` Kairui Song
2026-06-19 22:42   ` Barry Song (Xiaomi) [this message]
