From: "Barry Song (Xiaomi)" <baohua@kernel.org>
To: ryncsn@gmail.com
Cc: akpm@linux-foundation.org, axelrasmussen@google.com,
	baohua@kernel.org, baolin.wang@linux.alibaba.com,
	david@kernel.org, dev.jain@arm.com, lance.yang@linux.dev,
	liam@infradead.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, ljs@kernel.org, npache@redhat.com,
	qi.zheng@linux.dev, ryan.roberts@arm.com, shakeel.butt@linux.dev,
	weixugc@google.com, yuanchu@google.com, zhaonanzhe@xiaomi.com,
	ziy@nvidia.com
Subject: Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
Date: Sat, 20 Jun 2026 06:42:21 +0800	[thread overview]
Message-ID: <20260619224221.85678-1-baohua@kernel.org> (raw)
In-Reply-To: <CAMgjq7Bmi2XYxPc4tVO8KTWSuj8jpt-c-JueqYRK1kLC_nmBqA@mail.gmail.com>

On Sat, Jun 20, 2026 at 3:18 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Fri, Jun 19, 2026 at 6:17 AM Barry Song (Xiaomi) <baohua@kernel.org> wrote:
> >
> > When swap is disabled or exhausted, swap slot allocation
> > may fail during swapout, causing large folios to be split
> > into small folios. The splitting is reasonable when we
> > truly fail to obtain contiguous swap slots, but it is
> > pointless in the no-space case.
> >
> > A simple way to reproduce this is to invoke MADV_PAGEOUT on
> > a system with mTHP enabled but without swap configured.
> >
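> >  #include <string.h>
> >  #include <sys/mman.h>
> >
> >  #ifndef MADV_PAGEOUT
> >  #define MADV_PAGEOUT 21 /* older libc headers may lack it */
> >  #endif
> >
> >  #define SIZE (16 * 1024 * 1024)
> >  int main(void)
> >  {
> >          char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> >                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >          if (buf == MAP_FAILED)
> >                  return 1;
> >          memset(buf, 1, SIZE);
> >          madvise(buf, SIZE, MADV_PAGEOUT);
> >          munmap(buf, SIZE);
> >          return 0;
> >  }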
> >
> > With 16KB mTHP enabled, we observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 1024
> >
> > This patch checks swap space before splitting. If there is
> > no available space, it skips splitting. After the patch, we
> > observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 0
> >
> > Reported-by: Nanzhe Zhao <zhaonanzhe@xiaomi.com>
> > Cc: David Hildenbrand <david@kernel.org>
> > Cc: Lorenzo Stoakes <ljs@kernel.org>
> > Cc: Zi Yan <ziy@nvidia.com>
> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Cc: Liam R. Howlett <liam@infradead.org>
> > Cc: Nico Pache <npache@redhat.com>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: Dev Jain <dev.jain@arm.com>
> > Cc: Lance Yang <lance.yang@linux.dev>
> > Cc: Kairui Song <kasong@tencent.com>
> > Cc: Qi Zheng <qi.zheng@linux.dev>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: Axel Rasmussen <axelrasmussen@google.com>
> > Cc: Yuanchu Xie <yuanchu@google.com>
> > Cc: Wei Xu <weixugc@google.com>
> > Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> > ---
> >  mm/vmscan.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 299b5d9e8836..33f84a5fe7ee 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -339,8 +339,7 @@ static bool can_demote(int nid, struct scan_control *sc,
> >         return !nodes_empty(allowed_mask);
> >  }
> >
> > -static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > -                                         int nid,
> > +static inline bool __can_reclaim_anon_pages(struct mem_cgroup *memcg,
> >                                           struct scan_control *sc)
> >  {
> >         if (memcg == NULL) {
> > @@ -356,6 +355,16 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> >                         return true;
> >         }
> >
> > +       return false;
> > +}
> > +
> > +static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > +                                         int nid,
> > +                                         struct scan_control *sc)
> > +{
> > +       if (__can_reclaim_anon_pages(memcg, sc))
> > +               return true;
> > +
> >         /*
> >          * The page can not be swapped.
> >          *
> > @@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >
> >                                 if (!folio_test_large(folio))
> >                                         goto activate_locked_split;
> > +                               if (!__can_reclaim_anon_pages(memcg, sc))
> > +                                       goto activate_locked_split;
> >                                 /* Fallback to swap normal pages */
> >                                 if (split_folio_to_list(folio, folio_list))
> >                                         goto activate_locked;
>
> Hello Barry,
>
> Thanks for raising this issue. I saw a similar issue report in the
> mail list before and was thinking that, perhaps another approach is to

Hi Kairui,

Could you please post a link to the report you mentioned? I'd
like to add the corresponding Reported-by and Closes tags as well.


> let folio_alloc_swap return a more detailed error code, for example:
>
> - 1. the mem_cgroup_try_charge_swap in it failed
> - 2. allocation failed but nr_swap_pages > folio size
> - 3. allocation failed because all devices are full or unusable
> (roughly nr_swap_pages < folio size)
>

folio_alloc_swap() returns error codes such as -EAGAIN,
-EINVAL, and -ENOMEM. For cases 1, 2, and 3, I assume it
would return -ENOMEM?

I assume you mean that we might want folio_alloc_swap() to
return an enum instead?
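
For illustration only, here is a minimal userspace sketch of what such
an enum could look like. The names (enum swap_alloc_result,
classify_swap_alloc(), should_split_folio()) are hypothetical and do not
exist in the kernel; nr_free stands in for get_nr_swap_pages(), and the
three failure values follow cases 1, 2 and 3 above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for the three failure cases; not kernel API. */
enum swap_alloc_result {
	SWAP_ALLOC_OK = 0,
	SWAP_ALLOC_CHARGE_FAILED,	/* case 1: memcg swap charge failed */
	SWAP_ALLOC_FRAGMENTED,		/* case 2: enough free slots, not contiguous */
	SWAP_ALLOC_NO_SPACE,		/* case 3: devices full or unusable */
};

/*
 * Classify an allocation attempt for a folio of 1 << order pages.
 * charged:   whether the memcg swap charge succeeded
 * allocated: whether contiguous slots were obtained
 * nr_free:   models get_nr_swap_pages()
 */
enum swap_alloc_result classify_swap_alloc(bool charged, bool allocated,
					   long nr_free, unsigned int order)
{
	assert(order < 31);	/* keep 1L << order well defined */

	if (!charged)
		return SWAP_ALLOC_CHARGE_FAILED;
	if (allocated)
		return SWAP_ALLOC_OK;
	if (nr_free >= (1L << order))
		return SWAP_ALLOC_FRAGMENTED;
	return SWAP_ALLOC_NO_SPACE;
}

/* Only case 2 can be helped by splitting the large folio. */
bool should_split_folio(enum swap_alloc_result res)
{
	return res == SWAP_ALLOC_FRAGMENTED;
}
```

With this shape, reclaim would split only on SWAP_ALLOC_FRAGMENTED; both
the charge failure and the no-space case would leave the folio intact.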

Another approach is to return -EAGAIN only for the cases where
we want to retry the swapout after splitting the folio:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 78b49b0658ad..62e2c506ccae 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1755,6 +1755,9 @@ int folio_alloc_swap(struct folio *folio)
 			VM_WARN_ON_ONCE(1);
 			return -EINVAL;
 		}
+
+		if (get_nr_swap_pages() < (1 << order))
+			return -ENOMEM;
 	}
 
 again:
@@ -1769,11 +1772,13 @@ int folio_alloc_swap(struct folio *folio)
 	}
 
 	/* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
-	if (unlikely(mem_cgroup_try_charge_swap(folio)))
+	if (unlikely(mem_cgroup_try_charge_swap(folio))) {
 		swap_cache_del_folio(folio);
+		return -ENOMEM;
+	}
 
 	if (unlikely(!folio_test_swapcache(folio)))
-		return -ENOMEM;
+		return -EAGAIN;
 
 	return 0;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 299b5d9e8836..4c4cbd72c013 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1257,6 +1257,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 */
 		if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
 				!folio_test_swapcache(folio)) {
+			int ret;
+
 			if (!(sc->gfp_mask & __GFP_IO))
 				goto keep_locked;
 			if (folio_maybe_dma_pinned(folio))
@@ -1275,11 +1277,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 				    split_folio_to_list(folio, folio_list))
 					goto activate_locked;
 			}
-			if (folio_alloc_swap(folio)) {
+			ret = folio_alloc_swap(folio);
+			if (ret) {
 				int __maybe_unused order = folio_order(folio);
 
-				if (!folio_test_large(folio))
+				if (!folio_test_large(folio) || ret != -EAGAIN)
 					goto activate_locked_split;
 				/* Fallback to swap normal pages */
 				if (split_folio_to_list(folio, folio_list))
 					goto activate_locked;
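
To make the intended semantics concrete, here is a rough userspace model
of the proposed return values and of how the reclaim path would consume
them. struct swap_model, model_alloc_swap(), model_reclaim() and
enum reclaim_action are hypothetical names for this sketch only; nr_free
stands in for get_nr_swap_pages(), and has_contig for whether the
allocator found 1 << order contiguous slots. The caller tests for
success before filtering on -EAGAIN, so a successful allocation never
falls into the failure path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the proposal; nothing here is kernel API. */
struct swap_model {
	long nr_free;		/* free swap slots, models get_nr_swap_pages() */
	bool charge_ok;		/* memcg swap charge would succeed */
	bool has_contig;	/* a contiguous run of 1 << order slots exists */
};

/* Mirrors the return values proposed for folio_alloc_swap(). */
int model_alloc_swap(const struct swap_model *m, unsigned int order)
{
	bool allocated;

	assert(order < 31);
	/* Large folio and not enough free slots: splitting cannot help. */
	if (order && m->nr_free < (1L << order))
		return -ENOMEM;

	allocated = order ? m->has_contig : m->nr_free > 0;

	/* Charge failure is reported as -ENOMEM, as in the diff. */
	if (!m->charge_ok)
		return -ENOMEM;
	/* Slots exist but allocation failed: ask the caller to split. */
	if (!allocated)
		return -EAGAIN;
	return 0;
}

enum reclaim_action { SWAP_OUT, SPLIT_AND_RETRY, ACTIVATE };

/* Caller side: split only on -EAGAIN, and only for large folios. */
enum reclaim_action model_reclaim(const struct swap_model *m,
				  unsigned int order)
{
	int ret = model_alloc_swap(m, order);

	if (!ret)
		return SWAP_OUT;	/* success must not be treated as failure */
	if (order == 0 || ret != -EAGAIN)
		return ACTIVATE;
	return SPLIT_AND_RETRY;
}
```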

> Only case 2 requires splitting. __can_reclaim_anon_pages also checks
> demote which is not related to swap.

I actually extracted __can_reclaim_anon_pages(), which only
checks swap, whereas can_reclaim_anon_pages() checks both
swap and demotion. :-)
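
A toy model of that split, assuming a single free-slot counter stands in
for the global and per-memcg swap checks and a flag stands in for
can_demote(); the struct and function names are hypothetical. It shows
why the pre-split check should use the swap-only helper: a node with
demotion targets can still reclaim anon pages, but that says nothing
about whether any swap slots exist.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reclaim state; not kernel API. */
struct anon_reclaim_state {
	long nr_swap_free;	/* models get_nr_swap_pages() or the memcg limit */
	bool can_demote;	/* models can_demote(nid, sc, memcg) */
};

/* Swap-only check, analogous to __can_reclaim_anon_pages(). */
bool model_can_swap_anon(const struct anon_reclaim_state *s)
{
	return s->nr_swap_free > 0;
}

/* Swap or demotion, analogous to can_reclaim_anon_pages(). */
bool model_can_reclaim_anon(const struct anon_reclaim_state *s)
{
	if (model_can_swap_anon(s))
		return true;
	return s->can_demote;
}
```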

Best Regards
Barry


Thread overview: 10+ messages
2026-06-18 22:17 [RFC PATCH] mm: Avoiding split large folios if swap has no space Barry Song (Xiaomi)
2026-06-18 23:46 ` Nico Pache
2026-06-19  0:59   ` Barry Song
2026-06-19 14:01 ` David Hildenbrand (Arm)
2026-06-19 23:01   ` Barry Song
2026-06-19 14:04 ` David Hildenbrand (Arm)
2026-06-20  8:10   ` Barry Song (Xiaomi)
2026-06-19 19:17 ` Kairui Song
2026-06-19 22:42   ` Barry Song (Xiaomi) [this message]
2026-06-20  7:19     ` Kairui Song
