From: Fujunjie <fujunjie1@qq.com>
To: Kairui Song <ryncsn@gmail.com>,
"David Hildenbrand (Arm)" <david@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Nhat Pham <nphamcs@gmail.com>, Yosry Ahmed <yosry@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
Ryan Roberts <ryan.roberts@arm.com>,
Barry Song <baohua@kernel.org>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Chengming Zhou <chengming.zhou@linux.dev>,
Baoquan He <bhe@redhat.com>, Lorenzo Stoakes <ljs@kernel.org>
Subject: Re: [RFC PATCH 4/5] mm: swap: fall back to order-0 after large swapin races
Date: Tue, 12 May 2026 15:57:15 +0800
Message-ID: <tencent_FFCDE2856BFC5941B165FEC5837B44E72605@qq.com>
In-Reply-To: <CAMgjq7Cokjb4-F9=cvwKmWR0q4==Vd61FHnjKbRdSHKH57erxw@mail.gmail.com>

On 5/11/2026 10:59 PM, Kairui Song wrote:
> On Mon, May 11, 2026 at 9:14 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 5/8/26 22:20, fujunjie wrote:
>>> swapin_folio() is documented to return NULL when a large folio loses
>>> the swap cache insertion race, so that the caller can fall back to
>>> order-0 swapin. do_swap_page() currently turns that NULL into
>>> VM_FAULT_OOM if the PTE is unchanged, which is harsher than necessary
>>> and gets in the way of rejecting large folio ranges for backend reasons.
>>>
>>> Move the synchronous swapin sequence into a helper and retry with an
>>> order-0 folio when a large folio cannot be inserted into the swap cache.
>>> Count the event as an mTHP swapin fallback before dropping the failed
>>> large allocation.
>>>
>>> Signed-off-by: fujunjie <fujunjie1@qq.com>
>>> ---
>>> mm/memory.c | 50 +++++++++++++++++++++++++++++++++++++++-----------
>>> 1 file changed, 39 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index ea6568571131..84e3b77b8293 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4757,6 +4757,44 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>>> }
>>> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>>
>>> +static struct folio *swapin_synchronous_folio(swp_entry_t entry,
>>> + struct vm_fault *vmf)
>>> +{
>>> + struct folio *swapcache, *folio;
>>> + bool large;
>>> + int order;
>>> +
>>> + folio = alloc_swap_folio(vmf);
>>> + if (!folio)
>>> + return NULL;
>>> +
>>> + large = folio_test_large(folio);
>>> + order = folio_order(folio);
>>> +
>>> + /*
>>> + * folio is charged, so swapin can only fail due to raced swapin and
>>> + * return NULL.
>>> + */
>>> + swapcache = swapin_folio(entry, folio);
>>> + if (swapcache == folio)
>>> + return folio;
>>> +
>>> + if (!swapcache && large)
>>> + count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
>>> + folio_put(folio);
>>> + if (swapcache || !large)
>>> + return swapcache;
>>> +
>>> + folio = __alloc_swap_folio(vmf);
>>> + if (!folio)
>>> + return NULL;
>>> +
>>> + swapcache = swapin_folio(entry, folio);
>>> + if (swapcache != folio)
>>> + folio_put(folio);
>>> + return swapcache;
>>> +}
>>> +
>>> /* Sanity check that a folio is fully exclusive */
>>> static void check_swap_exclusive(struct folio *folio, swp_entry_t entry,
>>> unsigned int nr_pages)
>>> @@ -4860,17 +4898,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>> swap_update_readahead(folio, vma, vmf->address);
>>> if (!folio) {
>>> if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
>>> - folio = alloc_swap_folio(vmf);
>>> - if (folio) {
>>> - /*
>>> - * folio is charged, so swapin can only fail due
>>> - * to raced swapin and return NULL.
>>> - */
>>> - swapcache = swapin_folio(entry, folio);
>>> - if (swapcache != folio)
>>> - folio_put(folio);
>>> - folio = swapcache;
>>> - }
>>> + folio = swapin_synchronous_folio(entry, vmf);
>>> } else {
>>> folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, vmf);
>>> }
>>
>> There are some upcoming changes with:
>>
>> https://lore.kernel.org/r/20260421-swap-table-p4-v3-5-2f23759a76bc@tencent.com
>>
>>
>> All of that logic you have in swapin_synchronous_folio() should
>> ideally not go into memory.c, but into some swap-specific code.
>>
>> But see:
>>
>> https://lore.kernel.org/r/20260421-swap-table-p4-v3-0-2f23759a76bc@tencent.com
>
> Thanks for mentioning this!
>
> I think Junjie's change fits better after that change indeed. And I
> checked the code, it should fit easily too.
>
> It's already strange enough that THP swapin is bundled with
> synchronous swapin; we'd better not make it more divergent here by
> adding more bits into memory.c.
>
> And this commit will limit it to anon, not shmem, which is another
> strange detail. Otherwise we'd have to repeat everything and copy this
> code into shmem.c...
>
> Once all swap-ins use basically the same path as in that series, all
> swap-ins will be able to have similar THP and zswap THP support too.

Thanks, David and Kairui.

That makes sense. The helper in memory.c was mainly added to demonstrate
the fallback needed by this RFC, but I agree that growing more
large-folio swapin logic in the anon synchronous swapin path is not the
right direction.

I will not carry this patch forward in its current form. If I continue
with this work, I will rebase it on top of Kairui's swap-table or
unified swapin work and keep the allocation and fallback handling in the
swap-specific common path.
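
For reference, this is roughly the shape I have in mind after the
rebase, with the fallback kept in swap code rather than memory.c. The
entry point and its name below are hypothetical (the real one would
follow whatever Kairui's series ends up exporting); only the retry
behaviour is the point:

static struct folio *swap_alloc_and_swapin(swp_entry_t entry,
					   struct vm_fault *vmf)
{
	struct folio *folio, *swapcache;

	/* First attempt: alloc_swap_folio() may return a large folio. */
	folio = alloc_swap_folio(vmf);
	while (folio) {
		bool large = folio_test_large(folio);
		int order = folio_order(folio);

		/*
		 * The folio is charged on allocation, so swapin_folio()
		 * can only fail (return NULL) due to a raced swapin.
		 */
		swapcache = swapin_folio(entry, folio);
		if (swapcache == folio)
			return folio;

		if (!swapcache && large)
			count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
		folio_put(folio);

		/* A racer installed its own folio, or order-0 raced too. */
		if (swapcache || !large)
			return swapcache;

		/* Raced large swapin: retry once with an order-0 folio. */
		folio = __alloc_swap_folio(vmf);
	}
	return NULL;
}

It is the same logic as in this patch, just folded into a single retry
loop, so that anon and shmem swapin could eventually share one path.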

I also noticed Alexandre is already working on the large-folio swapin
side, so I will follow that series to avoid duplicating work.

Thanks for the pointers.

Thread overview: 17+ messages
2026-05-08 20:18 [RFC PATCH 0/5] mm: support zswap-backed anonymous large folio swapin fujunjie
2026-05-08 20:20 ` [RFC PATCH 1/5] mm: zswap: decompress into a folio subpage fujunjie
2026-05-08 20:20 ` [RFC PATCH 2/5] mm: zswap: add a zswap entry batch helper fujunjie
2026-05-08 20:20 ` [RFC PATCH 3/5] mm: zswap: load fully stored large folios fujunjie
2026-05-11 22:38 ` Yosry Ahmed
2026-05-12 8:05 ` Fujunjie
2026-05-08 20:20 ` [RFC PATCH 4/5] mm: swap: fall back to order-0 after large swapin races fujunjie
2026-05-11 13:03 ` David Hildenbrand (Arm)
2026-05-11 14:59 ` Kairui Song
2026-05-12 7:57 ` Fujunjie [this message]
2026-05-08 20:20 ` [RFC PATCH 5/5] mm: swap: allow zswap-backed large folio swapin fujunjie
2026-05-11 22:13 ` [RFC PATCH 0/5] mm: support zswap-backed anonymous " Yosry Ahmed
2026-05-12 6:14 ` David Hildenbrand (Arm)
2026-05-12 19:19 ` Yosry Ahmed
2026-05-12 8:02 ` Fujunjie
2026-05-12 4:20 ` Alexandre Ghiti
2026-05-12 7:46 ` Fujunjie