From: Lance Yang <lance.yang@linux.dev>
To: usama.arif@linux.dev
Cc: akpm@linux-foundation.org, david@kernel.org, chrisl@kernel.org,
kasong@tencent.com, ljs@kernel.org, ziy@nvidia.com,
ying.huang@linux.alibaba.com, baoquan.he@linux.dev,
willy@infradead.org, youngjun.park@lge.com, hannes@cmpxchg.org,
riel@surriel.com, shakeel.butt@linux.dev, alex@ghiti.fr,
kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
liam@infradead.org, ryan.roberts@arm.com, vbabka@kernel.org,
lance.yang@linux.dev, linux-kernel@vger.kernel.org,
nphamcs@gmail.com, shikemeng@huaweicloud.com,
kernel-team@meta.com, linux-mm@kvack.org
Subject: Re: [v2 15/16] mm: install PMD swap entries on swap-out
Date: Fri, 12 Jun 2026 22:21:24 +0800
Message-ID: <20260612142124.73367-1-lance.yang@linux.dev>
In-Reply-To: <20260602142537.198755-16-usama.arif@linux.dev>
+Cc linux-mm
On Tue, Jun 02, 2026 at 07:24:23AM -0700, Usama Arif wrote:
[...]
>diff --git a/mm/vmscan.c b/mm/vmscan.c
>index e8a90911bf88..0f376fbf9bb3 100644
>--- a/mm/vmscan.c
>+++ b/mm/vmscan.c
>@@ -64,6 +64,7 @@
>
> #include <linux/swapops.h>
> #include <linux/sched/sysctl.h>
>+#include <linux/zswap.h>
>
> #include "internal.h"
> #include "swap.h"
>@@ -1332,7 +1333,18 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> 			enum ttu_flags flags = TTU_BATCH_FLUSH;
> 			bool was_swapbacked = folio_test_swapbacked(folio);
>
>-			if (folio_test_pmd_mappable(folio))
>+			/*
>+			 * With THP_SWAP, PMD-mappable folios already in the
>+			 * swap cache can be unmapped with a PMD-level swap
>+			 * entry, avoiding the cost of splitting the PMD.
>+			 * Skip this when zswap has been enabled because
>+			 * zswap stores pages individually and cannot
>+			 * reconstruct a large folio on swap-in.
>+			 */
>+			if (folio_test_pmd_mappable(folio) &&
>+			    !(IS_ENABLED(CONFIG_THP_SWAP) &&
>+			      folio_test_swapcache(folio) &&
>+			      zswap_never_enabled()))
There may be a race here ...
1) zswap_never_enabled() passes, 2) try_to_unmap() installs the PMD swap
entry, and 3) zswap can still be enabled before the later pageout() ->
swap_writeout() -> zswap_store().
zswap_store() loops over each page of the folio:
	for (index = 0; index < nr_pages; ++index) {
		struct page *page = folio_page(folio, index);
		if (!zswap_store_page(page, objcg, pool))
			goto put_pool;
	}
So there is still only one PMD swap entry, while zswap holds 512 entries,
one for each page of the folio ...
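To make the window concrete, here is a rough userspace toy model of the interleaving above. It is not kernel code: the toy_* names and the struct are made up for illustration, locking and the swap cache are ignored, and 512 assumes 4K base pages with a 2M PMD.

```c
/* Userspace toy model of the check-then-act window; not kernel code. */
#include <assert.h>
#include <stdbool.h>

#define TOY_HPAGE_PMD_NR 512	/* assumption: 4K base pages, 2M PMD */

struct toy_state {
	bool zswap_ever_enabled;	/* stands in for !zswap_never_enabled() */
	int pmd_swap_entries;		/* PMD-level swap entries installed */
	int zswap_entries;		/* per-page entries held by zswap */
};

/* Steps 1 and 2: the check in shrink_folio_list(), then try_to_unmap(). */
bool toy_unmap(struct toy_state *s)
{
	if (s->zswap_ever_enabled)
		return false;	/* would take the TTU_SPLIT_HUGE_PMD path */
	s->pmd_swap_entries++;
	return true;
}

/* Step 3: pageout() -> swap_writeout() -> zswap_store(), one entry per page. */
void toy_writeout(struct toy_state *s)
{
	if (!s->zswap_ever_enabled)
		return;		/* would be written to the swap device instead */
	for (int index = 0; index < TOY_HPAGE_PMD_NR; ++index)
		s->zswap_entries++;
}

/* Replay the sequence, optionally enabling zswap between unmap and writeout. */
struct toy_state toy_replay(bool enable_in_window)
{
	struct toy_state s = { 0 };

	toy_unmap(&s);
	if (enable_in_window)
		s.zswap_ever_enabled = true;
	toy_writeout(&s);
	return s;
}
```

With toy_replay(true) the model ends up with one PMD swap entry and 512 zswap entries, which is the mismatched state described above; toy_replay(false) leaves zswap empty.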
If the swapcache is reclaimed later, a PMD fault will try PMD-order
swapin again:
  do_huge_pmd_swap_page()
    swap_cache_get_folio()
    swapin_sync(..., BIT(HPAGE_PMD_ORDER))
      swap_read_folio()
        zswap_load()
zswap_load() rejects large folios with -EINVAL and leaves the folio not
uptodate:
	/*
	 * Large folios should not be swapped in while zswap is being used, as
	 * they are not properly handled. Zswap does not properly load large
	 * folios, and a large folio may only be partially in zswap.
	 */
	if (WARN_ON_ONCE(folio_test_large(folio))) {
		folio_unlock(folio);
		return -EINVAL;
	}
swap_read_folio() jumps to finish and does not try a normal swap read:
	if (zswap_load(folio) != -ENOENT)
		goto finish;
And the awkward part is that no error really gets propagated ...
swap_read_folio() is void, and swapin_sync() just hands the same folio
back to do_huge_pmd_swap_page().
At that point the folio is still !uptodate, so the fault would just end
up:
	if (unlikely(!folio_test_uptodate(folio))) {
		ret = VM_FAULT_SIGBUS;
		goto out_page;
	}
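The swap-in side can be sketched the same way. Again this is only a userspace toy model under stated assumptions: the toy_* names, the struct, and the TOY_VM_FAULT_SIGBUS value are invented for illustration, and the device read is reduced to setting a flag.

```c
/* Userspace toy model of the swallowed -EINVAL; not kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_VM_FAULT_SIGBUS 0x0002	/* illustrative value only */

struct toy_folio {
	bool large;	/* stands in for folio_test_large() */
	bool uptodate;	/* stands in for folio_test_uptodate() */
	bool in_zswap;	/* whether zswap holds the data */
};

/* Follows the zswap_load() behaviour quoted above: -EINVAL for large folios. */
int toy_zswap_load(struct toy_folio *folio)
{
	if (folio->large)
		return -EINVAL;
	if (!folio->in_zswap)
		return -ENOENT;
	folio->uptodate = true;
	return 0;
}

/* void like swap_read_folio(): anything but -ENOENT skips the device read. */
void toy_swap_read_folio(struct toy_folio *folio)
{
	if (toy_zswap_load(folio) != -ENOENT)
		return;		/* "goto finish": the -EINVAL is dropped here */
	folio->uptodate = true;	/* stands in for a successful device read */
}

/* The fault path only sees the uptodate flag, never the error code. */
int toy_pmd_swap_fault(struct toy_folio *folio)
{
	toy_swap_read_folio(folio);
	if (!folio->uptodate)
		return TOY_VM_FAULT_SIGBUS;
	return 0;
}
```

In this model a large folio always comes back !uptodate and the fault returns TOY_VM_FAULT_SIGBUS, while a small folio succeeds whether the data sits in zswap or on the device.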
Looks racy, but possible?
Cheers, Lance
> 				flags |= TTU_SPLIT_HUGE_PMD;
> 			/*
> 			 * Without TTU_SYNC, try_to_unmap will only begin to
>diff --git a/mm/vmstat.c b/mm/vmstat.c
>index f534972f517d..9b4963a7eb04 100644
>--- a/mm/vmstat.c
>+++ b/mm/vmstat.c
>@@ -1421,6 +1421,7 @@ const char * const vmstat_text[] = {
> [I(THP_ZERO_PAGE_ALLOC_FAILED)] = "thp_zero_page_alloc_failed",
> [I(THP_SWPOUT)] = "thp_swpout",
> [I(THP_SWPOUT_FALLBACK)] = "thp_swpout_fallback",
>+ [I(THP_SWPOUT_PMD)] = "thp_swpout_pmd",
> #endif
> #ifdef CONFIG_BALLOON
> [I(BALLOON_INFLATE)] = "balloon_inflate",
>--
>2.52.0
>
>