* Re: [v2 15/16] mm: install PMD swap entries on swap-out
[not found] <20260602142537.198755-16-usama.arif@linux.dev>
@ 2026-06-12 14:21 ` Lance Yang
From: Lance Yang @ 2026-06-12 14:21 UTC (permalink / raw)
To: usama.arif
Cc: akpm, david, chrisl, kasong, ljs, ziy, ying.huang, baoquan.he,
willy, youngjun.park, hannes, riel, shakeel.butt, alex, kas,
baohua, dev.jain, baolin.wang, npache, liam, ryan.roberts, vbabka,
lance.yang, linux-kernel, nphamcs, shikemeng, kernel-team,
linux-mm
+Cc linux-mm
On Tue, Jun 02, 2026 at 07:24:23AM -0700, Usama Arif wrote:
[...]
>diff --git a/mm/vmscan.c b/mm/vmscan.c
>index e8a90911bf88..0f376fbf9bb3 100644
>--- a/mm/vmscan.c
>+++ b/mm/vmscan.c
>@@ -64,6 +64,7 @@
>
> #include <linux/swapops.h>
> #include <linux/sched/sysctl.h>
>+#include <linux/zswap.h>
>
> #include "internal.h"
> #include "swap.h"
>@@ -1332,7 +1333,18 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> 			enum ttu_flags flags = TTU_BATCH_FLUSH;
> 			bool was_swapbacked = folio_test_swapbacked(folio);
>
>-			if (folio_test_pmd_mappable(folio))
>+			/*
>+			 * With THP_SWAP, PMD-mappable folios already in the
>+			 * swap cache can be unmapped with a PMD-level swap
>+			 * entry, avoiding the cost of splitting the PMD.
>+			 * Skip this when zswap has been enabled because
>+			 * zswap stores pages individually and cannot
>+			 * reconstruct a large folio on swap-in.
>+			 */
>+			if (folio_test_pmd_mappable(folio) &&
>+			    !(IS_ENABLED(CONFIG_THP_SWAP) &&
>+			      folio_test_swapcache(folio) &&
>+			      zswap_never_enabled()))
There may be a race here ...
1) zswap_never_enabled() passes, 2) try_to_unmap() installs the PMD swap
entry, and 3) zswap can still be enabled before the later pageout() ->
swap_writeout() -> zswap_store().
zswap_store() loops over each page of the folio:
	for (index = 0; index < nr_pages; ++index) {
		struct page *page = folio_page(folio, index);
		if (!zswap_store_page(page, objcg, pool))
			goto put_pool;
	}
So the page table still holds a single PMD swap entry, while zswap holds
512 entries, one for each page of the folio ...
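To make the interleaving concrete, here is a rough user-space model of the three steps above. It is only a sketch: every sketch_* name is made up for illustration, nothing here is kernel code, and the 512 comes from assuming 4 KiB base pages with a 2 MiB PMD.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB PMD. Other configs differ. */
#define SKETCH_HPAGE_PMD_NR (1UL << (21 - 12))

/* Hypothetical state; none of these fields exist in the kernel. */
struct sketch_state {
	bool zswap_ever_enabled;	/* the inverse of zswap_never_enabled() */
	unsigned long pmd_swap_entries;	/* entries installed in the page table */
	unsigned long zswap_entries;	/* per-page entries held by zswap */
};

/* Steps 1 and 2: check, then unmap with one PMD swap entry if it passed. */
static void sketch_unmap(struct sketch_state *s)
{
	if (!s->zswap_ever_enabled)
		s->pmd_swap_entries = 1;
}

/* Step 3: zswap is enabled after the check has already passed. */
static void sketch_enable_zswap(struct sketch_state *s)
{
	s->zswap_ever_enabled = true;
}

/* Later writeout: once enabled, zswap stores each base page separately. */
static void sketch_writeout(struct sketch_state *s)
{
	unsigned long index;

	if (!s->zswap_ever_enabled)
		return;
	for (index = 0; index < SKETCH_HPAGE_PMD_NR; ++index)
		s->zswap_entries++;
}

/* Runs the racy ordering and returns the resulting state. */
static struct sketch_state sketch_race(void)
{
	struct sketch_state s = { 0 };

	sketch_unmap(&s);
	sketch_enable_zswap(&s);
	sketch_writeout(&s);
	return s;
}
```

With this ordering the model ends up with one PMD-level entry on the page table side and SKETCH_HPAGE_PMD_NR per-page entries on the zswap side, which is the mismatch described above.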
If the swapcache is reclaimed later, a PMD fault will try PMD-order
swapin again:
	do_huge_pmd_swap_page()
	swap_cache_get_folio()
	swapin_sync(..., BIT(HPAGE_PMD_ORDER))
	swap_read_folio()
	zswap_load()
zswap_load() rejects large folios with -EINVAL and leaves the folio not
uptodate:
	/*
	 * Large folios should not be swapped in while zswap is being used, as
	 * they are not properly handled. Zswap does not properly load large
	 * folios, and a large folio may only be partially in zswap.
	 */
	if (WARN_ON_ONCE(folio_test_large(folio))) {
		folio_unlock(folio);
		return -EINVAL;
	}
swap_read_folio() jumps to finish and does not try a normal swap read:
	if (zswap_load(folio) != -ENOENT)
		goto finish;
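The consequence of that comparison can be spelled out with a tiny stand-alone sketch. The enum and sketch_dispatch() are hypothetical names for illustration only; the one thing taken from the code above is the "!= -ENOENT" test.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels for the two outcomes of the check above. */
enum sketch_read_path {
	SKETCH_READ_FINISH,		/* "goto finish": no backing read */
	SKETCH_READ_FROM_BACKING,	/* fall through to the normal swap read */
};

/* Mirrors: if (zswap_load(folio) != -ENOENT) goto finish; */
static enum sketch_read_path sketch_dispatch(int zswap_ret)
{
	if (zswap_ret != -ENOENT)
		return SKETCH_READ_FINISH;
	return SKETCH_READ_FROM_BACKING;
}

/* Usage: only -ENOENT reaches the backing device; -EINVAL does not. */
static void sketch_dispatch_selfcheck(void)
{
	assert(sketch_dispatch(-ENOENT) == SKETCH_READ_FROM_BACKING);
	assert(sketch_dispatch(-EINVAL) == SKETCH_READ_FINISH);
}
```

So in this model -EINVAL takes the same exit as a successful zswap load, and nothing goes on to read the data from the swap device.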
And the awkward part is that no error really gets propagated ...
swap_read_folio() is void, and swapin_sync() just hands the same folio
back to do_huge_pmd_swap_page().
At that point the folio is still !uptodate, so the fault would just end
up:
	if (unlikely(!folio_test_uptodate(folio))) {
		ret = VM_FAULT_SIGBUS;
		goto out_page;
	}
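A minimal user-space sketch of why the error is lost, assuming the reader returns nothing and the caller can only look at the uptodate flag. All sketch_* names and the SKETCH_VM_FAULT_SIGBUS value are invented for illustration and do not match the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only; not the kernel's VM_FAULT_SIGBUS encoding. */
#define SKETCH_VM_FAULT_SIGBUS 0x2

/* Hypothetical stand-in for a folio; only the flag matters here. */
struct sketch_folio {
	bool uptodate;
};

/* Returns void, like swap_read_folio(): failure shows up only in the flag. */
static void sketch_swap_read_folio(struct sketch_folio *folio, bool read_ok)
{
	if (read_ok)
		folio->uptodate = true;
	/* On failure the error code is simply dropped here. */
}

/* The fault path can only test the flag, so any failure becomes SIGBUS. */
static int sketch_fault(bool read_ok)
{
	struct sketch_folio folio = { .uptodate = false };

	sketch_swap_read_folio(&folio, read_ok);
	if (!folio.uptodate)
		return SKETCH_VM_FAULT_SIGBUS;
	return 0;
}
```

In this model the caller cannot tell "zswap refused a large folio" apart from any other read failure; both end in the same SIGBUS return.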
It looks racy, but is it actually possible?
Cheers, Lance
> 				flags |= TTU_SPLIT_HUGE_PMD;
> 			/*
> 			 * Without TTU_SYNC, try_to_unmap will only begin to
>diff --git a/mm/vmstat.c b/mm/vmstat.c
>index f534972f517d..9b4963a7eb04 100644
>--- a/mm/vmstat.c
>+++ b/mm/vmstat.c
>@@ -1421,6 +1421,7 @@ const char * const vmstat_text[] = {
> 	[I(THP_ZERO_PAGE_ALLOC_FAILED)] = "thp_zero_page_alloc_failed",
> 	[I(THP_SWPOUT)] = "thp_swpout",
> 	[I(THP_SWPOUT_FALLBACK)] = "thp_swpout_fallback",
>+	[I(THP_SWPOUT_PMD)] = "thp_swpout_pmd",
> #endif
> #ifdef CONFIG_BALLOON
> 	[I(BALLOON_INFLATE)] = "balloon_inflate",
>--
>2.52.0
>
>