Linux-mm Archive on lore.kernel.org
* Re: [v2 15/16] mm: install PMD swap entries on swap-out
       [not found] <20260602142537.198755-16-usama.arif@linux.dev>
From: Lance Yang @ 2026-06-12 14:21 UTC
  To: usama.arif
  Cc: akpm, david, chrisl, kasong, ljs, ziy, ying.huang, baoquan.he,
	willy, youngjun.park, hannes, riel, shakeel.butt, alex, kas,
	baohua, dev.jain, baolin.wang, npache, liam, ryan.roberts, vbabka,
	lance.yang, linux-kernel, nphamcs, shikemeng, kernel-team,
	linux-mm

+Cc linux-mm

On Tue, Jun 02, 2026 at 07:24:23AM -0700, Usama Arif wrote:
[...]
>diff --git a/mm/vmscan.c b/mm/vmscan.c
>index e8a90911bf88..0f376fbf9bb3 100644
>--- a/mm/vmscan.c
>+++ b/mm/vmscan.c
>@@ -64,6 +64,7 @@
> 
> #include <linux/swapops.h>
> #include <linux/sched/sysctl.h>
>+#include <linux/zswap.h>
> 
> #include "internal.h"
> #include "swap.h"
>@@ -1332,7 +1333,18 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> 			enum ttu_flags flags = TTU_BATCH_FLUSH;
> 			bool was_swapbacked = folio_test_swapbacked(folio);
> 
>-			if (folio_test_pmd_mappable(folio))
>+			/*
>+			 * With THP_SWAP, PMD-mappable folios already in the
>+			 * swap cache can be unmapped with a PMD-level swap
>+			 * entry, avoiding the cost of splitting the PMD.
>+			 * Skip this when zswap has been enabled because
>+			 * zswap stores pages individually and cannot
>+			 * reconstruct a large folio on swap-in.
>+			 */
>+			if (folio_test_pmd_mappable(folio) &&
>+			    !(IS_ENABLED(CONFIG_THP_SWAP) &&
>+			      folio_test_swapcache(folio) &&
>+			      zswap_never_enabled()))

There may be a race here ...

1) zswap_never_enabled() passes, 2) try_to_unmap() installs the PMD swap
entry, and 3) zswap can still be enabled before the later pageout() ->
swap_writeout() -> zswap_store().
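
To make the ordering concrete, here is a small userspace model of that
check-then-act window. It is only a sketch and not kernel code: every
model_* name is a hypothetical stand-in, and the only thing it assumes
is that the later writeout looks at the current zswap state rather than
the state seen at unmap time.

```c
/* Userspace model only; every model_* name is a hypothetical stand-in. */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the global state behind zswap_never_enabled(). */
static bool model_zswap_ever_enabled;

enum model_backend { MODEL_BACKEND_SWAP_DEVICE, MODEL_BACKEND_ZSWAP };

struct model_folio {
	bool pmd_swap_entry;		/* unmapped with one PMD-level entry */
	enum model_backend backend;	/* where writeout put the data */
};

static bool model_zswap_never_enabled(void)
{
	return !model_zswap_ever_enabled;
}

/* Steps 1 and 2: the check passes, then the PMD swap entry is installed. */
static void model_unmap(struct model_folio *folio)
{
	folio->pmd_swap_entry = model_zswap_never_enabled();
}

/* Step 3: zswap gets enabled after the check has already passed. */
static void model_enable_zswap(void)
{
	model_zswap_ever_enabled = true;
}

/* Later writeout consults the current state, not the state at unmap time. */
static void model_writeout(struct model_folio *folio)
{
	folio->backend = model_zswap_ever_enabled ?
			 MODEL_BACKEND_ZSWAP : MODEL_BACKEND_SWAP_DEVICE;
}

/* Returns true when a PMD swap entry ends up backed by per-page zswap. */
bool model_race_hit(bool enable_in_window)
{
	struct model_folio folio = { 0 };

	model_zswap_ever_enabled = false;
	model_unmap(&folio);
	assert(folio.pmd_swap_entry);	/* the check passed at unmap time */
	if (enable_in_window)
		model_enable_zswap();
	model_writeout(&folio);

	return folio.pmd_swap_entry && folio.backend == MODEL_BACKEND_ZSWAP;
}
```

model_race_hit(true) reports the mismatched state, model_race_hit(false)
does not. The model says nothing about how wide the window is in
practice.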

zswap_store() loops over each page of the folio:

	for (index = 0; index < nr_pages; ++index) {
		struct page *page = folio_page(folio, index);

		if (!zswap_store_page(page, objcg, pool))
			goto put_pool;
	}

So the page table still holds a single PMD swap entry, while zswap holds
512 entries, one for each page of the folio ...
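
For reference, the 512 is just the PMD order arithmetic. A minimal
sketch, assuming x86-64 with 4 KiB base pages (the shift values below
are assumptions for that configuration, and the MODEL_* and model_*
names are hypothetical):

```c
#include <assert.h>

/* Assumed values for x86-64 with 4 KiB base pages; other configs differ. */
#define MODEL_PAGE_SHIFT	12	/* 4 KiB base page */
#define MODEL_PMD_SHIFT		21	/* 2 MiB PMD mapping */

/* Base pages covered by one PMD-mappable folio: 1 << (PMD order). */
unsigned long model_hpage_pmd_nr(void)
{
	assert(MODEL_PMD_SHIFT > MODEL_PAGE_SHIFT);
	return 1UL << (MODEL_PMD_SHIFT - MODEL_PAGE_SHIFT);
}
```

So one page table entry ends up standing in for model_hpage_pmd_nr()
separate zswap entries.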

If the swapcache is reclaimed later, a PMD fault will try PMD-order
swapin again:

do_huge_pmd_swap_page()
	swap_cache_get_folio()
	swapin_sync(..., BIT(HPAGE_PMD_ORDER))
		swap_read_folio()
			zswap_load()

zswap_load() rejects large folios with -EINVAL and leaves the folio not
uptodate:

	/*
	 * Large folios should not be swapped in while zswap is being used, as
	 * they are not properly handled. Zswap does not properly load large
	 * folios, and a large folio may only be partially in zswap.
	 */
	if (WARN_ON_ONCE(folio_test_large(folio))) {
		folio_unlock(folio);
		return -EINVAL;
	}

swap_read_folio() jumps to finish and does not try a normal swap read:

	if (zswap_load(folio) != -ENOENT)
		goto finish;

And the awkward part is that no error really gets propagated ...
swap_read_folio() is void, and swapin_sync() just hands the same folio
back to do_huge_pmd_swap_page().

At that point the folio is still !uptodate, so the fault would just end
up returning SIGBUS:

	if (unlikely(!folio_test_uptodate(folio))) {
		ret = VM_FAULT_SIGBUS;
		goto out_page;
	}
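
The whole chain can be modelled in userspace to show where the error
gets lost. This is only a sketch: the model_* names are hypothetical,
the structure follows the snippets quoted above, and the early -ENOENT
return when zswap was never enabled is an assumption about the load
path.

```c
/* Userspace model only; every model_* name is a hypothetical stand-in. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_folio {
	bool large;
	bool uptodate;
};

enum model_fault { MODEL_FAULT_OK, MODEL_FAULT_SIGBUS };

/* Follows the quoted zswap_load() check: large folios are refused. */
static int model_zswap_load(struct model_folio *folio, bool zswap_ever_enabled)
{
	if (!zswap_ever_enabled)
		return -ENOENT;		/* assumption: zswap is not involved */
	if (folio->large)
		return -EINVAL;		/* folio is left !uptodate */
	folio->uptodate = true;
	return 0;
}

/* void, like swap_read_folio(): the caller never sees the load result. */
static void model_swap_read_folio(struct model_folio *folio,
				  bool zswap_ever_enabled)
{
	assert(folio != NULL);
	if (model_zswap_load(folio, zswap_ever_enabled) != -ENOENT)
		return;			/* "goto finish": no normal swap read */
	folio->uptodate = true;		/* normal swap read path */
}

/* PMD-order swap-in followed by the uptodate check in the fault handler. */
enum model_fault model_pmd_swap_fault(bool zswap_ever_enabled)
{
	struct model_folio folio = { .large = true, .uptodate = false };

	model_swap_read_folio(&folio, zswap_ever_enabled);
	return folio.uptodate ? MODEL_FAULT_OK : MODEL_FAULT_SIGBUS;
}
```

In this model the -EINVAL never reaches the fault handler; the only
signal left is the !uptodate folio, which turns into SIGBUS.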

This looks racy, but is it actually possible?

Cheers, Lance

> 				flags |= TTU_SPLIT_HUGE_PMD;
> 			/*
> 			 * Without TTU_SYNC, try_to_unmap will only begin to
>diff --git a/mm/vmstat.c b/mm/vmstat.c
>index f534972f517d..9b4963a7eb04 100644
>--- a/mm/vmstat.c
>+++ b/mm/vmstat.c
>@@ -1421,6 +1421,7 @@ const char * const vmstat_text[] = {
> 	[I(THP_ZERO_PAGE_ALLOC_FAILED)]		= "thp_zero_page_alloc_failed",
> 	[I(THP_SWPOUT)]				= "thp_swpout",
> 	[I(THP_SWPOUT_FALLBACK)]		= "thp_swpout_fallback",
>+	[I(THP_SWPOUT_PMD)]			= "thp_swpout_pmd",
> #endif
> #ifdef CONFIG_BALLOON
> 	[I(BALLOON_INFLATE)]			= "balloon_inflate",
>-- 
>2.52.0
>
>
