* + mm-swap-split-swap-cache-preparation-loop-into-a-standalone-helper.patch added to mm-new branch
From: Andrew Morton @ 2025-12-20 22:04 UTC
To: mm-commits, yosry.ahmed, rafael, nphamcs, chrisl, bhe,
baolin.wang, baohua, kasong, akpm
The patch titled
Subject: mm, swap: split swap cache preparation loop into a standalone helper
has been added to the -mm mm-new branch. Its filename is
mm-swap-split-swap-cache-preparation-loop-into-a-standalone-helper.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-swap-split-swap-cache-preparation-loop-into-a-standalone-helper.patch
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
The mm-new branch of mm.git is not included in linux-next.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Kairui Song <kasong@tencent.com>
Subject: mm, swap: split swap cache preparation loop into a standalone helper
Date: Sat, 20 Dec 2025 03:43:31 +0800
To prepare for the removal of swap cache bypass swapin, introduce a new
helper that accepts an allocated and charged fresh folio, prepares the
folio, the swap map, and then adds the folio to the swap cache.
This doesn't change how the swap cache works yet; we still depend on
SWAP_HAS_CACHE in the swap map for synchronization. But all of the
synchronization hacks now live in this single helper.
No feature change.
Link: https://lkml.kernel.org/r/20251220-swap-table-p2-v5-2-8862a265a033@tencent.com
Signed-off-by: Kairui Song <kasong@tencent.com>
Acked-by: Chris Li <chrisl@kernel.org>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/swap_state.c | 201 +++++++++++++++++++++++++---------------------
1 file changed, 111 insertions(+), 90 deletions(-)
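For readers following the control flow, the possible outcomes of the new
helper can be sketched as a small userspace model. This is only an
illustration under stated assumptions, not kernel code: struct model_folio,
struct model_slot, model_prepare(), model_wait() and
model_prepare_and_add() are hypothetical names, the error returned for a
freed slot is an arbitrary stand-in for "anything other than -EEXIST",
memcg charging, folio locking and the LRU are left out, and the
one-jiffy sleep is replaced by a counter that lets a simulated racer
publish its folio.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_folio {
	bool large;			/* models folio_test_large() */
};

struct model_slot {
	int swap_count;			/* models the swap count in swap_map */
	bool has_cache;			/* models the SWAP_HAS_CACHE bit */
	struct model_folio *cached;	/* models the swap cache lookup result */
	struct model_folio *racer;	/* folio a racing task will publish */
	int racer_delay;		/* waits left before the racer publishes */
};

/* Models swapcache_prepare(): pin the slot, or report why that failed. */
static int model_prepare(struct model_slot *slot)
{
	if (slot->swap_count == 0)
		return -ENOENT;		/* assumed code for a freed slot */
	if (slot->has_cache)
		return -EEXIST;		/* someone else already holds the pin */
	slot->has_cache = true;
	return 0;
}

/* Models the sleep in the loop: time passes and the racer may finish. */
static void model_wait(struct model_slot *slot)
{
	if (slot->racer && !slot->cached && --slot->racer_delay <= 0)
		slot->cached = slot->racer;
}

/*
 * Models the helper's three results: the caller's folio on success, the
 * existing folio if the entry turns out to be cached, NULL otherwise.
 */
static struct model_folio *model_prepare_and_add(struct model_slot *slot,
						 struct model_folio *folio,
						 bool skip_if_exists)
{
	assert(slot && folio);

	for (;;) {
		int ret = model_prepare(slot);

		if (!ret)
			break;
		/* Bail out instead of waiting in these three cases. */
		if (ret != -EEXIST || skip_if_exists || folio->large)
			return NULL;
		/* -EEXIST: look in the cache again before sleeping. */
		if (slot->cached)
			return slot->cached;
		model_wait(slot);
	}

	/* We own the pin; publish our folio in the (modelled) swap cache. */
	slot->cached = folio;
	return folio;
}
```

In this model a slot that is already pinned by a racer yields the racer's
folio once it becomes visible, while skip_if_exists or a large folio turns
the same situation into an immediate NULL.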
--- a/mm/swap_state.c~mm-swap-split-swap-cache-preparation-loop-into-a-standalone-helper
+++ a/mm/swap_state.c
@@ -403,6 +403,97 @@ void swap_update_readahead(struct folio
}
/**
+ * __swap_cache_prepare_and_add - Prepare the folio and add it to swap cache.
+ * @entry: swap entry to be bound to the folio.
+ * @folio: folio to be added.
+ * @gfp: memory allocation flags for charge, can be 0 if @charged is true.
+ * @charged: if the folio is already charged.
+ * @skip_if_exists: if the slot is in a cached state, return NULL.
+ * This is an old workaround that will be removed shortly.
+ *
+ * Update the swap_map and add folio as swap cache, typically before swapin.
+ * All swap slots covered by the folio must have a non-zero swap count.
+ *
+ * Context: Caller must protect the swap device with reference count or locks.
+ * Return: Returns the folio being added on success. Returns the existing folio
+ * if @entry is already cached. Returns NULL if raced with swapin or swapoff.
+ */
+static struct folio *__swap_cache_prepare_and_add(swp_entry_t entry,
+ struct folio *folio,
+ gfp_t gfp, bool charged,
+ bool skip_if_exists)
+{
+ struct folio *swapcache;
+ void *shadow;
+ int ret;
+
+ /*
+ * Check and pin the swap map with SWAP_HAS_CACHE, then add the folio
+ * into the swap cache. Loop with a schedule delay if raced with
+ * another process setting SWAP_HAS_CACHE. This hackish loop will
+ * be fixed very soon.
+ */
+ for (;;) {
+ ret = swapcache_prepare(entry, folio_nr_pages(folio));
+ if (!ret)
+ break;
+
+ /*
+ * The skip_if_exists is for protecting against a recursive
+ * call to this helper on the same entry waiting forever
+ * here because SWAP_HAS_CACHE is set but the folio is not
+ * in the swap cache yet. This can happen today if
+ * mem_cgroup_swapin_charge_folio() below triggers reclaim
+ * through zswap, which may call this helper again in the
+ * writeback path.
+ *
+ * Large order allocation also needs special handling on
+ * race: if a smaller folio exists in cache, swapin needs
+ * to fallback to order 0, and doing a swap cache lookup
+ * might return a folio that is irrelevant to the faulting
+ * entry because @entry is aligned down. Just return NULL.
+ */
+ if (ret != -EEXIST || skip_if_exists || folio_test_large(folio))
+ return NULL;
+
+ /*
+ * Check the swap cache again, we can only arrive
+ * here because swapcache_prepare returns -EEXIST.
+ */
+ swapcache = swap_cache_get_folio(entry);
+ if (swapcache)
+ return swapcache;
+
+ /*
+ * We might race against __swap_cache_del_folio(), and
+ * stumble across a swap_map entry whose SWAP_HAS_CACHE
+ * has not yet been cleared. Or race against another
+ * swap_cache_alloc_folio(), which has set SWAP_HAS_CACHE
+ * in swap_map, but not yet added its folio to swap cache.
+ */
+ schedule_timeout_uninterruptible(1);
+ }
+
+ __folio_set_locked(folio);
+ __folio_set_swapbacked(folio);
+
+ if (!charged && mem_cgroup_swapin_charge_folio(folio, NULL, gfp, entry)) {
+ put_swap_folio(folio, entry);
+ folio_unlock(folio);
+ return NULL;
+ }
+
+ swap_cache_add_folio(folio, entry, &shadow);
+ memcg1_swapin(entry, folio_nr_pages(folio));
+ if (shadow)
+ workingset_refault(folio, shadow);
+
+ /* Caller will initiate read into locked folio */
+ folio_add_lru(folio);
+ return folio;
+}
+
+/**
* swap_cache_alloc_folio - Allocate folio for swapped out slot in swap cache.
* @entry: the swapped out swap entry to be binded to the folio.
* @gfp_mask: memory allocation flags
@@ -428,99 +519,29 @@ struct folio *swap_cache_alloc_folio(swp
{
struct swap_info_struct *si = __swap_entry_to_info(entry);
struct folio *folio;
- struct folio *new_folio = NULL;
struct folio *result = NULL;
- void *shadow = NULL;
*new_page_allocated = false;
- for (;;) {
- int err;
-
- /*
- * Check the swap cache first, if a cached folio is found,
- * return it unlocked. The caller will lock and check it.
- */
- folio = swap_cache_get_folio(entry);
- if (folio)
- goto got_folio;
-
- /*
- * Just skip read ahead for unused swap slot.
- */
- if (!swap_entry_swapped(si, entry))
- goto put_and_return;
-
- /*
- * Get a new folio to read into from swap. Allocate it now if
- * new_folio not exist, before marking swap_map SWAP_HAS_CACHE,
- * when -EEXIST will cause any racers to loop around until we
- * add it to cache.
- */
- if (!new_folio) {
- new_folio = folio_alloc_mpol(gfp_mask, 0, mpol, ilx, numa_node_id());
- if (!new_folio)
- goto put_and_return;
- }
-
- /*
- * Swap entry may have been freed since our caller observed it.
- */
- err = swapcache_prepare(entry, 1);
- if (!err)
- break;
- else if (err != -EEXIST)
- goto put_and_return;
-
- /*
- * Protect against a recursive call to swap_cache_alloc_folio()
- * on the same entry waiting forever here because SWAP_HAS_CACHE
- * is set but the folio is not the swap cache yet. This can
- * happen today if mem_cgroup_swapin_charge_folio() below
- * triggers reclaim through zswap, which may call
- * swap_cache_alloc_folio() in the writeback path.
- */
- if (skip_if_exists)
- goto put_and_return;
-
- /*
- * We might race against __swap_cache_del_folio(), and
- * stumble across a swap_map entry whose SWAP_HAS_CACHE
- * has not yet been cleared. Or race against another
- * swap_cache_alloc_folio(), which has set SWAP_HAS_CACHE
- * in swap_map, but not yet added its folio to swap cache.
- */
- schedule_timeout_uninterruptible(1);
- }
-
- /*
- * The swap entry is ours to swap in. Prepare the new folio.
- */
- __folio_set_locked(new_folio);
- __folio_set_swapbacked(new_folio);
-
- if (mem_cgroup_swapin_charge_folio(new_folio, NULL, gfp_mask, entry))
- goto fail_unlock;
-
- swap_cache_add_folio(new_folio, entry, &shadow);
- memcg1_swapin(entry, 1);
-
- if (shadow)
- workingset_refault(new_folio, shadow);
-
- /* Caller will initiate read into locked new_folio */
- folio_add_lru(new_folio);
- *new_page_allocated = true;
- folio = new_folio;
-got_folio:
- result = folio;
- goto put_and_return;
-
-fail_unlock:
- put_swap_folio(new_folio, entry);
- folio_unlock(new_folio);
-put_and_return:
- if (!(*new_page_allocated) && new_folio)
- folio_put(new_folio);
+ /* Check the swap cache again for readahead path. */
+ folio = swap_cache_get_folio(entry);
+ if (folio)
+ return folio;
+
+ /* Skip allocation for unused swap slot for readahead path. */
+ if (!swap_entry_swapped(si, entry))
+ return NULL;
+
+ /* Allocate a new folio to be added into the swap cache. */
+ folio = folio_alloc_mpol(gfp_mask, 0, mpol, ilx, numa_node_id());
+ if (!folio)
+ return NULL;
+ /* Try to add the new folio; returns the existing folio or NULL on failure. */
+ result = __swap_cache_prepare_and_add(entry, folio, gfp_mask,
+ false, skip_if_exists);
+ if (result == folio)
+ *new_page_allocated = true;
+ else
+ folio_put(folio);
return result;
}
_
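The caller-side contract visible in the reworked swap_cache_alloc_folio()
(the freshly allocated folio is kept only when the helper hands that same
folio back, and is dropped otherwise) can be sketched in userspace as
follows. All names here (struct model_folio, model_live_folios,
model_folio_alloc(), model_folio_put(), enum model_outcome, model_helper(),
model_cache_alloc()) are hypothetical, the helper's outcome is injected
rather than computed, and the reference that a real swap cache lookup
would take on an existing folio is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a folio. */
struct model_folio {
	int refcount;
};

/* Number of model folios currently allocated, used to spot leaks. */
static int model_live_folios;

static struct model_folio *model_folio_alloc(void)
{
	struct model_folio *folio = calloc(1, sizeof(*folio));

	if (!folio)
		return NULL;
	folio->refcount = 1;
	model_live_folios++;
	return folio;
}

static void model_folio_put(struct model_folio *folio)
{
	assert(folio && folio->refcount > 0);
	if (--folio->refcount == 0) {
		model_live_folios--;
		free(folio);
	}
}

/* The three results the helper is documented to produce. */
enum model_outcome { MODEL_ADDED, MODEL_EXISTING, MODEL_FAILED };

/* Stand-in for the helper: the outcome is chosen by the caller. */
static struct model_folio *model_helper(struct model_folio *fresh,
					struct model_folio *existing,
					enum model_outcome outcome)
{
	switch (outcome) {
	case MODEL_ADDED:
		return fresh;
	case MODEL_EXISTING:
		return existing;
	default:
		return NULL;
	}
}

/* Mirrors the shape of the caller after the patch. */
static struct model_folio *model_cache_alloc(struct model_folio *existing,
					     enum model_outcome outcome,
					     bool *new_page_allocated)
{
	struct model_folio *folio, *result;

	*new_page_allocated = false;

	folio = model_folio_alloc();
	if (!folio)
		return NULL;

	result = model_helper(folio, existing, outcome);
	if (result == folio)
		*new_page_allocated = true;	/* our folio went in */
	else
		model_folio_put(folio);		/* not used, drop it */
	return result;
}
```

Comparing the returned pointer against the folio that was passed in lets a
single return value cover all three outcomes, which is why the old
got_folio/fail_unlock/put_and_return labels are no longer needed in the
caller.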
Patches currently in -mm which might be from kasong@tencent.com are
mm-swap-rename-__read_swap_cache_async-to-swap_cache_alloc_folio.patch
mm-swap-split-swap-cache-preparation-loop-into-a-standalone-helper.patch
mm-swap-never-bypass-the-swap-cache-even-for-swp_synchronous_io.patch
mm-swap-always-try-to-free-swap-cache-for-swp_synchronous_io-devices.patch
mm-swap-simplify-the-code-and-reduce-indention.patch
mm-swap-free-the-swap-cache-after-folio-is-mapped.patch
mm-shmem-never-bypass-the-swap-cache-for-swp_synchronous_io.patch
mm-swap-swap-entry-of-a-bad-slot-should-not-be-considered-as-swapped-out.patch
mm-swap-consolidate-cluster-reclaim-and-usability-check.patch
mm-swap-split-locked-entry-duplicating-into-a-standalone-helper.patch
mm-swap-use-swap-cache-as-the-swap-in-synchronize-layer.patch
mm-swap-remove-workaround-for-unsynchronized-swap-map-cache-state.patch
mm-swap-cleanup-swap-entry-management-workflow.patch
mm-swap-add-folio-to-swap-cache-directly-on-allocation.patch
mm-swap-check-swap-table-directly-for-checking-cache.patch
mm-swap-clean-up-and-improve-swap-entries-freeing.patch
mm-swap-drop-the-swap_has_cache-flag.patch
mm-swap-remove-no-longer-needed-_swap_info_get.patch