From: Usama Arif <usama.arif@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>,
david@kernel.org, chrisl@kernel.org, kasong@tencent.com,
ljs@kernel.org, ziy@nvidia.com, linux-mm@kvack.org
Cc: ying.huang@linux.alibaba.com, Baoquan He <baoquan.he@linux.dev>,
willy@infradead.org, youngjun.park@lge.com, hannes@cmpxchg.org,
riel@surriel.com, shakeel.butt@linux.dev, alex@ghiti.fr,
kas@kernel.org, baohua@kernel.org, dev.jain@arm.com,
baolin.wang@linux.alibaba.com, npache@redhat.com,
Liam R. Howlett <liam@infradead.org>,
ryan.roberts@arm.com, Vlastimil Babka <vbabka@kernel.org>,
lance.yang@linux.dev, linux-kernel@vger.kernel.org,
nphamcs@gmail.com, shikemeng@huaweicloud.com,
kernel-team@meta.com, Alexandre Ghiti <alexghiti@fb.com>,
Usama Arif <usama.arif@linux.dev>
Subject: [PATCH v3 04/11] mm: zswap: add range lookup for large-folio swapin
Date: Fri, 3 Jul 2026 10:38:21 -0700
Message-ID: <20260703173903.3789516-5-usama.arif@linux.dev>
In-Reply-To: <20260703173903.3789516-1-usama.arif@linux.dev>
From: Alexandre Ghiti <alexghiti@fb.com>

A large folio reaches zswap_load() only when the caller expects the
whole range to be on disk. Zswap still stores large folios as
independent order-0 entries, so reconstructing a large folio from
zswap entries would risk returning partially initialized data.

Teach zswap_load() to scan the covered range. If no slot is in zswap,
return -ENOENT so swap_read_folio() reads the backing device. If any
slot is still in zswap, fail the large-folio read so the caller can
fall back to per-page swapin.

Add zswap_range_has_entry() so PMD swap-entry consumers can make the
same range decision before attempting PMD-order swapin.

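As an illustration only (not part of the patch), the two outcomes of the
large-folio branch described above can be modelled in userspace C. This is
a minimal sketch under stated assumptions: the model_* names, the
MODEL_NR_SLOTS bound and the boolean array standing in for the zswap tree
are all hypothetical, and only the return-value convention (-ENOENT versus
-EIO) is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_SLOTS 1024u

/* Hypothetical stand-in for the zswap tree: true means the slot has an entry. */
static bool model_in_zswap[MODEL_NR_SLOTS];

/* Models zswap_range_has_entry(): is any slot in [base, base + nr) present? */
static bool model_range_has_entry(unsigned int base, unsigned int nr)
{
	/* The model only covers ranges that fit inside the array. */
	assert(base <= MODEL_NR_SLOTS && nr <= MODEL_NR_SLOTS - base);

	for (unsigned int i = 0; i < nr; i++)
		if (model_in_zswap[base + i])
			return true;
	return false;
}

/*
 * Models only the large-folio branch of zswap_load():
 *   -ENOENT: no slot is in zswap, so the caller reads the backing device
 *   -EIO:    at least one slot is still in zswap, so the read fails
 */
static int model_large_folio_load(unsigned int base, unsigned int nr)
{
	return model_range_has_entry(base, nr) ? -EIO : -ENOENT;
}
```

The linear scan is a simplification chosen for readability; the patch
itself does a single bounded xas_find() on the zswap tree under
rcu_read_lock().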
Signed-off-by: Alexandre Ghiti <alexghiti@fb.com>
Signed-off-by: Usama Arif <usama.arif@linux.dev>
---
include/linux/zswap.h | 7 +++++++
mm/zswap.c | 46 +++++++++++++++++++++++++++++++++----------
2 files changed, 43 insertions(+), 10 deletions(-)
diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 30c193a1207e..de10aa528597 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -35,6 +35,7 @@ void zswap_lruvec_state_init(struct lruvec *lruvec);
void zswap_folio_swapin(struct folio *folio);
bool zswap_is_enabled(void);
bool zswap_never_enabled(void);
+bool zswap_range_has_entry(swp_entry_t entry, unsigned int nr);
#else
struct zswap_lruvec_state {};
@@ -69,6 +70,12 @@ static inline bool zswap_never_enabled(void)
return true;
}
+static inline bool zswap_range_has_entry(swp_entry_t entry,
+					 unsigned int nr)
+{
+	return false;
+}
+
#endif
#endif /* _LINUX_ZSWAP_H */
diff --git a/mm/zswap.c b/mm/zswap.c
index b5a17ea20237..89dd88a5223f 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1559,6 +1559,27 @@ bool zswap_store(struct folio *folio)
return ret;
}
+/**
+ * zswap_range_has_entry() - is any slot in [entry, entry + nr) in zswap?
+ * @entry: base swap entry of the range
+ * @nr: number of contiguous slots to check
+ */
+bool zswap_range_has_entry(swp_entry_t entry, unsigned int nr)
+{
+	pgoff_t offset = swp_offset(entry);
+	XA_STATE(xas, swap_zswap_tree(entry), offset);
+	bool found;
+
+	if (!nr || zswap_never_enabled())
+		return false;
+
+	rcu_read_lock();
+	found = !!xas_find(&xas, offset + nr - 1);
+	rcu_read_unlock();
+
+	return found;
+}
+
/**
* zswap_load() - load a folio from zswap
* @folio: folio to load
@@ -1571,10 +1592,9 @@ bool zswap_store(struct folio *folio)
* NOT marked up-to-date, so that an IO error is emitted (e.g. do_swap_page()
* will SIGBUS).
*
- * -EINVAL: if the swapped out content was in zswap, but the page belongs
- * to a large folio, which is not supported by zswap. The folio is unlocked,
- * but NOT marked up-to-date, so that an IO error is emitted (e.g.
- * do_swap_page() will SIGBUS).
+ * -EIO: if a slot in a large-folio range is unexpectedly still in zswap.
+ * The folio is unlocked, but NOT marked up-to-date, so that an IO
+ * error is emitted (e.g. do_swap_page() will SIGBUS).
*
* -ENOENT: if the swapped out content was not in zswap. The folio remains
* locked on return.
@@ -1593,13 +1613,19 @@ int zswap_load(struct folio *folio)
 		return -ENOENT;
 	/*
-	 * Large folios should not be swapped in while zswap is being used, as
-	 * they are not properly handled. Zswap does not properly load large
-	 * folios, and a large folio may only be partially in zswap.
+	 * A large folio reaches zswap_load() only when its whole range is
+	 * expected to be on disk: PMD swap-entry consumers split before
+	 * calling into PMD-order swapin whenever any slot is still in zswap.
+	 * Confirm the range is entirely absent from zswap and return -ENOENT
+	 * so the caller reads it from disk; if a slot is unexpectedly still in
+	 * zswap, fail the read rather than return partially-initialized data.
 	 */
-	if (WARN_ON_ONCE(folio_test_large(folio))) {
-		folio_unlock(folio);
-		return -EINVAL;
+	if (folio_test_large(folio)) {
+		if (zswap_range_has_entry(swp, folio_nr_pages(folio))) {
+			folio_unlock(folio);
+			return -EIO;
+		}
+		return -ENOENT;
 	}
 	entry = xa_load(tree, offset);
--
2.53.0-Meta
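
For readers following the comment in the zswap_load() hunk, the pre-check
that a PMD swap-entry consumer is expected to make can be sketched in
userspace C. This is an illustration under stated assumptions, not code
from the series: the model_* names and the enum are hypothetical, and a
sorted array of present offsets stands in for the XArray. The only part
mirrored from the patch is the inclusive upper bound offset + nr - 1 passed
to xas_find().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical plans a PMD swap-entry consumer could choose between. */
enum model_plan {
	MODEL_PLAN_PMD_SWAPIN,	/* whole range is expected to be on disk */
	MODEL_PLAN_SPLIT,	/* some slot is in zswap: split, swap in per page */
};

/*
 * Models the bounded lookup xas_find(&xas, offset + nr - 1): report whether
 * the sorted array 'present' holds any offset in [offset, offset + nr - 1].
 */
static bool model_range_has_entry(const unsigned long *present, size_t count,
				  unsigned long offset, unsigned int nr)
{
	size_t lo = 0, hi = count;

	assert(present || !count);

	if (!nr)
		return false;

	/* Binary search for the first present offset >= 'offset'. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (present[mid] < offset)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < count && present[lo] <= offset + nr - 1;
}

/* Pick PMD-order swapin only when no slot of the range is in zswap. */
static enum model_plan model_choose_plan(const unsigned long *present,
					 size_t count, unsigned long offset,
					 unsigned int nr)
{
	return model_range_has_entry(present, count, offset, nr) ?
		MODEL_PLAN_SPLIT : MODEL_PLAN_PMD_SWAPIN;
}
```

With present offsets { 3, 511, 1024 }, a 512-slot range starting at 0 or at
1024 would be split, while the range starting at 512 would be eligible for
PMD-order swapin, because 1024 lies just past its inclusive upper bound of
1023.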