From: Usama Arif
To: Andrew Morton, david@kernel.org, chrisl@kernel.org, kasong@tencent.com,
 ljs@kernel.org, ziy@nvidia.com, linux-mm@kvack.org
Cc: ying.huang@linux.alibaba.com, Baoquan He, willy@infradead.org,
 youngjun.park@lge.com, hannes@cmpxchg.org, riel@surriel.com,
 shakeel.butt@linux.dev, alex@ghiti.fr, kas@kernel.org, baohua@kernel.org,
 dev.jain@arm.com, baolin.wang@linux.alibaba.com, npache@redhat.com,
 Liam R. Howlett, ryan.roberts@arm.com, Vlastimil Babka,
 lance.yang@linux.dev, linux-kernel@vger.kernel.org, nphamcs@gmail.com,
 shikemeng@huaweicloud.com, kernel-team@meta.com, Alexandre Ghiti,
 Usama Arif
Subject: [PATCH v3 04/11] mm: zswap: add range lookup for large-folio swapin
Date: Fri, 3 Jul 2026 10:38:21 -0700
Message-ID: <20260703173903.3789516-5-usama.arif@linux.dev>
In-Reply-To: <20260703173903.3789516-1-usama.arif@linux.dev>
References: <20260703173903.3789516-1-usama.arif@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Alexandre Ghiti

A large folio reaches zswap_load() only when the caller expects the
whole range to be on disk. Zswap still stores large folios as
independent order-0 entries, so reconstructing a large folio from zswap
entries would risk returning partially initialized data.

Teach zswap_load() to scan the covered range.
If no slot is in zswap, return -ENOENT so swap_read_folio() reads the
backing device. If any slot is still in zswap, fail the large-folio read
so the caller can fall back to per-page swapin.

Add zswap_range_has_entry() so PMD swap-entry consumers can make the
same range decision before attempting PMD-order swapin.

Signed-off-by: Alexandre Ghiti
Signed-off-by: Usama Arif
---
 include/linux/zswap.h |  7 +++++++
 mm/zswap.c            | 46 +++++++++++++++++++++++++++++++++----------
 2 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 30c193a1207e..de10aa528597 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -35,6 +35,7 @@ void zswap_lruvec_state_init(struct lruvec *lruvec);
 void zswap_folio_swapin(struct folio *folio);
 bool zswap_is_enabled(void);
 bool zswap_never_enabled(void);
+bool zswap_range_has_entry(swp_entry_t entry, unsigned int nr);
 #else
 
 struct zswap_lruvec_state {};
@@ -69,6 +70,12 @@ static inline bool zswap_never_enabled(void)
 	return true;
 }
 
+static inline bool zswap_range_has_entry(swp_entry_t entry,
+					 unsigned int nr)
+{
+	return false;
+}
+
 #endif
 
 #endif /* _LINUX_ZSWAP_H */
diff --git a/mm/zswap.c b/mm/zswap.c
index b5a17ea20237..89dd88a5223f 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1559,6 +1559,27 @@ bool zswap_store(struct folio *folio)
 	return ret;
 }
 
+/**
+ * zswap_range_has_entry() - is any slot in [entry, entry + nr) in zswap?
+ * @entry: base swap entry of the range
+ * @nr: number of contiguous slots to check
+ */
+bool zswap_range_has_entry(swp_entry_t entry, unsigned int nr)
+{
+	pgoff_t offset = swp_offset(entry);
+	XA_STATE(xas, swap_zswap_tree(entry), offset);
+	bool found;
+
+	if (!nr || zswap_never_enabled())
+		return false;
+
+	rcu_read_lock();
+	found = !!xas_find(&xas, offset + nr - 1);
+	rcu_read_unlock();
+
+	return found;
+}
+
 /**
  * zswap_load() - load a folio from zswap
  * @folio: folio to load
@@ -1571,10 +1592,9 @@ bool zswap_store(struct folio *folio)
  * NOT marked up-to-date, so that an IO error is emitted (e.g. do_swap_page()
  * will SIGBUS).
  *
- * -EINVAL: if the swapped out content was in zswap, but the page belongs
- * to a large folio, which is not supported by zswap. The folio is unlocked,
- * but NOT marked up-to-date, so that an IO error is emitted (e.g.
- * do_swap_page() will SIGBUS).
+ * -EIO: if a slot in a large-folio range is unexpectedly still in zswap.
+ * The folio is unlocked, but NOT marked up-to-date, so that an IO
+ * error is emitted (e.g. do_swap_page() will SIGBUS).
  *
  * -ENOENT: if the swapped out content was not in zswap. The folio remains
  * locked on return.
@@ -1593,13 +1613,19 @@ int zswap_load(struct folio *folio)
 		return -ENOENT;
 
 	/*
-	 * Large folios should not be swapped in while zswap is being used, as
-	 * they are not properly handled. Zswap does not properly load large
-	 * folios, and a large folio may only be partially in zswap.
+	 * A large folio reaches zswap_load() only when its whole range is
+	 * expected to be on disk: PMD swap-entry consumers split before
+	 * calling into PMD-order swapin whenever any slot is still in zswap.
+	 * Confirm the range is entirely absent from zswap and return -ENOENT
+	 * so the caller reads it from disk; if a slot is unexpectedly still in
+	 * zswap, fail the read rather than return partially-initialized data.
 	 */
-	if (WARN_ON_ONCE(folio_test_large(folio))) {
-		folio_unlock(folio);
-		return -EINVAL;
+	if (folio_test_large(folio)) {
+		if (zswap_range_has_entry(swp, folio_nr_pages(folio))) {
+			folio_unlock(folio);
+			return -EIO;
+		}
+		return -ENOENT;
 	}
 
 	entry = xa_load(tree, offset);
-- 
2.53.0-Meta
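
Illustration only, not part of the patch: a minimal userspace sketch of
the range decision the large-folio branch makes. It assumes a
hypothetical fixed-size boolean array (toy_in_zswap) in place of the
zswap xarray, and the names toy_range_has_entry() and
toy_large_folio_load() are invented for this sketch. Locking, RCU and
folio state are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the zswap tree: one flag per swap slot. */
#define TOY_NR_SLOTS 64
static bool toy_in_zswap[TOY_NR_SLOTS];

/*
 * Is any slot in [offset, offset + nr) marked as present?
 * Models the intent of zswap_range_has_entry() with a linear scan
 * instead of an xarray lookup.
 */
static bool toy_range_has_entry(size_t offset, unsigned int nr)
{
	if (!nr)
		return false;

	/* The toy array is finite; the caller must stay inside it. */
	assert(offset + nr <= TOY_NR_SLOTS);

	for (size_t i = offset; i < offset + nr; i++) {
		if (toy_in_zswap[i])
			return true;
	}
	return false;
}

/*
 * Large-folio branch of the load path as described in the commit
 * message: any slot still present fails the read with -EIO, an empty
 * range returns -ENOENT so the caller reads the backing device.
 */
static int toy_large_folio_load(size_t offset, unsigned int nr)
{
	if (toy_range_has_entry(offset, nr))
		return -EIO;
	return -ENOENT;
}
```

With every flag clear, toy_large_folio_load(0, 16) returns -ENOENT;
setting toy_in_zswap[7] makes the same call return -EIO, while a range
that does not cover slot 7, such as (8, 8), still returns -ENOENT.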