From: Hui Zhu <hui.zhu@linux.dev>
To: Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
YoungJun Park <youngjun.park@lge.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Hui Zhu <zhuhui@kylinos.cn>
Subject: [PATCH v3] mm/swap: strengthen locking assertions and invariants in cluster allocation
Date: Tue, 10 Mar 2026 09:56:57 +0800
Message-ID: <20260310015657.42395-1-hui.zhu@linux.dev>
From: Hui Zhu <zhuhui@kylinos.cn>
The swap_cluster_alloc_table() function requires several locks to be held
by its callers: ci->lock, the per-CPU swap_cluster lock, and, for
non-solid-state devices (non-SWP_SOLIDSTATE), the si->global_cluster_lock.

While most call paths (e.g., via cluster_alloc_swap_entry() or
alloc_swap_scan_list()) correctly acquire these locks before invocation,
the path through swap_reclaim_work() -> swap_reclaim_full_clusters() ->
isolate_lock_cluster() is distinct. This path operates exclusively on
si->full_clusters, where the swap allocation tables are guaranteed to be
already allocated. Consequently, isolate_lock_cluster() should never
trigger a call to swap_cluster_alloc_table() for these clusters.
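
As an illustration only, the locking requirement described above can be
modeled in user space. This is a minimal sketch, not kernel code: every
lk_* name and LK_SWP_SOLIDSTATE is a hypothetical stand-in, and a plain
"held" boolean replaces lockdep. It checks the locks in the order per-CPU
lock, global cluster lock (rotational devices only), cluster lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SWP_SOLIDSTATE; not the kernel's value. */
#define LK_SWP_SOLIDSTATE 0x1u

/* A lock reduced to a "currently held" marker (assumption: no lockdep). */
struct lk_lock {
	bool held;
};

/* Hypothetical model of the device: flags plus the global cluster lock. */
struct lk_device {
	unsigned int flags;
	struct lk_lock global_cluster_lock;
};

/* Hypothetical model of a cluster: only its lock matters here. */
struct lk_cluster {
	struct lk_lock lock;
};

/*
 * Return true when every lock the table allocation expects is held.
 * The global cluster lock is only required for non-solid-state devices.
 */
static bool lk_alloc_table_locks_ok(const struct lk_device *si,
				    const struct lk_cluster *ci,
				    const struct lk_lock *percpu_lock)
{
	if (!percpu_lock->held)
		return false;
	if (!(si->flags & LK_SWP_SOLIDSTATE) && !si->global_cluster_lock.held)
		return false;
	return ci->lock.held;
}
```

In this model, a rotational device that holds the per-CPU and cluster
locks but not the global cluster lock fails the check, while a device
flagged LK_SWP_SOLIDSTATE passes with the same two locks held.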

Strengthen the locking and state assertions to formalize these invariants:

1. Add a lockdep_assert_held() for si->global_cluster_lock in
   swap_cluster_alloc_table() for non-SWP_SOLIDSTATE devices.
2. Reorder existing lockdep assertions in swap_cluster_alloc_table() to
   match the actual lock acquisition order (per-CPU lock, then global lock,
   then cluster lock).
3. Add a VM_WARN_ON_ONCE() in isolate_lock_cluster() to ensure that table
   allocations are only attempted for clusters being isolated from the
   free list. Attempting to allocate a table for a cluster from other
   lists (like the full list during reclaim) indicates a violation of
   subsystem invariants.

These changes ensure locking consistency and help catch potential
synchronization or logic issues during development.
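
As an illustration only, the isolation invariant can be modeled in user
space. This is a minimal sketch under stated assumptions: the iso_* types
and ISO_FLAG_* values are hypothetical stand-ins for the kernel's cluster
state, and a returned boolean replaces VM_WARN_ON_ONCE(). It shows why the
list flag has to be saved before it is reset: after the reset the flag is
always "none", so a later check could no longer tell a free cluster from a
full one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the cluster list flags. */
enum iso_flag {
	ISO_FLAG_NONE = 0,
	ISO_FLAG_FREE,
	ISO_FLAG_FULL,
};

/* Hypothetical model of a cluster: its list flag and table state. */
struct iso_cluster {
	enum iso_flag flags;
	bool table_allocated;
};

/* What the model reports after isolating one cluster. */
struct iso_result {
	bool needs_table_alloc;   /* the table was missing */
	bool invariant_violated;  /* missing table on a non-free cluster */
};

static struct iso_result iso_isolate(struct iso_cluster *ci)
{
	struct iso_result res = { false, false };
	/* Save the list flag first; after the reset it is always NONE. */
	enum iso_flag saved = ci->flags;

	ci->flags = ISO_FLAG_NONE;

	if (!ci->table_allocated) {
		res.needs_table_alloc = true;
		/* Only a cluster taken from the free list may lack a table. */
		res.invariant_violated = (saved != ISO_FLAG_FREE);
	}
	return res;
}
```

In this model, isolating a free cluster without a table asks for an
allocation and reports no violation, while a full cluster without a table
is reported as a violation. Checking ci->flags after the reset instead of
the saved value would flag every cluster, including free ones.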

Changelog:
v3:
Following Kairui Song's comments, squash the patches into one and fix a
logic bug in isolate_lock_cluster() where the flags were cleared before
the check.
v2:
Following comments from YoungJun Park, Kairui Song and Chris Li, add a
VM_WARN_ON in isolate_lock_cluster() instead of acquiring the locks in
swap_reclaim_work().
Following YoungJun Park's comments, change the order of the
lockdep_assert_held() calls in patch 2 to match the actual lock
acquisition order.

Reviewed-by: Youngjun Park <youngjun.park@lge.com>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
---
mm/swapfile.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 94af29d1de88..4e0fb1ce5245 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -476,8 +476,10 @@ swap_cluster_alloc_table(struct swap_info_struct *si,
* Only cluster isolation from the allocator does table allocation.
* Swap allocator uses percpu clusters and holds the local lock.
*/
- lockdep_assert_held(&ci->lock);
lockdep_assert_held(&this_cpu_ptr(&percpu_swap_cluster)->lock);
+ if (!(si->flags & SWP_SOLIDSTATE))
+ lockdep_assert_held(&si->global_cluster_lock);
+ lockdep_assert_held(&ci->lock);
/* The cluster must be free and was just isolated from the free list. */
VM_WARN_ON_ONCE(ci->flags || !cluster_is_empty(ci));
@@ -577,6 +579,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
struct swap_info_struct *si, struct list_head *list)
{
struct swap_cluster_info *ci, *found = NULL;
+ u8 flags;
spin_lock(&si->lock);
list_for_each_entry(ci, list, list) {
@@ -589,6 +592,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
ci->flags != CLUSTER_FLAG_FULL);
list_del(&ci->list);
+ flags = ci->flags;
ci->flags = CLUSTER_FLAG_NONE;
found = ci;
break;
@@ -596,6 +600,9 @@ static struct swap_cluster_info *isolate_lock_cluster(
spin_unlock(&si->lock);
if (found && !cluster_table_is_alloced(found)) {
+ /* Table of non-free cluster must be allocated. */
+ VM_WARN_ON_ONCE(flags != CLUSTER_FLAG_FREE);
+
/* Only an empty free cluster's swap table can be freed. */
VM_WARN_ON_ONCE(list != &si->free_clusters);
VM_WARN_ON_ONCE(!cluster_is_empty(found));
--
2.43.0