* [PATCH v4] mm/swap: strengthen locking assertions and invariants in cluster allocation
@ 2026-03-11 2:22 Hui Zhu
2026-03-11 17:34 ` Chris Li
0 siblings, 1 reply; 4+ messages in thread
From: Hui Zhu @ 2026-03-11 2:22 UTC (permalink / raw)
To: Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, YoungJun Park, linux-mm, linux-kernel
Cc: Hui Zhu
From: Hui Zhu <zhuhui@kylinos.cn>
The swap_cluster_alloc_table() function requires several locks to be held
by its callers: ci->lock, the per-CPU swap_cluster lock, and, for
non-solid-state devices (non-SWP_SOLIDSTATE), the si->global_cluster_lock.
While most call paths (e.g., via cluster_alloc_swap_entry() or
alloc_swap_scan_list()) correctly acquire these locks before invocation,
the path through swap_reclaim_work() -> swap_reclaim_full_clusters() ->
isolate_lock_cluster() is distinct. This path operates exclusively on
si->full_clusters, where the swap allocation tables are guaranteed to be
already allocated. Consequently, isolate_lock_cluster() should never
trigger a call to swap_cluster_alloc_table() for these clusters.
Strengthen the locking and state assertions to formalize these invariants:
1. Add a lockdep_assert_held() for si->global_cluster_lock in
swap_cluster_alloc_table() for non-SWP_SOLIDSTATE devices.
2. Reorder existing lockdep assertions in swap_cluster_alloc_table() to
match the actual lock acquisition order (per-CPU lock, then global lock,
then cluster lock).
3. Add a VM_WARN_ON_ONCE() in isolate_lock_cluster() to ensure that table
allocations are only attempted for clusters being isolated from the
free list. Attempting to allocate a table for a cluster from other
lists (like the full list during reclaim) indicates a violation of
subsystem invariants.
These changes ensure locking consistency and help catch potential
synchronization or logic issues during development.
Changelog:
v4:
Per Barry Song's comment, remove a redundant comment.
v3:
Per Kairui Song's comments, squash the patches and fix a logic bug in
isolate_lock_cluster() where flags were cleared before being checked.
v2:
Per comments from YoungJun Park, Kairui Song and Chris Li, replace
acquiring the locks in swap_reclaim_work() with a VM_WARN_ON in
isolate_lock_cluster().
Per YoungJun Park's comment, reorder the lockdep_assert_held() calls in
patch 2 to match the actual lock acquisition order.
Reviewed-by: Youngjun Park <youngjun.park@lge.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
---
mm/swapfile.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 94af29d1de88..e25cdb0046d8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -476,8 +476,10 @@ swap_cluster_alloc_table(struct swap_info_struct *si,
* Only cluster isolation from the allocator does table allocation.
* Swap allocator uses percpu clusters and holds the local lock.
*/
- lockdep_assert_held(&ci->lock);
lockdep_assert_held(&this_cpu_ptr(&percpu_swap_cluster)->lock);
+ if (!(si->flags & SWP_SOLIDSTATE))
+ lockdep_assert_held(&si->global_cluster_lock);
+ lockdep_assert_held(&ci->lock);
/* The cluster must be free and was just isolated from the free list. */
VM_WARN_ON_ONCE(ci->flags || !cluster_is_empty(ci));
@@ -577,6 +579,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
struct swap_info_struct *si, struct list_head *list)
{
struct swap_cluster_info *ci, *found = NULL;
+ u8 flags;
spin_lock(&si->lock);
list_for_each_entry(ci, list, list) {
@@ -589,6 +592,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
ci->flags != CLUSTER_FLAG_FULL);
list_del(&ci->list);
+ flags = ci->flags;
ci->flags = CLUSTER_FLAG_NONE;
found = ci;
break;
@@ -597,6 +601,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
if (found && !cluster_table_is_alloced(found)) {
/* Only an empty free cluster's swap table can be freed. */
+ VM_WARN_ON_ONCE(flags != CLUSTER_FLAG_FREE);
VM_WARN_ON_ONCE(list != &si->free_clusters);
VM_WARN_ON_ONCE(!cluster_is_empty(found));
return swap_cluster_alloc_table(si, found);
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v4] mm/swap: strengthen locking assertions and invariants in cluster allocation
2026-03-11 2:22 [PATCH v4] mm/swap: strengthen locking assertions and invariants in cluster allocation Hui Zhu
@ 2026-03-11 17:34 ` Chris Li
2026-03-12 2:13 ` Geliang Tang
2026-03-12 2:15 ` hui.zhu
0 siblings, 2 replies; 4+ messages in thread
From: Chris Li @ 2026-03-11 17:34 UTC (permalink / raw)
To: Hui Zhu
Cc: Andrew Morton, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
Barry Song, YoungJun Park, linux-mm, linux-kernel, Hui Zhu
On Tue, Mar 10, 2026 at 7:23 PM Hui Zhu <hui.zhu@linux.dev> wrote:
>
> From: Hui Zhu <zhuhui@kylinos.cn>
>
> The swap_cluster_alloc_table() function requires several locks to be held
> by its callers: ci->lock, the per-CPU swap_cluster lock, and, for
> non-solid-state devices (non-SWP_SOLIDSTATE), the si->global_cluster_lock.
>
> While most call paths (e.g., via cluster_alloc_swap_entry() or
> alloc_swap_scan_list()) correctly acquire these locks before invocation,
> the path through swap_reclaim_work() -> swap_reclaim_full_clusters() ->
> isolate_lock_cluster() is distinct. This path operates exclusively on
> si->full_clusters, where the swap allocation tables are guaranteed to be
> already allocated. Consequently, isolate_lock_cluster() should never
> trigger a call to swap_cluster_alloc_table() for these clusters.
>
> Strengthen the locking and state assertions to formalize these invariants:
>
> 1. Add a lockdep_assert_held() for si->global_cluster_lock in
> swap_cluster_alloc_table() for non-SWP_SOLIDSTATE devices.
> 2. Reorder existing lockdep assertions in swap_cluster_alloc_table() to
> match the actual lock acquisition order (per-CPU lock, then global lock,
> then cluster lock).
> 3. Add a VM_WARN_ON_ONCE() in isolate_lock_cluster() to ensure that table
> allocations are only attempted for clusters being isolated from the
> free list. Attempting to allocate a table for a cluster from other
> lists (like the full list during reclaim) indicates a violation of
> subsystem invariants.
>
> These changes ensure locking consistency and help catch potential
> synchronization or logic issues during development.
>
> Changelog:
> v4:
> According to the comments of Barry Song, remove redundant comment.
> v3:
> According to the comments of Kairui Song, squash patches and fix logic
> bug in isolate_lock_cluster() where flags were cleared before check.
> v2:
> According to the comments of YoungJun Park, Kairui Song and Chris Li,
> change acquire locks in swap_reclaim_work() to adds a VM_WARN_ON in
> isolate_lock_cluster().
> According to the comments of YoungJun Park, add code in patch 2 to Change
> the order of lockdep_assert_held() to match the actual lock acquisition
> order.
>
> Reviewed-by: Youngjun Park <youngjun.park@lge.com>
> Reviewed-by: Barry Song <baohua@kernel.org>
> Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
Acked-by: Chris Li <chrisl@kernel.org>
> ---
> mm/swapfile.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 94af29d1de88..e25cdb0046d8 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -476,8 +476,10 @@ swap_cluster_alloc_table(struct swap_info_struct *si,
> * Only cluster isolation from the allocator does table allocation.
> * Swap allocator uses percpu clusters and holds the local lock.
> */
> - lockdep_assert_held(&ci->lock);
> lockdep_assert_held(&this_cpu_ptr(&percpu_swap_cluster)->lock);
> + if (!(si->flags & SWP_SOLIDSTATE))
> + lockdep_assert_held(&si->global_cluster_lock);
> + lockdep_assert_held(&ci->lock);
>
> /* The cluster must be free and was just isolated from the free list. */
> VM_WARN_ON_ONCE(ci->flags || !cluster_is_empty(ci));
> @@ -577,6 +579,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
> struct swap_info_struct *si, struct list_head *list)
> {
> struct swap_cluster_info *ci, *found = NULL;
> + u8 flags;
Nit pick: consider initializing the variable. The flags assignment occurs
in a conditional block. The compiler might or might not realize that
"flags" is assigned only if "found" is also assigned, and might complain
that "flags" can be used uninitialized.
>
> spin_lock(&si->lock);
> list_for_each_entry(ci, list, list) {
> @@ -589,6 +592,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
> ci->flags != CLUSTER_FLAG_FULL);
>
> list_del(&ci->list);
> + flags = ci->flags;
If VM debug is disabled, this variable is not used after its value is
assigned. Please test it with both gcc and llvm (VM debug disabled) to
ensure it doesn't generate any warnings. I don't expect it to, I just
want to make sure.
Chris
> ci->flags = CLUSTER_FLAG_NONE;
> found = ci;
> break;
> @@ -597,6 +601,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
>
> if (found && !cluster_table_is_alloced(found)) {
> /* Only an empty free cluster's swap table can be freed. */
> + VM_WARN_ON_ONCE(flags != CLUSTER_FLAG_FREE);
> VM_WARN_ON_ONCE(list != &si->free_clusters);
> VM_WARN_ON_ONCE(!cluster_is_empty(found));
> return swap_cluster_alloc_table(si, found);
> --
> 2.43.0
>
>
* Re: [PATCH v4] mm/swap: strengthen locking assertions and invariants in cluster allocation
2026-03-11 17:34 ` Chris Li
@ 2026-03-12 2:13 ` Geliang Tang
2026-03-12 2:15 ` hui.zhu
1 sibling, 0 replies; 4+ messages in thread
From: Geliang Tang @ 2026-03-12 2:13 UTC (permalink / raw)
To: Chris Li, Hui Zhu
Cc: Andrew Morton, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
Barry Song, YoungJun Park, linux-mm, linux-kernel, Hui Zhu
Hi Hui,
On Wed, 2026-03-11 at 10:34 -0700, Chris Li wrote:
> On Tue, Mar 10, 2026 at 7:23 PM Hui Zhu <hui.zhu@linux.dev> wrote:
> >
> > From: Hui Zhu <zhuhui@kylinos.cn>
> >
> > The swap_cluster_alloc_table() function requires several locks to
> > be held
> > by its callers: ci->lock, the per-CPU swap_cluster lock, and, for
> > non-solid-state devices (non-SWP_SOLIDSTATE), the si-
> > >global_cluster_lock.
> >
> > While most call paths (e.g., via cluster_alloc_swap_entry() or
> > alloc_swap_scan_list()) correctly acquire these locks before
> > invocation,
> > the path through swap_reclaim_work() ->
> > swap_reclaim_full_clusters() ->
> > isolate_lock_cluster() is distinct. This path operates exclusively
> > on
> > si->full_clusters, where the swap allocation tables are guaranteed
> > to be
> > already allocated. Consequently, isolate_lock_cluster() should
> > never
> > trigger a call to swap_cluster_alloc_table() for these clusters.
> >
> > Strengthen the locking and state assertions to formalize these
> > invariants:
> >
> > 1. Add a lockdep_assert_held() for si->global_cluster_lock in
> > swap_cluster_alloc_table() for non-SWP_SOLIDSTATE devices.
> > 2. Reorder existing lockdep assertions in
> > swap_cluster_alloc_table() to
> > match the actual lock acquisition order (per-CPU lock, then
> > global lock,
> > then cluster lock).
> > 3. Add a VM_WARN_ON_ONCE() in isolate_lock_cluster() to ensure that
> > table
> > allocations are only attempted for clusters being isolated from
> > the
> > free list. Attempting to allocate a table for a cluster from
> > other
> > lists (like the full list during reclaim) indicates a violation
> > of
> > subsystem invariants.
> >
> > These changes ensure locking consistency and help catch potential
> > synchronization or logic issues during development.
> >
> > Changelog:
> > v4:
> > According to the comments of Barry Song, remove redundant comment.
> > v3:
> > According to the comments of Kairui Song, squash patches and fix
> > logic
> > bug in isolate_lock_cluster() where flags were cleared before
> > check.
> > v2:
> > According to the comments of YoungJun Park, Kairui Song and Chris
> > Li,
> > change acquire locks in swap_reclaim_work() to adds a VM_WARN_ON in
> > isolate_lock_cluster().
> > According to the comments of YoungJun Park, add code in patch 2 to
> > Change
> > the order of lockdep_assert_held() to match the actual lock
> > acquisition
> > order.
> >
> > Reviewed-by: Youngjun Park <youngjun.park@lge.com>
> > Reviewed-by: Barry Song <baohua@kernel.org>
> > Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
>
> Acked-by: Chris Li <chrisl@kernel.org>
Acked-by: Geliang Tang <geliang@kernel.org>
Thanks,
-Geliang
>
> > ---
> > mm/swapfile.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 94af29d1de88..e25cdb0046d8 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -476,8 +476,10 @@ swap_cluster_alloc_table(struct
> > swap_info_struct *si,
> > * Only cluster isolation from the allocator does table
> > allocation.
> > * Swap allocator uses percpu clusters and holds the local
> > lock.
> > */
> > - lockdep_assert_held(&ci->lock);
> > lockdep_assert_held(&this_cpu_ptr(&percpu_swap_cluster)-
> > >lock);
> > + if (!(si->flags & SWP_SOLIDSTATE))
> > + lockdep_assert_held(&si->global_cluster_lock);
> > + lockdep_assert_held(&ci->lock);
> >
> > /* The cluster must be free and was just isolated from the
> > free list. */
> > VM_WARN_ON_ONCE(ci->flags || !cluster_is_empty(ci));
> > @@ -577,6 +579,7 @@ static struct swap_cluster_info
> > *isolate_lock_cluster(
> > struct swap_info_struct *si, struct list_head
> > *list)
> > {
> > struct swap_cluster_info *ci, *found = NULL;
> > + u8 flags;
>
> Nit pick: consider initializing the value. The flags assignment
> occurs
> in a conditional block. The compiler might or might not realize the
> "flags" assigned only if "found" is also assigned, and might complain
> that flags can be used without initialization.
>
> >
> > spin_lock(&si->lock);
> > list_for_each_entry(ci, list, list) {
> > @@ -589,6 +592,7 @@ static struct swap_cluster_info
> > *isolate_lock_cluster(
> > ci->flags != CLUSTER_FLAG_FULL);
> >
> > list_del(&ci->list);
> > + flags = ci->flags;
>
> If VM debug is disabled, this variable is not used after its value is
> assigned. Please test it with gcc and llvm (VM debug disabled) to
> ensure it doesn't generate any warnings. I don't expect it to be, I
> just want to make sure.
>
> Chris
>
> > ci->flags = CLUSTER_FLAG_NONE;
> > found = ci;
> > break;
> > @@ -597,6 +601,7 @@ static struct swap_cluster_info
> > *isolate_lock_cluster(
> >
> > if (found && !cluster_table_is_alloced(found)) {
> > /* Only an empty free cluster's swap table can be
> > freed. */
> > + VM_WARN_ON_ONCE(flags != CLUSTER_FLAG_FREE);
> > VM_WARN_ON_ONCE(list != &si->free_clusters);
> > VM_WARN_ON_ONCE(!cluster_is_empty(found));
> > return swap_cluster_alloc_table(si, found);
> > --
> > 2.43.0
> >
> >
* Re: [PATCH v4] mm/swap: strengthen locking assertions and invariants in cluster allocation
2026-03-11 17:34 ` Chris Li
2026-03-12 2:13 ` Geliang Tang
@ 2026-03-12 2:15 ` hui.zhu
1 sibling, 0 replies; 4+ messages in thread
From: hui.zhu @ 2026-03-12 2:15 UTC (permalink / raw)
To: Chris Li
Cc: Andrew Morton, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
Barry Song, YoungJun Park, linux-mm, linux-kernel, Hui Zhu
On Mar 12, 2026 at 01:34, "Chris Li" <chrisl@kernel.org> wrote:
>
> On Tue, Mar 10, 2026 at 7:23 PM Hui Zhu <hui.zhu@linux.dev> wrote:
>
> >
> > From: Hui Zhu <zhuhui@kylinos.cn>
> >
> > The swap_cluster_alloc_table() function requires several locks to be held
> > by its callers: ci->lock, the per-CPU swap_cluster lock, and, for
> > non-solid-state devices (non-SWP_SOLIDSTATE), the si->global_cluster_lock.
> >
> > While most call paths (e.g., via cluster_alloc_swap_entry() or
> > alloc_swap_scan_list()) correctly acquire these locks before invocation,
> > the path through swap_reclaim_work() -> swap_reclaim_full_clusters() ->
> > isolate_lock_cluster() is distinct. This path operates exclusively on
> > si->full_clusters, where the swap allocation tables are guaranteed to be
> > already allocated. Consequently, isolate_lock_cluster() should never
> > trigger a call to swap_cluster_alloc_table() for these clusters.
> >
> > Strengthen the locking and state assertions to formalize these invariants:
> >
> > 1. Add a lockdep_assert_held() for si->global_cluster_lock in
> > swap_cluster_alloc_table() for non-SWP_SOLIDSTATE devices.
> > 2. Reorder existing lockdep assertions in swap_cluster_alloc_table() to
> > match the actual lock acquisition order (per-CPU lock, then global lock,
> > then cluster lock).
> > 3. Add a VM_WARN_ON_ONCE() in isolate_lock_cluster() to ensure that table
> > allocations are only attempted for clusters being isolated from the
> > free list. Attempting to allocate a table for a cluster from other
> > lists (like the full list during reclaim) indicates a violation of
> > subsystem invariants.
> >
> > These changes ensure locking consistency and help catch potential
> > synchronization or logic issues during development.
> >
> > Changelog:
> > v4:
> > According to the comments of Barry Song, remove redundant comment.
> > v3:
> > According to the comments of Kairui Song, squash patches and fix logic
> > bug in isolate_lock_cluster() where flags were cleared before check.
> > v2:
> > According to the comments of YoungJun Park, Kairui Song and Chris Li,
> > change acquire locks in swap_reclaim_work() to adds a VM_WARN_ON in
> > isolate_lock_cluster().
> > According to the comments of YoungJun Park, add code in patch 2 to Change
> > the order of lockdep_assert_held() to match the actual lock acquisition
> > order.
> >
> > Reviewed-by: Youngjun Park <youngjun.park@lge.com>
> > Reviewed-by: Barry Song <baohua@kernel.org>
> > Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
> >
> Acked-by: Chris Li <chrisl@kernel.org>
>
> >
> > ---
> > mm/swapfile.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 94af29d1de88..e25cdb0046d8 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -476,8 +476,10 @@ swap_cluster_alloc_table(struct swap_info_struct *si,
> > * Only cluster isolation from the allocator does table allocation.
> > * Swap allocator uses percpu clusters and holds the local lock.
> > */
> > - lockdep_assert_held(&ci->lock);
> > lockdep_assert_held(&this_cpu_ptr(&percpu_swap_cluster)->lock);
> > + if (!(si->flags & SWP_SOLIDSTATE))
> > + lockdep_assert_held(&si->global_cluster_lock);
> > + lockdep_assert_held(&ci->lock);
> >
> > /* The cluster must be free and was just isolated from the free list. */
> > VM_WARN_ON_ONCE(ci->flags || !cluster_is_empty(ci));
> > @@ -577,6 +579,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
> > struct swap_info_struct *si, struct list_head *list)
> > {
> > struct swap_cluster_info *ci, *found = NULL;
> > + u8 flags;
> >
> Nit pick: consider initializing the value. The flags assignment occurs
> in a conditional block. The compiler might or might not realize the
> "flags" assigned only if "found" is also assigned, and might complain
> that flags can be used without initialization.
>
> >
> > spin_lock(&si->lock);
> > list_for_each_entry(ci, list, list) {
> > @@ -589,6 +592,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
> > ci->flags != CLUSTER_FLAG_FULL);
> >
> > list_del(&ci->list);
> > + flags = ci->flags;
> >
> If VM debug is disabled, this variable is not used after its value is
> assigned. Please test it with gcc and llvm (VM debug disabled) to
> ensure it doesn't generate any warnings. I don't expect it to be, I
> just want to make sure.
After adding the initialization code, I turned off VM_DEBUG and compiled
it with both clang 18 and gcc 13. No warnings during compilation.
Best,
Hui
>
> Chris
>
> >
> > ci->flags = CLUSTER_FLAG_NONE;
> > found = ci;
> > break;
> > @@ -597,6 +601,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
> >
> > if (found && !cluster_table_is_alloced(found)) {
> > /* Only an empty free cluster's swap table can be freed. */
> > + VM_WARN_ON_ONCE(flags != CLUSTER_FLAG_FREE);
> > VM_WARN_ON_ONCE(list != &si->free_clusters);
> > VM_WARN_ON_ONCE(!cluster_is_empty(found));
> > return swap_cluster_alloc_table(si, found);
> > --
> > 2.43.0
> >
>