Date: Thu, 12 Mar 2026 02:15:34 +0000
From: hui.zhu@linux.dev
Message-ID: <1e56279445806f6e1f0ff5ac142b6efb9074dfa5@linux.dev>
Subject: Re: [PATCH v4] mm/swap: strengthen locking assertions and invariants in cluster allocation
To: "Chris Li"
Cc: "Andrew Morton", "Kairui Song", "Kemeng Shi", "Nhat Pham", "Baoquan He", "Barry Song", "YoungJun Park", linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Hui Zhu"
References: <20260311022241.177801-1-hui.zhu@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

On March 12, 2026 at 01:34, "Chris Li" wrote:

> On Tue, Mar 10, 2026 at 7:23 PM Hui Zhu wrote:
>
> > From: Hui Zhu
> >
> > The swap_cluster_alloc_table() function requires several locks to be held
> > by its callers: ci->lock, the per-CPU swap_cluster lock, and, for
> > non-solid-state devices (non-SWP_SOLIDSTATE), the si->global_cluster_lock.
> >
> > While most call paths (e.g., via cluster_alloc_swap_entry() or
> > alloc_swap_scan_list()) correctly acquire these locks before invocation,
> > the path through swap_reclaim_work() -> swap_reclaim_full_clusters() ->
> > isolate_lock_cluster() is distinct. This path operates exclusively on
> > si->full_clusters, where the swap allocation tables are guaranteed to be
> > already allocated.
> > Consequently, isolate_lock_cluster() should never
> > trigger a call to swap_cluster_alloc_table() for these clusters.
> >
> > Strengthen the locking and state assertions to formalize these invariants:
> >
> > 1. Add a lockdep_assert_held() for si->global_cluster_lock in
> >    swap_cluster_alloc_table() for non-SWP_SOLIDSTATE devices.
> > 2. Reorder existing lockdep assertions in swap_cluster_alloc_table() to
> >    match the actual lock acquisition order (per-CPU lock, then global lock,
> >    then cluster lock).
> > 3. Add a VM_WARN_ON_ONCE() in isolate_lock_cluster() to ensure that table
> >    allocations are only attempted for clusters being isolated from the
> >    free list. Attempting to allocate a table for a cluster from other
> >    lists (like the full list during reclaim) indicates a violation of
> >    subsystem invariants.
> >
> > These changes ensure locking consistency and help catch potential
> > synchronization or logic issues during development.
> >
> > Changelog:
> > v4:
> > According to the comments of Barry Song, remove a redundant comment.
> > v3:
> > According to the comments of Kairui Song, squash the patches and fix a
> > logic bug in isolate_lock_cluster() where flags were cleared before the
> > check.
> > v2:
> > According to the comments of YoungJun Park, Kairui Song and Chris Li,
> > replace acquiring the locks in swap_reclaim_work() with adding a
> > VM_WARN_ON in isolate_lock_cluster().
> > According to the comments of YoungJun Park, add code in patch 2 to change
> > the order of lockdep_assert_held() to match the actual lock acquisition
> > order.
> >
> > Reviewed-by: Youngjun Park
> > Reviewed-by: Barry Song
> > Signed-off-by: Hui Zhu
>
> Acked-by: Chris Li
>
> > ---
> >  mm/swapfile.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 94af29d1de88..e25cdb0046d8 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -476,8 +476,10 @@ swap_cluster_alloc_table(struct swap_info_struct *si,
> >  	 * Only cluster isolation from the allocator does table allocation.
> >  	 * Swap allocator uses percpu clusters and holds the local lock.
> >  	 */
> > -	lockdep_assert_held(&ci->lock);
> >  	lockdep_assert_held(&this_cpu_ptr(&percpu_swap_cluster)->lock);
> > +	if (!(si->flags & SWP_SOLIDSTATE))
> > +		lockdep_assert_held(&si->global_cluster_lock);
> > +	lockdep_assert_held(&ci->lock);
> >
> >  	/* The cluster must be free and was just isolated from the free list. */
> >  	VM_WARN_ON_ONCE(ci->flags || !cluster_is_empty(ci));
> > @@ -577,6 +579,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
> >  		struct swap_info_struct *si, struct list_head *list)
> >  {
> >  	struct swap_cluster_info *ci, *found = NULL;
> > +	u8 flags;
>
> Nitpick: consider initializing the value. The flags assignment occurs
> in a conditional block. The compiler might or might not realize that
> "flags" is assigned only if "found" is also assigned, and might complain
> that flags can be used without initialization.
>
> >
> >  	spin_lock(&si->lock);
> >  	list_for_each_entry(ci, list, list) {
> > @@ -589,6 +592,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
> >  			  ci->flags != CLUSTER_FLAG_FULL);
> >
> >  		list_del(&ci->list);
> > +		flags = ci->flags;
>
> If VM debug is disabled, this variable is not used after its value is
> assigned. Please test it with gcc and llvm (VM debug disabled) to
> ensure it doesn't generate any warnings. I don't expect it to; I
> just want to make sure.

After adding the initialization code, I turned off VM_DEBUG and compiled
it with both clang18 and gcc13. Neither build produced any warnings.

Best,
Hui

>
> Chris
>
> >  		ci->flags = CLUSTER_FLAG_NONE;
> >  		found = ci;
> >  		break;
> > @@ -597,6 +601,7 @@ static struct swap_cluster_info *isolate_lock_cluster(
> >
> >  	if (found && !cluster_table_is_alloced(found)) {
> >  		/* Only an empty free cluster's swap table can be freed. */
> > +		VM_WARN_ON_ONCE(flags != CLUSTER_FLAG_FREE);
> >  		VM_WARN_ON_ONCE(list != &si->free_clusters);
> >  		VM_WARN_ON_ONCE(!cluster_is_empty(found));
> >  		return swap_cluster_alloc_table(si, found);
> > --
> > 2.43.0
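
For anyone who wants to reproduce the compiler check discussed above outside the kernel tree, here is a minimal userspace sketch of the same pattern: a variable assigned only inside a loop body, initialized at declaration as suggested in review, and read only by a debug-only assertion. All names here (SKETCH_WARN_ON, DEBUG_VM_SKETCH, struct cluster, isolate_first, sketch_demo) are hypothetical stand-ins, not kernel APIs. The non-debug macro uses a sizeof-based form so the condition is type-checked but never evaluated; that is an assumption about how such macros are commonly written, not a statement about the kernel's exact definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debug-only assertion. Without DEBUG_VM_SKETCH the condition
 * is only type-checked through sizeof and never evaluated, so any variable
 * it mentions still counts as referenced. */
#ifdef DEBUG_VM_SKETCH
#define SKETCH_WARN_ON(cond) \
	do { if (cond) fprintf(stderr, "warning: %s\n", #cond); } while (0)
#else
#define SKETCH_WARN_ON(cond) ((void)sizeof((long)(cond)))
#endif

enum { FLAG_NONE = 0, FLAG_FREE = 1, FLAG_FULL = 2 };

struct cluster {
	unsigned char flags;
	int has_table;
	int busy;	/* stands in for a failed trylock */
};

/* Isolate the first non-busy cluster, remembering its flags before they
 * are cleared, then give it a table if it has none. */
static struct cluster *isolate_first(struct cluster *list, size_t n)
{
	struct cluster *found = NULL;
	unsigned char flags = FLAG_NONE;	/* initialized at declaration */
	size_t i;

	for (i = 0; i < n; i++) {
		if (list[i].busy)
			continue;
		flags = list[i].flags;		/* save before clearing */
		list[i].flags = FLAG_NONE;
		found = &list[i];
		break;
	}

	if (found && !found->has_table) {
		/* Only a cluster that came from the free state should get here. */
		SKETCH_WARN_ON(flags != FLAG_FREE);
		found->has_table = 1;
	}
	return found;
}

/* Returns 1 if the free cluster was isolated, cleared, and given a table. */
int sketch_demo(void)
{
	struct cluster list[2] = {
		{ FLAG_FULL, 1, 1 },	/* busy: skipped */
		{ FLAG_FREE, 0, 0 },	/* free, no table yet: isolated */
	};
	struct cluster *c = isolate_first(list, 2);

	assert(c == &list[1]);
	return c->has_table && c->flags == FLAG_NONE;
}
```

Building this file with and without -DDEBUG_VM_SKETCH, under both gcc and clang with -Wall -Wextra, is one way to confirm that neither an uninitialized-use nor a set-but-unused diagnostic appears for flags in either configuration.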