[PATCH] mm, swap: free the cluster extend table on teardown

* [PATCH] mm, swap: free the cluster extend table on teardown
@ 2026-06-02 22:23 David Carlier
  2026-06-03  2:41 ` Kairui Song
  2026-06-03 20:43 ` Andrew Morton
  0 siblings, 2 replies; 4+ messages in thread
From: David Carlier @ 2026-06-02 22:23 UTC
  To: akpm
  Cc: linux-mm, linux-kernel, David Carlier,
	syzbot+deedf22929084640666f, stable, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Youngjun Park

swap_cluster_free_table() frees every per-cluster side table but
ci->extend_table. That table is only released by
swap_extend_table_try_free(), which the teardown path never calls, so a
cluster can be freed with an extend table still attached.

It can also linger while the cluster is live. swap_dup_entries_cluster()
drops the lock to allocate an extend table when a slot reaches
SWP_TB_COUNT_MAX - 1, then retries. If the count dropped in the meantime,
the retry takes the normal path and leaves the table behind, all entries
zero; only the failure path frees it.

Since a swap_cluster_info is reused in place and swap_extend_table_alloc()
skips allocation when ci->extend_table is set, the next user of the
cluster inherits the stale table and its leftover counts, corrupting the
swap count of any slot that overflows. CONFIG_DEBUG_VM catches the
dangling table in swap_cluster_assert_empty(); otherwise it is silent.

Free it in swap_cluster_free_table(), and also on the
swap_dup_entries_cluster() success path to match the failure path.
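
As a rough illustration of the lifecycle described above, here is a
userspace C sketch. It is not kernel code: the model_* names, MODEL_SLOTS
and the struct layout are hypothetical stand-ins, and the assumption that
the try-free helper releases the table only when every entry is zero is
inferred from the description rather than taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_SLOTS 8 /* hypothetical slot count, not the kernel's cluster size */

/* Hypothetical userspace stand-in for struct swap_cluster_info. */
struct model_cluster {
	unsigned int *table;        /* models ci->table */
	unsigned int *extend_table; /* models ci->extend_table */
};

/* Like the described swap_extend_table_alloc(): skip if a table is attached. */
static bool model_extend_alloc(struct model_cluster *ci)
{
	if (ci->extend_table)
		return true;
	ci->extend_table = calloc(MODEL_SLOTS, sizeof(*ci->extend_table));
	return ci->extend_table != NULL;
}

/* Assumed behaviour of the try-free helper: free only an all-zero table. */
static void model_extend_try_free(struct model_cluster *ci)
{
	if (!ci->extend_table)
		return;
	for (size_t i = 0; i < MODEL_SLOTS; i++)
		if (ci->extend_table[i])
			return;
	free(ci->extend_table);
	ci->extend_table = NULL;
}

/* Teardown as described before the patch: extend_table is left attached. */
static void model_free_table_unpatched(struct model_cluster *ci)
{
	free(ci->table);
	ci->table = NULL;
}

/* Teardown with the first hunk applied: extend_table is released too. */
static void model_free_table_patched(struct model_cluster *ci)
{
	free(ci->extend_table);
	ci->extend_table = NULL;
	free(ci->table);
	ci->table = NULL;
}
```

In this model, a table allocated during the retry window and never used
stays attached after model_free_table_unpatched(), whereas either the
patched teardown or a try-free on the success path releases it.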

Reported-by: syzbot+deedf22929084640666f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=deedf22929084640666f
Fixes: 0d6af9bcf383 ("mm, swap: use the swap table to track the swap count")
Cc: <stable@vger.kernel.org>
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 mm/swapfile.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 615d90867111..a69a26aec4c0 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -432,6 +432,9 @@ static void swap_cluster_free_table(struct swap_cluster_info *ci)
 	ci->zero_bitmap = NULL;
 #endif

+	kfree(ci->extend_table);
+	ci->extend_table = NULL;
+
 	table = (struct swap_table *)rcu_access_pointer(ci->table);
 	if (!table)
 		return;
@@ -1711,6 +1714,7 @@ static int swap_dup_entries_cluster(struct swap_info_struct *si,
 			goto failed;
 		}
 	} while (++ci_off < ci_end);
+	swap_extend_table_try_free(ci);
 	swap_cluster_unlock(ci);
 	return 0;
 failed:
-- 
2.53.0


* Re: [PATCH] mm, swap: free the cluster extend table on teardown
  2026-06-02 22:23 [PATCH] mm, swap: free the cluster extend table on teardown David Carlier
@ 2026-06-03  2:41 ` Kairui Song
  2026-06-03 20:51   ` David CARLIER
  2026-06-03 20:43 ` Andrew Morton
  1 sibling, 1 reply; 4+ messages in thread
From: Kairui Song @ 2026-06-03  2:41 UTC
  To: David Carlier
  Cc: akpm, linux-mm, linux-kernel, syzbot+deedf22929084640666f, stable,
	Chris Li, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Youngjun Park

On Wed, Jun 3, 2026 at 6:27 AM David Carlier <devnexen@gmail.com> wrote:
>
> swap_cluster_free_table() frees every per-cluster side table but
> ci->extend_table. That table is only released by
> swap_extend_table_try_free(), which the teardown path never calls, so a
> cluster can be freed with an extend table still attached.
>
> It can also linger while the cluster is live. swap_dup_entries_cluster()
> drops the lock to allocate an extend table when a slot reaches
> SWP_TB_COUNT_MAX - 1, then retries. If the count dropped in the meantime,
> the retry takes the normal path and leaves the table behind, all entries
> zero; only the failure path frees it.
>
> Since a swap_cluster_info is reused in place and swap_extend_table_alloc()
> skips allocation when ci->extend_table is set, the next user of the
> cluster inherits the stale table and its leftover counts, corrupting the
> swap count of any slot that overflows. CONFIG_DEBUG_VM catches the

There won't be any corruption: extend_table is all zero at this point.
The leak on swapoff is real, though.

> dangling table in swap_cluster_assert_empty(); otherwise it is silent.
>
> Free it in swap_cluster_free_table(), and also on the
> swap_dup_entries_cluster() success path to match the failure path.
>
> Reported-by: syzbot+deedf22929084640666f@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=deedf22929084640666f
> Fixes: 0d6af9bcf383 ("mm, swap: use the swap table to track the swap count")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: David Carlier <devnexen@gmail.com>
> ---
>  mm/swapfile.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 615d90867111..a69a26aec4c0 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -432,6 +432,9 @@ static void swap_cluster_free_table(struct swap_cluster_info *ci)
>         ci->zero_bitmap = NULL;
>  #endif
>
> +       kfree(ci->extend_table);
> +       ci->extend_table = NULL;
> +

Isn't this still a bit too late to avoid the WARN? The WARN has already
triggered at this point, because swap_cluster_free_table() is called after
swap_cluster_assert_empty().
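
The ordering concern can be sketched in userspace C. This is a model under
stated assumptions, not the kernel code: the demo_* names are hypothetical,
and the warning is represented by a counter rather than a real WARN.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the cluster; only the field of interest. */
struct demo_cluster {
	unsigned int *extend_table; /* models ci->extend_table */
	int warnings;               /* counts would-be WARNs in this model */
};

/* Models the debug check: complain if an extend table is still attached. */
static void demo_assert_empty(struct demo_cluster *ci)
{
	if (ci->extend_table)
		ci->warnings++;
}

/* Models the patched free path: release the extend table. */
static void demo_free_table(struct demo_cluster *ci)
{
	free(ci->extend_table);
	ci->extend_table = NULL;
}

/* Teardown order as described in the review: check first, free second. */
static void demo_teardown(struct demo_cluster *ci)
{
	demo_assert_empty(ci);
	demo_free_table(ci);
}
```

With a table still attached, demo_teardown() bumps the warning counter
before the free runs, so in this model freeing inside the free path plugs
the leak but cannot suppress the warning.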

>         table = (struct swap_table *)rcu_access_pointer(ci->table);
>         if (!table)
>                 return;
> @@ -1711,6 +1714,7 @@ static int swap_dup_entries_cluster(struct swap_info_struct *si,
>                         goto failed;
>                 }
>         } while (++ci_off < ci_end);
> +       swap_extend_table_try_free(ci);
>         swap_cluster_unlock(ci);
>         return 0;
>  failed:
> --
> 2.53.0

I think we have already fixed this?
https://lore.kernel.org/all/6a1eac8e.fbc46276.3c3783.0008.GAE@google.com/T/



* Re: [PATCH] mm, swap: free the cluster extend table on teardown
  2026-06-02 22:23 [PATCH] mm, swap: free the cluster extend table on teardown David Carlier
  2026-06-03  2:41 ` Kairui Song
@ 2026-06-03 20:43 ` Andrew Morton
  1 sibling, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2026-06-03 20:43 UTC
  To: David Carlier
  Cc: linux-mm, linux-kernel, syzbot+deedf22929084640666f, stable,
	Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
	Barry Song, Youngjun Park

On Tue,  2 Jun 2026 23:23:57 +0100 David Carlier <devnexen@gmail.com> wrote:

> swap_cluster_free_table() frees every per-cluster side table but
> ci->extend_table. That table is only released by
> swap_extend_table_try_free(), which the teardown path never calls, so a
> cluster can be freed with an extend table still attached.
> 
> It can also linger while the cluster is live. swap_dup_entries_cluster()
> drops the lock to allocate an extend table when a slot reaches
> SWP_TB_COUNT_MAX - 1, then retries. If the count dropped in the meantime,
> the retry takes the normal path and leaves the table behind, all entries
> zero; only the failure path frees it.
> 
> Since a swap_cluster_info is reused in place and swap_extend_table_alloc()
> skips allocation when ci->extend_table is set, the next user of the
> cluster inherits the stale table and its leftover counts, corrupting the
> swap count of any slot that overflows. CONFIG_DEBUG_VM catches the
> dangling table in swap_cluster_assert_empty(); otherwise it is silent.
> 
> Free it in swap_cluster_free_table(), and also on the
> swap_dup_entries_cluster() success path to match the failure path.

This all sounds rather horrid.  We have no description of how this all
manifests for the user, but I assume "badly"?

> Reported-by: syzbot+deedf22929084640666f@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=deedf22929084640666f
> Fixes: 0d6af9bcf383 ("mm, swap: use the swap table to track the swap count")
> Cc: <stable@vger.kernel.org>

First merged in 7.1-rc1 so no cc:stable should be needed, if we upstream a fix
promptly.

> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -432,6 +432,9 @@ static void swap_cluster_free_table(struct swap_cluster_info *ci)
>  	ci->zero_bitmap = NULL;
>  #endif
>  
> +	kfree(ci->extend_table);
> +	ci->extend_table = NULL;
> +
>  	table = (struct swap_table *)rcu_access_pointer(ci->table);
>  	if (!table)
>  		return;
> @@ -1711,6 +1714,7 @@ static int swap_dup_entries_cluster(struct swap_info_struct *si,
>  			goto failed;
>  		}
>  	} while (++ci_off < ci_end);
> +	swap_extend_table_try_free(ci);
>  	swap_cluster_unlock(ci);
>  	return 0;
>  failed:

AI review flagged a possible issue:
	https://sashiko.dev/#/patchset/20260602222358.49061-1-devnexen@gmail.com




* Re: [PATCH] mm, swap: free the cluster extend table on teardown
  2026-06-03  2:41 ` Kairui Song
@ 2026-06-03 20:51   ` David CARLIER
  0 siblings, 0 replies; 4+ messages in thread
From: David CARLIER @ 2026-06-03 20:51 UTC
  To: Kairui Song
  Cc: akpm, linux-mm, linux-kernel, syzbot+deedf22929084640666f, stable,
	Chris Li, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Youngjun Park

On Wed, 3 Jun 2026 at 03:42, Kairui Song <ryncsn@gmail.com> wrote:
>
> On Wed, Jun 3, 2026 at 6:27 AM David Carlier <devnexen@gmail.com> wrote:
> >
> > swap_cluster_free_table() frees every per-cluster side table but
> > ci->extend_table. That table is only released by
> > swap_extend_table_try_free(), which the teardown path never calls, so a
> > cluster can be freed with an extend table still attached.
> >
> > It can also linger while the cluster is live. swap_dup_entries_cluster()
> > drops the lock to allocate an extend table when a slot reaches
> > SWP_TB_COUNT_MAX - 1, then retries. If the count dropped in the meantime,
> > the retry takes the normal path and leaves the table behind, all entries
> > zero; only the failure path frees it.
> >
> > Since a swap_cluster_info is reused in place and swap_extend_table_alloc()
> > skips allocation when ci->extend_table is set, the next user of the
> > cluster inherits the stale table and its leftover counts, corrupting the
> > swap count of any slot that overflows. CONFIG_DEBUG_VM catches the
>
> There won't be any corruption: extend_table is all zero at this point.
> The leak on swapoff is real, though.
>
> > dangling table in swap_cluster_assert_empty(); otherwise it is silent.
> >
> > Free it in swap_cluster_free_table(), and also on the
> > swap_dup_entries_cluster() success path to match the failure path.
> >
> > Reported-by: syzbot+deedf22929084640666f@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=deedf22929084640666f
> > Fixes: 0d6af9bcf383 ("mm, swap: use the swap table to track the swap count")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: David Carlier <devnexen@gmail.com>
> > ---
> >  mm/swapfile.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 615d90867111..a69a26aec4c0 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -432,6 +432,9 @@ static void swap_cluster_free_table(struct swap_cluster_info *ci)
> >         ci->zero_bitmap = NULL;
> >  #endif
> >
> > +       kfree(ci->extend_table);
> > +       ci->extend_table = NULL;
> > +
>
> Isn't this still a bit too late to avoid the WARN? The WARN has already
> triggered at this point, because swap_cluster_free_table() is called after
> swap_cluster_assert_empty().
>
> >         table = (struct swap_table *)rcu_access_pointer(ci->table);
> >         if (!table)
> >                 return;
> > @@ -1711,6 +1714,7 @@ static int swap_dup_entries_cluster(struct swap_info_struct *si,
> >                         goto failed;
> >                 }
> >         } while (++ci_off < ci_end);
> > +       swap_extend_table_try_free(ci);
> >         swap_cluster_unlock(ci);
> >         return 0;
> >  failed:
> > --
> > 2.53.0
>
> I think we have already fixed this?
> https://lore.kernel.org/all/6a1eac8e.fbc46276.3c3783.0008.GAE@google.com/T/


Thanks for the review.

Agreed on all counts. 0475fde0f68d already addresses both the warning
and the swapoff leak at the allocation site, so this patch is
redundant. Please drop it.

Andrew, you're right that no cc:stable was warranted here.

Cheers.

