linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] xfs: don't assume perags are initialised when trimming AGs
@ 2025-04-30 23:27 Dave Chinner
  2025-05-01  4:37 ` Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dave Chinner @ 2025-04-30 23:27 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When running fstrim immediately after mounting a V4 filesystem,
the fstrim fails to trim all the free space in the filesystem. It
only trims the first extent in the by-size free space tree in each
AG and then returns. If a second fstrim is then run, it runs
correctly and the entire free space in the filesystem is iterated
and discarded correctly.

The problem lies in the setup of the trim cursor - it assumes that
pag->pagf_longest is valid without either reading the AGF first or
checking if xfs_perag_initialised_agf(pag) is true or not.

As a result, when a filesystem is mounted without reading the AGF
(e.g. a clean mount on a v4 filesystem) and the first operation is a
fstrim call, pag->pagf_longest is zero and so the free extent search
starts at the wrong end of the by-size btree and exits after
discarding the first record in the tree.

Fix this by deferring the initialisation of tcur->count to after
we have locked the AGF and guaranteed that the perag is properly
initialised. We trigger this on tcur->count == 0 after locking the
AGF, as this will only occur on the first call to
xfs_trim_gather_extents() for each AG. If we need to iterate,
tcur->count will be set to the length of the record we need to
restart at, so we can use this to ensure we only sample a valid
pag->pagf_longest value for the iteration.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_discard.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index c1a306268ae4..94d0873bcd62 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -167,6 +167,14 @@ xfs_discard_extents(
 	return error;
 }
 
+/*
+ * Care must be taken setting up the trim cursor as the perags may not have been
+ * initialised when the cursor is initialised. e.g. a clean mount which hasn't
+ * read in AGFs and the first operation run on the mounted fs is a trim. This
+ * can result in perag fields that aren't initialised until
+ * xfs_trim_gather_extents() calls xfs_alloc_read_agf() to lock down the AG for
+ * the free space search.
+ */
 struct xfs_trim_cur {
 	xfs_agblock_t	start;
 	xfs_extlen_t	count;
@@ -204,6 +212,14 @@ xfs_trim_gather_extents(
 	if (error)
 		goto out_trans_cancel;
 
+	/*
+	 * First time through tcur->count will not have been initialised as
+	 * pag->pagf_longest is not guaranteed to be valid before we read
+	 * the AGF buffer above.
+	 */
+	if (!tcur->count)
+		tcur->count = pag->pagf_longest;
+
 	if (tcur->by_bno) {
 		/* sub-AG discard request always starts at tcur->start */
 		cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);
@@ -350,7 +366,6 @@ xfs_trim_perag_extents(
 {
 	struct xfs_trim_cur	tcur = {
 		.start		= start,
-		.count		= pag->pagf_longest,
 		.end		= end,
 		.minlen		= minlen,
 	};
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: don't assume perags are initialised when trimming AGs
  2025-04-30 23:27 [PATCH] xfs: don't assume perags are initialised when trimming AGs Dave Chinner
@ 2025-05-01  4:37 ` Darrick J. Wong
  2025-05-01  7:25   ` Dave Chinner
  2025-05-01 15:06 ` Bill O'Donnell
  2025-05-09 11:43 ` Carlos Maiolino
  2 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2025-05-01  4:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, May 01, 2025 at 09:27:24AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When running fstrim immediately after mounting a V4 filesystem,
> the fstrim fails to trim all the free space in the filesystem. It
> only trims the first extent in the by-size free space tree in each
> AG and then returns. If a second fstrim is then run, it runs
> correctly and the entire free space in the filesystem is iterated
> and discarded correctly.
> 
> The problem lies in the setup of the trim cursor - it assumes that
> pag->pagf_longest is valid without either reading the AGF first or
> checking if xfs_perag_initialised_agf(pag) is true or not.
> 
> As a result, when a filesystem is mounted without reading the AGF
> (e.g. a clean mount on a v4 filesystem) and the first operation is a
> fstrim call, pag->pagf_longest is zero and so the free extent search
> starts at the wrong end of the by-size btree and exits after
> discarding the first record in the tree.
> 
> Fix this by deferring the initialisation of tcur->count to after
> we have locked the AGF and guaranteed that the perag is properly
> initialised. We trigger this on tcur->count == 0 after locking the
> AGF, as this will only occur on the first call to
> xfs_trim_gather_extents() for each AG. If we need to iterate,
> tcur->count will be set to the length of the record we need to
> restart at, so we can use this to ensure we only sample a valid
> pag->pagf_longest value for the iteration.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Makes sense to me.  Please add the following trailers on merge:

Cc: <stable@vger.kernel.org> # v6.10
Fixes: b0ffe661fab4b9 ("xfs: fix performance problems when fstrimming a subset of a fragmented AG")

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_discard.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> index c1a306268ae4..94d0873bcd62 100644
> --- a/fs/xfs/xfs_discard.c
> +++ b/fs/xfs/xfs_discard.c
> @@ -167,6 +167,14 @@ xfs_discard_extents(
>  	return error;
>  }
>  
> +/*
> + * Care must be taken setting up the trim cursor as the perags may not have been
> + * initialised when the cursor is initialised. e.g. a clean mount which hasn't
> + * read in AGFs and the first operation run on the mounted fs is a trim. This
> + * can result in perag fields that aren't initialised until
> + * xfs_trim_gather_extents() calls xfs_alloc_read_agf() to lock down the AG for
> + * the free space search.
> + */
>  struct xfs_trim_cur {
>  	xfs_agblock_t	start;
>  	xfs_extlen_t	count;
> @@ -204,6 +212,14 @@ xfs_trim_gather_extents(
>  	if (error)
>  		goto out_trans_cancel;
>  
> +	/*
> +	 * First time through tcur->count will not have been initialised as
> +	 * pag->pagf_longest is not guaranteed to be valid before we read
> +	 * the AGF buffer above.
> +	 */
> +	if (!tcur->count)
> +		tcur->count = pag->pagf_longest;
> +
>  	if (tcur->by_bno) {
>  		/* sub-AG discard request always starts at tcur->start */
>  		cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);
> @@ -350,7 +366,6 @@ xfs_trim_perag_extents(
>  {
>  	struct xfs_trim_cur	tcur = {
>  		.start		= start,
> -		.count		= pag->pagf_longest,
>  		.end		= end,
>  		.minlen		= minlen,
>  	};
> -- 
> 2.45.2
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: don't assume perags are initialised when trimming AGs
  2025-05-01  4:37 ` Darrick J. Wong
@ 2025-05-01  7:25   ` Dave Chinner
  2025-05-05  7:42     ` Carlos Maiolino
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2025-05-01  7:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Apr 30, 2025 at 09:37:35PM -0700, Darrick J. Wong wrote:
> On Thu, May 01, 2025 at 09:27:24AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When running fstrim immediately after mounting a V4 filesystem,
> > the fstrim fails to trim all the free space in the filesystem. It
> > only trims the first extent in the by-size free space tree in each
> > AG and then returns. If a second fstrim is then run, it runs
> > correctly and the entire free space in the filesystem is iterated
> > and discarded correctly.
> > 
> > The problem lies in the setup of the trim cursor - it assumes that
> > pag->pagf_longest is valid without either reading the AGF first or
> > checking if xfs_perag_initialised_agf(pag) is true or not.
> > 
> > As a result, when a filesystem is mounted without reading the AGF
> > (e.g. a clean mount on a v4 filesystem) and the first operation is a
> > fstrim call, pag->pagf_longest is zero and so the free extent search
> > starts at the wrong end of the by-size btree and exits after
> > discarding the first record in the tree.
> > 
> > Fix this by deferring the initialisation of tcur->count to after
> > we have locked the AGF and guaranteed that the perag is properly
> > initialised. We trigger this on tcur->count == 0 after locking the
> > AGF, as this will only occur on the first call to
> > xfs_trim_gather_extents() for each AG. If we need to iterate,
> > tcur->count will be set to the length of the record we need to
> > restart at, so we can use this to ensure we only sample a valid
> > pag->pagf_longest value for the iteration.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> Makes sense to me.  Please add the following trailers on merge:
> 
> Cc: <stable@vger.kernel.org> # v6.10
> Fixes: b0ffe661fab4b9 ("xfs: fix performance problems when fstrimming a subset of a fragmented AG")

Those tags are incorrect. The regression was introduced by commit
89cfa899608f ("xfs: reduce AGF hold times during fstrim operations")
a few releases before that change....

> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

Thanks.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: don't assume perags are initialised when trimming AGs
  2025-04-30 23:27 [PATCH] xfs: don't assume perags are initialised when trimming AGs Dave Chinner
  2025-05-01  4:37 ` Darrick J. Wong
@ 2025-05-01 15:06 ` Bill O'Donnell
  2025-05-09 11:43 ` Carlos Maiolino
  2 siblings, 0 replies; 7+ messages in thread
From: Bill O'Donnell @ 2025-05-01 15:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, May 01, 2025 at 09:27:24AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When running fstrim immediately after mounting a V4 filesystem,
> the fstrim fails to trim all the free space in the filesystem. It
> only trims the first extent in the by-size free space tree in each
> AG and then returns. If a second fstrim is then run, it runs
> correctly and the entire free space in the filesystem is iterated
> and discarded correctly.
> 
> The problem lies in the setup of the trim cursor - it assumes that
> pag->pagf_longest is valid without either reading the AGF first or
> checking if xfs_perag_initialised_agf(pag) is true or not.
> 
> As a result, when a filesystem is mounted without reading the AGF
> (e.g. a clean mount on a v4 filesystem) and the first operation is a
> fstrim call, pag->pagf_longest is zero and so the free extent search
> starts at the wrong end of the by-size btree and exits after
> discarding the first record in the tree.
> 
> Fix this by deferring the initialisation of tcur->count to after
> we have locked the AGF and guaranteed that the perag is properly
> initialised. We trigger this on tcur->count == 0 after locking the
> AGF, as this will only occur on the first call to
> xfs_trim_gather_extents() for each AG. If we need to iterate,
> tcur->count will be set to the length of the record we need to
> restart at, so we can use this to ensure we only sample a valid
> pag->pagf_longest value for the iteration.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
Needs a "fixes" note in the description. Otherwise, lgtm.
Reviewed-by: Bill O'Donnell <bodonnel@redhat.com>


>  fs/xfs/xfs_discard.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> index c1a306268ae4..94d0873bcd62 100644
> --- a/fs/xfs/xfs_discard.c
> +++ b/fs/xfs/xfs_discard.c
> @@ -167,6 +167,14 @@ xfs_discard_extents(
>  	return error;
>  }
>  
> +/*
> + * Care must be taken setting up the trim cursor as the perags may not have been
> + * initialised when the cursor is initialised. e.g. a clean mount which hasn't
> + * read in AGFs and the first operation run on the mounted fs is a trim. This
> + * can result in perag fields that aren't initialised until
> + * xfs_trim_gather_extents() calls xfs_alloc_read_agf() to lock down the AG for
> + * the free space search.
> + */
>  struct xfs_trim_cur {
>  	xfs_agblock_t	start;
>  	xfs_extlen_t	count;
> @@ -204,6 +212,14 @@ xfs_trim_gather_extents(
>  	if (error)
>  		goto out_trans_cancel;
>  
> +	/*
> +	 * First time through tcur->count will not have been initialised as
> +	 * pag->pagf_longest is not guaranteed to be valid before we read
> +	 * the AGF buffer above.
> +	 */
> +	if (!tcur->count)
> +		tcur->count = pag->pagf_longest;
> +
>  	if (tcur->by_bno) {
>  		/* sub-AG discard request always starts at tcur->start */
>  		cur = xfs_bnobt_init_cursor(mp, tp, agbp, pag);
> @@ -350,7 +366,6 @@ xfs_trim_perag_extents(
>  {
>  	struct xfs_trim_cur	tcur = {
>  		.start		= start,
> -		.count		= pag->pagf_longest,
>  		.end		= end,
>  		.minlen		= minlen,
>  	};
> -- 
> 2.45.2
> 
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: don't assume perags are initialised when trimming AGs
  2025-05-01  7:25   ` Dave Chinner
@ 2025-05-05  7:42     ` Carlos Maiolino
  2025-05-05 15:07       ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Carlos Maiolino @ 2025-05-05  7:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs

On Thu, May 01, 2025 at 05:25:16PM +1000, Dave Chinner wrote:
> On Wed, Apr 30, 2025 at 09:37:35PM -0700, Darrick J. Wong wrote:
> > On Thu, May 01, 2025 at 09:27:24AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > When running fstrim immediately after mounting a V4 filesystem,
> > > the fstrim fails to trim all the free space in the filesystem. It
> > > only trims the first extent in the by-size free space tree in each
> > > AG and then returns. If a second fstrim is then run, it runs
> > > correctly and the entire free space in the filesystem is iterated
> > > and discarded correctly.
> > >
> > > The problem lies in the setup of the trim cursor - it assumes that
> > > pag->pagf_longest is valid without either reading the AGF first or
> > > checking if xfs_perag_initialised_agf(pag) is true or not.
> > >
> > > As a result, when a filesystem is mounted without reading the AGF
> > > (e.g. a clean mount on a v4 filesystem) and the first operation is a
> > > fstrim call, pag->pagf_longest is zero and so the free extent search
> > > starts at the wrong end of the by-size btree and exits after
> > > discarding the first record in the tree.
> > >
> > > Fix this by deferring the initialisation of tcur->count to after
> > > we have locked the AGF and guaranteed that the perag is properly
> > > initialised. We trigger this on tcur->count == 0 after locking the
> > > AGF, as this will only occur on the first call to
> > > xfs_trim_gather_extents() for each AG. If we need to iterate,
> > > tcur->count will be set to the length of the record we need to
> > > restart at, so we can use this to ensure we only sample a valid
> > > pag->pagf_longest value for the iteration.
> > >
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> >
> > Makes sense to me.  Please add the following trailers on merge:
> >
> > Cc: <stable@vger.kernel.org> # v6.10
> > Fixes: b0ffe661fab4b9 ("xfs: fix performance problems when fstrimming a subset of a fragmented AG")
> 
> Those tags are incorrect. The regression was introduced by commit
> 89cfa899608f ("xfs: reduce AGF hold times during fstrim operations")
> a few releases before that change....

This sounds right, introduced in v6.6.

Darrick, I'll add a stable #v6.6 tag then.

> 
> > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> 
> Thanks.
> 
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: don't assume perags are initialised when trimming AGs
  2025-05-05  7:42     ` Carlos Maiolino
@ 2025-05-05 15:07       ` Darrick J. Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2025-05-05 15:07 UTC (permalink / raw)
  To: Carlos Maiolino; +Cc: Dave Chinner, linux-xfs

On Mon, May 05, 2025 at 09:42:27AM +0200, Carlos Maiolino wrote:
> On Thu, May 01, 2025 at 05:25:16PM +1000, Dave Chinner wrote:
> > On Wed, Apr 30, 2025 at 09:37:35PM -0700, Darrick J. Wong wrote:
> > > On Thu, May 01, 2025 at 09:27:24AM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > >
> > > > When running fstrim immediately after mounting a V4 filesystem,
> > > > the fstrim fails to trim all the free space in the filesystem. It
> > > > only trims the first extent in the by-size free space tree in each
> > > > AG and then returns. If a second fstrim is then run, it runs
> > > > correctly and the entire free space in the filesystem is iterated
> > > > and discarded correctly.
> > > >
> > > > The problem lies in the setup of the trim cursor - it assumes that
> > > > pag->pagf_longest is valid without either reading the AGF first or
> > > > checking if xfs_perag_initialised_agf(pag) is true or not.
> > > >
> > > > As a result, when a filesystem is mounted without reading the AGF
> > > > (e.g. a clean mount on a v4 filesystem) and the first operation is a
> > > > fstrim call, pag->pagf_longest is zero and so the free extent search
> > > > starts at the wrong end of the by-size btree and exits after
> > > > discarding the first record in the tree.
> > > >
> > > > Fix this by deferring the initialisation of tcur->count to after
> > > > we have locked the AGF and guaranteed that the perag is properly
> > > > initialised. We trigger this on tcur->count == 0 after locking the
> > > > AGF, as this will only occur on the first call to
> > > > xfs_trim_gather_extents() for each AG. If we need to iterate,
> > > > tcur->count will be set to the length of the record we need to
> > > > restart at, so we can use this to ensure we only sample a valid
> > > > pag->pagf_longest value for the iteration.
> > > >
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > >
> > > Makes sense to me.  Please add the following trailers on merge:
> > >
> > > Cc: <stable@vger.kernel.org> # v6.10
> > > Fixes: b0ffe661fab4b9 ("xfs: fix performance problems when fstrimming a subset of a fragmented AG")
> > 
> > Those tags are incorrect. The regression was introduced by commit
> > 89cfa899608f ("xfs: reduce AGF hold times during fstrim operations")
> > a few releases before that change....
> 
> This sounds right, introduced in v6.6.
> 
> Darrick, I'll add a stable #v6.6 tag then.

Ok thanks.

--D

> > 
> > > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> > 
> > Thanks.
> > 
> > -Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
> > 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: don't assume perags are initialised when trimming AGs
  2025-04-30 23:27 [PATCH] xfs: don't assume perags are initialised when trimming AGs Dave Chinner
  2025-05-01  4:37 ` Darrick J. Wong
  2025-05-01 15:06 ` Bill O'Donnell
@ 2025-05-09 11:43 ` Carlos Maiolino
  2 siblings, 0 replies; 7+ messages in thread
From: Carlos Maiolino @ 2025-05-09 11:43 UTC (permalink / raw)
  To: linux-xfs, Dave Chinner

On Thu, 01 May 2025 09:27:24 +1000, Dave Chinner wrote:
> When running fstrim immediately after mounting a V4 filesystem,
> the fstrim fails to trim all the free space in the filesystem. It
> only trims the first extent in the by-size free space tree in each
> AG and then returns. If a second fstrim is then run, it runs
> correctly and the entire free space in the filesystem is iterated
> and discarded correctly.
> 
> [...]

Applied to for-next, thanks!

[1/1] xfs: don't assume perags are initialised when trimming AGs
      commit: 23be716b1c4f3f3a6c00ee38d51a57ef7db9ef7d

Best regards,
-- 
Carlos Maiolino <cem@kernel.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-05-09 11:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-30 23:27 [PATCH] xfs: don't assume perags are initialised when trimming AGs Dave Chinner
2025-05-01  4:37 ` Darrick J. Wong
2025-05-01  7:25   ` Dave Chinner
2025-05-05  7:42     ` Carlos Maiolino
2025-05-05 15:07       ` Darrick J. Wong
2025-05-01 15:06 ` Bill O'Donnell
2025-05-09 11:43 ` Carlos Maiolino

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).