* [PATCH] xfs: avoid redundant AGFL buffer invalidation
@ 2024-05-27 6:10 Gao Xiang
2024-05-27 8:59 ` Dave Chinner
0 siblings, 1 reply; 5+ messages in thread
From: Gao Xiang @ 2024-05-27 6:10 UTC (permalink / raw)
To: linux-xfs, Dave Chinner, Darrick J . Wong, Chandan Babu R; +Cc: LKML, Gao Xiang
Currently AGFL blocks can be filled from the following three sources:
- allocbt free blocks, as in xfs_allocbt_free_block();
- rmapbt free blocks, as in xfs_rmapbt_free_block();
- refilled from freespace btrees, as in xfs_alloc_fix_freelist().
Originally, allocbt free blocks would be marked as stale only when they
put back in the general free space pool as Dave mentioned on IRC, "we
don't stale AGF metadata btree blocks when they are returned to the
AGFL .. but once they get put back in the general free space pool, we
have to make sure the buffers are marked stale as the next user of
those blocks might be user data...."
However, after commit ca250b1b3d71 ("xfs: invalidate allocbt blocks
moved to the free list") and commit edfd9dd54921 ("xfs: move buffer
invalidation to xfs_btree_free_block"), even allocbt / bmapbt free
blocks will be invalidated immediately since they may fail to pass
V5 format validation on writeback even writeback to free space would be
safe.
IOWs, IMHO currently there is actually no difference of free blocks
between AGFL freespace pool and the general free space pool. So let's
avoid extra redundant AGFL buffer invalidation, since otherwise we're
currently facing unnecessary xfs_log_force() due to xfs_trans_binval()
again on buffers already marked as stale before as below:
[ 333.507469] Call Trace:
[ 333.507862] xfs_buf_find+0x371/0x6a0
[ 333.508451] xfs_buf_get_map+0x3f/0x230
[ 333.509062] xfs_trans_get_buf_map+0x11a/0x280
[ 333.509751] xfs_free_agfl_block+0xa1/0xd0
[ 333.510403] xfs_agfl_free_finish_item+0x16e/0x1d0
[ 333.511157] xfs_defer_finish_noroll+0x1ef/0x5c0
[ 333.511871] xfs_defer_finish+0xc/0xa0
[ 333.512471] xfs_itruncate_extents_flags+0x18a/0x5e0
[ 333.513253] xfs_inactive_truncate+0xb8/0x130
[ 333.513930] xfs_inactive+0x223/0x270
And xfs_log_force() will take tens of milliseconds with AGF buffer
locked. It becomes an unnecessary long latency especially on our PMEM
devices with FSDAX enabled. Also fstests are passed with this patch.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
fs/xfs/libxfs/xfs_alloc.c | 18 ++----------------
1 file changed, 2 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 6cb8b2ddc541..a80d2a31252a 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2432,22 +2432,8 @@ xfs_free_agfl_block(
struct xfs_buf *agbp,
struct xfs_owner_info *oinfo)
{
- int error;
- struct xfs_buf *bp;
-
- error = xfs_free_ag_extent(tp, agbp, agno, agbno, 1, oinfo,
- XFS_AG_RESV_AGFL);
- if (error)
- return error;
-
- error = xfs_trans_get_buf(tp, tp->t_mountp->m_ddev_targp,
- XFS_AGB_TO_DADDR(tp->t_mountp, agno, agbno),
- tp->t_mountp->m_bsize, 0, &bp);
- if (error)
- return error;
- xfs_trans_binval(tp, bp);
-
- return 0;
+ return xfs_free_ag_extent(tp, agbp, agno, agbno, 1, oinfo,
+ XFS_AG_RESV_AGFL);
}
/*
--
2.39.3
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] xfs: avoid redundant AGFL buffer invalidation 2024-05-27 6:10 [PATCH] xfs: avoid redundant AGFL buffer invalidation Gao Xiang @ 2024-05-27 8:59 ` Dave Chinner 2024-05-27 9:23 ` Gao Xiang 0 siblings, 1 reply; 5+ messages in thread From: Dave Chinner @ 2024-05-27 8:59 UTC (permalink / raw) To: Gao Xiang; +Cc: linux-xfs, Darrick J . Wong, Chandan Babu R, LKML On Mon, May 27, 2024 at 02:10:06PM +0800, Gao Xiang wrote: > Currently AGFL blocks can be filled from the following three sources: > - allocbt free blocks, as in xfs_allocbt_free_block(); > - rmapbt free blocks, as in xfs_rmapbt_free_block(); > - refilled from freespace btrees, as in xfs_alloc_fix_freelist(). > > Originally, allocbt free blocks would be marked as stale only when they > put back in the general free space pool as Dave mentioned on IRC, "we > don't stale AGF metadata btree blocks when they are returned to the > AGFL .. but once they get put back in the general free space pool, we > have to make sure the buffers are marked stale as the next user of > those blocks might be user data...." So it turns out that xfs_alloc_ag_vextent_small() does this when allocating from the AGFL: if (args->datatype & XFS_ALLOC_USERDATA) { struct xfs_buf *bp; error = xfs_trans_get_buf(args->tp, args->mp->m_ddev_targp, XFS_AGB_TO_DADDR(args->mp, args->agno, fbno), args->mp->m_bsize, 0, &bp); if (error) goto error; xfs_trans_binval(args->tp, bp); } Hence we're already invalidating any buffer over the block allocated from the AGFL to ensure nothing will overwrite the user data that will be placed in the block after the allocation is committed. This means we can trigger the log force from this path - more about that below.... > However, after commit ca250b1b3d71 ("xfs: invalidate allocbt blocks > moved to the free list") and commit edfd9dd54921 ("xfs: move buffer > invalidation to xfs_btree_free_block"), even allocbt / bmapbt free > blocks will be invalidated immediately since they may fail to pass > V5 format validation on writeback even writeback to free space would be > safe. *nod* > IOWs, IMHO currently there is actually no difference of free blocks > between AGFL freespace pool and the general free space pool. So let's > avoid extra redundant AGFL buffer invalidation, since otherwise we're > currently facing unnecessary xfs_log_force() due to xfs_trans_binval() > again on buffers already marked as stale before as below: > > [ 333.507469] Call Trace: > [ 333.507862] xfs_buf_find+0x371/0x6a0 > [ 333.508451] xfs_buf_get_map+0x3f/0x230 > [ 333.509062] xfs_trans_get_buf_map+0x11a/0x280 > [ 333.509751] xfs_free_agfl_block+0xa1/0xd0 > [ 333.510403] xfs_agfl_free_finish_item+0x16e/0x1d0 > [ 333.511157] xfs_defer_finish_noroll+0x1ef/0x5c0 > [ 333.511871] xfs_defer_finish+0xc/0xa0 > [ 333.512471] xfs_itruncate_extents_flags+0x18a/0x5e0 > [ 333.513253] xfs_inactive_truncate+0xb8/0x130 > [ 333.513930] xfs_inactive+0x223/0x270 > > And xfs_log_force() will take tens of milliseconds with AGF buffer > locked. It becomes an unnecessary long latency especially on our PMEM > devices with FSDAX enabled. Also fstests are passed with this patch. Well, keep in mind the reason the log force was introduced in xfs_buf_lock() - commit ed3b4d6cdc81 ("xfs: Improve scalability of busy extent tracking") says: The only problem with this approach is that when a metadata buffer is marked stale (e.g. a directory block is removed), then buffer remains pinned and locked until the log goes to disk. The issue here is that if that stale buffer is reallocated in a subsequent transaction, the attempt to lock that buffer in the transaction will hang waiting the log to go to disk to unlock and unpin the buffer. Hence if someone tries to lock a pinned, stale, locked buffer we need to push on the log to get it unlocked ASAP. Effectively we are trading off a guaranteed log force for a much less common trigger for log force to occur. IOWs, this "log force on buffer lock" trigger isn't specific to AGFL blocks. The log force is placed there to ensure that access latency to any block we rapidly reallocate is *only* a few milliseconds, rather than being "whenever the next log writes trigger" which could be tens of seconds away.... Hence we need to be aware that removing the double invalidation on the AGFL blocks does not make this "log force on stale buffer" latency issue go away, it just changes when and where it happens (i.e. on reallocation). > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> > --- > fs/xfs/libxfs/xfs_alloc.c | 18 ++---------------- > 1 file changed, 2 insertions(+), 16 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index 6cb8b2ddc541..a80d2a31252a 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -2432,22 +2432,8 @@ xfs_free_agfl_block( > struct xfs_buf *agbp, > struct xfs_owner_info *oinfo) > { > - int error; > - struct xfs_buf *bp; > - > - error = xfs_free_ag_extent(tp, agbp, agno, agbno, 1, oinfo, > - XFS_AG_RESV_AGFL); > - if (error) > - return error; > - > - error = xfs_trans_get_buf(tp, tp->t_mountp->m_ddev_targp, > - XFS_AGB_TO_DADDR(tp->t_mountp, agno, agbno), > - tp->t_mountp->m_bsize, 0, &bp); > - if (error) > - return error; > - xfs_trans_binval(tp, bp); > - > - return 0; > + return xfs_free_ag_extent(tp, agbp, agno, agbno, 1, oinfo, > + XFS_AG_RESV_AGFL); > } I'd just get rid of the xfs_free_agfl_block() wrapper entirely and call xfs_free_ag_extent() directly from xfs_agfl_free_finish_item(). -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs: avoid redundant AGFL buffer invalidation 2024-05-27 8:59 ` Dave Chinner @ 2024-05-27 9:23 ` Gao Xiang 2024-05-28 4:12 ` [PATCH v2] " Gao Xiang 0 siblings, 1 reply; 5+ messages in thread From: Gao Xiang @ 2024-05-27 9:23 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs, Darrick J . Wong, Chandan Babu R, LKML Hi Dave, On 2024/5/27 16:59, Dave Chinner wrote: > On Mon, May 27, 2024 at 02:10:06PM +0800, Gao Xiang wrote: >> Currently AGFL blocks can be filled from the following three sources: >> - allocbt free blocks, as in xfs_allocbt_free_block(); >> - rmapbt free blocks, as in xfs_rmapbt_free_block(); >> - refilled from freespace btrees, as in xfs_alloc_fix_freelist(). >> >> Originally, allocbt free blocks would be marked as stale only when they >> put back in the general free space pool as Dave mentioned on IRC, "we >> don't stale AGF metadata btree blocks when they are returned to the >> AGFL .. but once they get put back in the general free space pool, we >> have to make sure the buffers are marked stale as the next user of >> those blocks might be user data...." > > So it turns out that xfs_alloc_ag_vextent_small() does this when > allocating from the AGFL: > > if (args->datatype & XFS_ALLOC_USERDATA) { > struct xfs_buf *bp; > > error = xfs_trans_get_buf(args->tp, args->mp->m_ddev_targp, > XFS_AGB_TO_DADDR(args->mp, args->agno, fbno), > args->mp->m_bsize, 0, &bp); > if (error) > goto error; > xfs_trans_binval(args->tp, bp); > } > > Hence we're already invalidating any buffer over the block allocated > from the AGFL to ensure nothing will overwrite the user data that > will be placed in the block after the allocation is committed. > > This means we can trigger the log force from this path - more > about that below.... Thanks for your reply! Yeah, I understand. > >> However, after commit ca250b1b3d71 ("xfs: invalidate allocbt blocks >> moved to the free list") and commit edfd9dd54921 ("xfs: move buffer >> invalidation to xfs_btree_free_block"), even allocbt / bmapbt free >> blocks will be invalidated immediately since they may fail to pass >> V5 format validation on writeback even writeback to free space would be >> safe. > > *nod* > >> IOWs, IMHO currently there is actually no difference of free blocks >> between AGFL freespace pool and the general free space pool. So let's >> avoid extra redundant AGFL buffer invalidation, since otherwise we're >> currently facing unnecessary xfs_log_force() due to xfs_trans_binval() >> again on buffers already marked as stale before as below: >> >> [ 333.507469] Call Trace: >> [ 333.507862] xfs_buf_find+0x371/0x6a0 >> [ 333.508451] xfs_buf_get_map+0x3f/0x230 >> [ 333.509062] xfs_trans_get_buf_map+0x11a/0x280 >> [ 333.509751] xfs_free_agfl_block+0xa1/0xd0 >> [ 333.510403] xfs_agfl_free_finish_item+0x16e/0x1d0 >> [ 333.511157] xfs_defer_finish_noroll+0x1ef/0x5c0 >> [ 333.511871] xfs_defer_finish+0xc/0xa0 >> [ 333.512471] xfs_itruncate_extents_flags+0x18a/0x5e0 >> [ 333.513253] xfs_inactive_truncate+0xb8/0x130 >> [ 333.513930] xfs_inactive+0x223/0x270 >> >> And xfs_log_force() will take tens of milliseconds with AGF buffer >> locked. It becomes an unnecessary long latency especially on our PMEM >> devices with FSDAX enabled. Also fstests are passed with this patch. > > Well, keep in mind the reason the log force was introduced in > xfs_buf_lock() - commit ed3b4d6cdc81 ("xfs: Improve scalability of > busy extent tracking") says: > > The only problem with this approach is that when a metadata buffer is > marked stale (e.g. a directory block is removed), then buffer remains > pinned and locked until the log goes to disk. The issue here is that > if that stale buffer is reallocated in a subsequent transaction, the > attempt to lock that buffer in the transaction will hang waiting > the log to go to disk to unlock and unpin the buffer. Hence if > someone tries to lock a pinned, stale, locked buffer we need to > push on the log to get it unlocked ASAP. Effectively we are trading > off a guaranteed log force for a much less common trigger for log > force to occur. > > IOWs, this "log force on buffer lock" trigger isn't specific to AGFL > blocks. The log force is placed there to ensure that access latency > to any block we rapidly reallocate is *only* a few milliseconds, > rather than being "whenever the next log writes trigger" which could > be tens of seconds away.... Yes, I understand, also see below: > > Hence we need to be aware that removing the double invalidation on > the AGFL blocks does not make this "log force on stale buffer" > latency issue go away, it just changes when and where it happens > (i.e. on reallocation). Yes, I totally agree with you that removing the double invalidation doesn't make this "log force on stale buffer" go away. However, currently with our workloads we've seen this path ("xfs_agfl_free_finish_item->take AGF lock-> xfs_trans_get_buf_map-> xfs_buf_lock on AGFL blocks") takes AGF lock for tens of milliseconds. At the same time, other parallel operations like "xfs_reflink_find_shared" which tend to take AGF lock will be blocked for tens of milliseconds. And after we remove the double invalidation, our workloads now behave normal (AGF lock won't be bottlenecked at least in our workloads).. Since it's unnecessary for the current codebase and it resolves our issues, I'd like to remove this double invalidation entirely. > >> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> >> --- >> fs/xfs/libxfs/xfs_alloc.c | 18 ++---------------- >> 1 file changed, 2 insertions(+), 16 deletions(-) >> >> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c >> index 6cb8b2ddc541..a80d2a31252a 100644 >> --- a/fs/xfs/libxfs/xfs_alloc.c >> +++ b/fs/xfs/libxfs/xfs_alloc.c >> @@ -2432,22 +2432,8 @@ xfs_free_agfl_block( >> struct xfs_buf *agbp, >> struct xfs_owner_info *oinfo) >> { >> - int error; >> - struct xfs_buf *bp; >> - >> - error = xfs_free_ag_extent(tp, agbp, agno, agbno, 1, oinfo, >> - XFS_AG_RESV_AGFL); >> - if (error) >> - return error; >> - >> - error = xfs_trans_get_buf(tp, tp->t_mountp->m_ddev_targp, >> - XFS_AGB_TO_DADDR(tp->t_mountp, agno, agbno), >> - tp->t_mountp->m_bsize, 0, &bp); >> - if (error) >> - return error; >> - xfs_trans_binval(tp, bp); >> - >> - return 0; >> + return xfs_free_ag_extent(tp, agbp, agno, agbno, 1, oinfo, >> + XFS_AG_RESV_AGFL); >> } > > I'd just get rid of the xfs_free_agfl_block() wrapper entirely and > call xfs_free_ag_extent() directly from xfs_agfl_free_finish_item(). Okay, let me revise a new version later. Thanks, Gao Xiang > > -Dave. ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2] xfs: avoid redundant AGFL buffer invalidation 2024-05-27 9:23 ` Gao Xiang @ 2024-05-28 4:12 ` Gao Xiang 2024-05-29 0:28 ` Dave Chinner 0 siblings, 1 reply; 5+ messages in thread From: Gao Xiang @ 2024-05-28 4:12 UTC (permalink / raw) To: linux-xfs, Dave Chinner, Darrick J . Wong, Chandan Babu R; +Cc: LKML, Gao Xiang Currently AGFL blocks can be filled from the following three sources: - allocbt free blocks, as in xfs_allocbt_free_block(); - rmapbt free blocks, as in xfs_rmapbt_free_block(); - refilled from freespace btrees, as in xfs_alloc_fix_freelist(). Originally, allocbt free blocks would be marked as stale only when they put back in the general free space pool as Dave mentioned on IRC, "we don't stale AGF metadata btree blocks when they are returned to the AGFL .. but once they get put back in the general free space pool, we have to make sure the buffers are marked stale as the next user of those blocks might be user data...." However, after commit ca250b1b3d71 ("xfs: invalidate allocbt blocks moved to the free list") and commit edfd9dd54921 ("xfs: move buffer invalidation to xfs_btree_free_block"), even allocbt / bmapbt free blocks will be invalidated immediately since they may fail to pass V5 format validation on writeback even writeback to free space would be safe. IOWs, IMHO currently there is actually no difference of free blocks between AGFL freespace pool and the general free space pool. So let's avoid extra redundant AGFL buffer invalidation, since otherwise we're currently facing unnecessary xfs_log_force() due to xfs_trans_binval() again on buffers already marked as stale before as below: [ 333.507469] Call Trace: [ 333.507862] xfs_buf_find+0x371/0x6a0 <- xfs_buf_lock [ 333.508451] xfs_buf_get_map+0x3f/0x230 [ 333.509062] xfs_trans_get_buf_map+0x11a/0x280 [ 333.509751] xfs_free_agfl_block+0xa1/0xd0 [ 333.510403] xfs_agfl_free_finish_item+0x16e/0x1d0 [ 333.511157] xfs_defer_finish_noroll+0x1ef/0x5c0 [ 333.511871] xfs_defer_finish+0xc/0xa0 [ 333.512471] xfs_itruncate_extents_flags+0x18a/0x5e0 [ 333.513253] xfs_inactive_truncate+0xb8/0x130 [ 333.513930] xfs_inactive+0x223/0x270 xfs_log_force() will take tens of milliseconds with AGF buffer locked. It becomes an unnecessary long latency especially on our PMEM devices with FSDAX enabled and fsops like xfs_reflink_find_shared() at the same time are stuck due to the same AGF lock. Removing the double invalidation on the AGFL blocks does not make this issue go away, but this patch fixes for our workloads in reality and it should also work by the code analysis. Note that I'm not sure I need to remove another redundant one in xfs_alloc_ag_vextent_small() since it's unrelated to our workloads. Also fstests are passed with this patch. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> --- changes since v1: - Get rid of xfs_free_agfl_block() suggested by Dave; - Some commit message refinement. fs/xfs/libxfs/xfs_alloc.c | 28 +--------------------------- fs/xfs/libxfs/xfs_alloc.h | 6 ++++-- fs/xfs/xfs_extfree_item.c | 4 ++-- 3 files changed, 7 insertions(+), 31 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 6cb8b2ddc541..67b0709734f5 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -1934,7 +1934,7 @@ xfs_alloc_ag_vextent_size( /* * Free the extent starting at agno/bno for length. */ -STATIC int +int xfs_free_ag_extent( struct xfs_trans *tp, struct xfs_buf *agbp, @@ -2424,32 +2424,6 @@ xfs_alloc_space_available( return true; } -int -xfs_free_agfl_block( - struct xfs_trans *tp, - xfs_agnumber_t agno, - xfs_agblock_t agbno, - struct xfs_buf *agbp, - struct xfs_owner_info *oinfo) -{ - int error; - struct xfs_buf *bp; - - error = xfs_free_ag_extent(tp, agbp, agno, agbno, 1, oinfo, - XFS_AG_RESV_AGFL); - if (error) - return error; - - error = xfs_trans_get_buf(tp, tp->t_mountp->m_ddev_targp, - XFS_AGB_TO_DADDR(tp->t_mountp, agno, agbno), - tp->t_mountp->m_bsize, 0, &bp); - if (error) - return error; - xfs_trans_binval(tp, bp); - - return 0; -} - /* * Check the agfl fields of the agf for inconsistency or corruption. * diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index 0b956f8b9d5a..3dc8e44fea76 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -80,6 +80,10 @@ int xfs_alloc_get_freelist(struct xfs_perag *pag, struct xfs_trans *tp, int xfs_alloc_put_freelist(struct xfs_perag *pag, struct xfs_trans *tp, struct xfs_buf *agfbp, struct xfs_buf *agflbp, xfs_agblock_t bno, int btreeblk); +int xfs_free_ag_extent(struct xfs_trans *tp, struct xfs_buf *agbp, + xfs_agnumber_t agno, xfs_agblock_t bno, + xfs_extlen_t len, const struct xfs_owner_info *oinfo, + enum xfs_ag_resv_type type); /* * Compute and fill in value of m_alloc_maxlevels. @@ -194,8 +198,6 @@ int xfs_alloc_read_agf(struct xfs_perag *pag, struct xfs_trans *tp, int flags, struct xfs_buf **agfbpp); int xfs_alloc_read_agfl(struct xfs_perag *pag, struct xfs_trans *tp, struct xfs_buf **bpp); -int xfs_free_agfl_block(struct xfs_trans *, xfs_agnumber_t, xfs_agblock_t, - struct xfs_buf *, struct xfs_owner_info *); int xfs_alloc_fix_freelist(struct xfs_alloc_arg *args, uint32_t alloc_flags); int xfs_free_extent_fix_freelist(struct xfs_trans *tp, struct xfs_perag *pag, struct xfs_buf **agbp); diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 8c382f092332..01ebbd7691a5 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -547,8 +547,8 @@ xfs_agfl_free_finish_item( error = xfs_alloc_read_agf(xefi->xefi_pag, tp, 0, &agbp); if (!error) - error = xfs_free_agfl_block(tp, xefi->xefi_pag->pag_agno, - agbno, agbp, &oinfo); + error = xfs_free_ag_extent(tp, agbp, xefi->xefi_pag->pag_agno, + agbno, 1, &oinfo, XFS_AG_RESV_AGFL); next_extent = efdp->efd_next_extent; ASSERT(next_extent < efdp->efd_format.efd_nextents); -- 2.39.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2] xfs: avoid redundant AGFL buffer invalidation 2024-05-28 4:12 ` [PATCH v2] " Gao Xiang @ 2024-05-29 0:28 ` Dave Chinner 0 siblings, 0 replies; 5+ messages in thread From: Dave Chinner @ 2024-05-29 0:28 UTC (permalink / raw) To: Gao Xiang; +Cc: linux-xfs, Darrick J . Wong, Chandan Babu R, LKML On Tue, May 28, 2024 at 12:12:39PM +0800, Gao Xiang wrote: > Currently AGFL blocks can be filled from the following three sources: > - allocbt free blocks, as in xfs_allocbt_free_block(); > - rmapbt free blocks, as in xfs_rmapbt_free_block(); > - refilled from freespace btrees, as in xfs_alloc_fix_freelist(). > > Originally, allocbt free blocks would be marked as stale only when they > put back in the general free space pool as Dave mentioned on IRC, "we > don't stale AGF metadata btree blocks when they are returned to the > AGFL .. but once they get put back in the general free space pool, we > have to make sure the buffers are marked stale as the next user of > those blocks might be user data...." > > However, after commit ca250b1b3d71 ("xfs: invalidate allocbt blocks > moved to the free list") and commit edfd9dd54921 ("xfs: move buffer > invalidation to xfs_btree_free_block"), even allocbt / bmapbt free > blocks will be invalidated immediately since they may fail to pass > V5 format validation on writeback even writeback to free space would be > safe. > > IOWs, IMHO currently there is actually no difference of free blocks > between AGFL freespace pool and the general free space pool. So let's > avoid extra redundant AGFL buffer invalidation, since otherwise we're > currently facing unnecessary xfs_log_force() due to xfs_trans_binval() > again on buffers already marked as stale before as below: > > [ 333.507469] Call Trace: > [ 333.507862] xfs_buf_find+0x371/0x6a0 <- xfs_buf_lock > [ 333.508451] xfs_buf_get_map+0x3f/0x230 > [ 333.509062] xfs_trans_get_buf_map+0x11a/0x280 > [ 333.509751] xfs_free_agfl_block+0xa1/0xd0 > [ 333.510403] xfs_agfl_free_finish_item+0x16e/0x1d0 > [ 333.511157] xfs_defer_finish_noroll+0x1ef/0x5c0 > [ 333.511871] xfs_defer_finish+0xc/0xa0 > [ 333.512471] xfs_itruncate_extents_flags+0x18a/0x5e0 > [ 333.513253] xfs_inactive_truncate+0xb8/0x130 > [ 333.513930] xfs_inactive+0x223/0x270 > > xfs_log_force() will take tens of milliseconds with AGF buffer locked. > It becomes an unnecessary long latency especially on our PMEM devices > with FSDAX enabled and fsops like xfs_reflink_find_shared() at the same > time are stuck due to the same AGF lock. Removing the double > invalidation on the AGFL blocks does not make this issue go away, but > this patch fixes for our workloads in reality and it should also work > by the code analysis. > > Note that I'm not sure I need to remove another redundant one in > xfs_alloc_ag_vextent_small() since it's unrelated to our workloads. > Also fstests are passed with this patch. > > Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> > --- > changes since v1: > - Get rid of xfs_free_agfl_block() suggested by Dave; > - Some commit message refinement. > > fs/xfs/libxfs/xfs_alloc.c | 28 +--------------------------- > fs/xfs/libxfs/xfs_alloc.h | 6 ++++-- > fs/xfs/xfs_extfree_item.c | 4 ++-- > 3 files changed, 7 insertions(+), 31 deletions(-) Looks fine - it appears to me that all the places that put blocks onto the freelist will have already invalidated any buffers over those blocks, so we don't need to do it again when moving them from the freelist to the freespace btrees. Reviewed-by: Dave Chinner <dchinner@redhat.com> -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-05-29 0:28 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-05-27 6:10 [PATCH] xfs: avoid redundant AGFL buffer invalidation Gao Xiang 2024-05-27 8:59 ` Dave Chinner 2024-05-27 9:23 ` Gao Xiang 2024-05-28 4:12 ` [PATCH v2] " Gao Xiang 2024-05-29 0:28 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox