public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/8] xfs: use deferred frees for btree block freeing
Date: Thu, 29 Jun 2023 08:55:41 +1000
Message-ID: <ZJy6bXphyOp4NGzk@dread.disaster.area>
In-Reply-To: <20230628174625.GT11441@frogsfrogsfrogs>

On Wed, Jun 28, 2023 at 10:46:25AM -0700, Darrick J. Wong wrote:
> On Wed, Jun 28, 2023 at 08:44:06AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Btrees that aren't freespace management trees use the normal extent
> > allocation and freeing routines for their blocks. Hence when a btree
> > block is freed, a direct call to xfs_free_extent() is made and the
> > extent is immediately freed. This puts all of the free space
> > management btrees under this path, so we are stacking btrees on
> > btrees in the call stack. The inobt, finobt and refcount btrees
> > all do this.
> > 
> > However, the bmap btree does not do this - it calls
> > xfs_free_extent_later() to defer the extent free operation via an
> > XEFI and hence it gets processed in deferred operation processing
> > during the commit of the primary transaction (i.e. via intent
> > chaining).
> > 
> > We need to change xfs_free_extent() to behave in a non-blocking
> > manner so that we can avoid deadlocks with busy extents near ENOSPC
> > in transactions that free multiple extents. Inserting or removing a
> > record from a btree can cause a multi-level tree merge operation and
> > that will free multiple blocks from the btree in a single
> > transaction. i.e. we can call xfs_free_extent() multiple times, and
> > hence the btree manipulation transaction is vulnerable to this busy
> > extent deadlock vector.
> > 
> > To fix this, convert all the remaining callers of xfs_free_extent()
> > to use xfs_free_extent_later() to queue XEFIs and hence defer
> > processing of the extent frees to a context that can be safely
> > restarted if a deadlock condition is detected.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_ag.c             | 2 +-
> >  fs/xfs/libxfs/xfs_alloc.c          | 4 ++++
> >  fs/xfs/libxfs/xfs_alloc.h          | 8 +++++---
> >  fs/xfs/libxfs/xfs_bmap.c           | 8 +++++---
> >  fs/xfs/libxfs/xfs_bmap_btree.c     | 3 ++-
> >  fs/xfs/libxfs/xfs_ialloc.c         | 8 ++++----
> >  fs/xfs/libxfs/xfs_ialloc_btree.c   | 3 +--
> >  fs/xfs/libxfs/xfs_refcount.c       | 9 ++++++---
> >  fs/xfs/libxfs/xfs_refcount_btree.c | 8 +-------
> >  fs/xfs/xfs_extfree_item.c          | 3 ++-
> >  fs/xfs/xfs_reflink.c               | 3 ++-
> >  11 files changed, 33 insertions(+), 26 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> > index ee84835ebc66..e9cc481b4ddf 100644
> > --- a/fs/xfs/libxfs/xfs_ag.c
> > +++ b/fs/xfs/libxfs/xfs_ag.c
> > @@ -985,7 +985,7 @@ xfs_ag_shrink_space(
> >  			goto resv_err;
> >  
> >  		err2 = __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL,
> > -				true);
> > +				XFS_AG_RESV_NONE, true);
> >  		if (err2)
> >  			goto resv_err;
> >  
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index c20fe99405d8..cc3f7b905ea1 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -2449,6 +2449,7 @@ xfs_defer_agfl_block(
> >  	xefi->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno);
> >  	xefi->xefi_blockcount = 1;
> >  	xefi->xefi_owner = oinfo->oi_owner;
> > +	xefi->xefi_type = XFS_AG_RESV_AGFL;
> >  	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, xefi->xefi_startblock)))
> >  		return -EFSCORRUPTED;
> > @@ -2470,6 +2471,7 @@ __xfs_free_extent_later(
> >  	xfs_fsblock_t			bno,
> >  	xfs_filblks_t			len,
> >  	const struct xfs_owner_info	*oinfo,
> > +	enum xfs_ag_resv_type		type,
> >  	bool				skip_discard)
> >  {
> >  	struct xfs_extent_free_item	*xefi;
> > @@ -2490,6 +2492,7 @@ __xfs_free_extent_later(
> >  	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
> >  #endif
> >  	ASSERT(xfs_extfree_item_cache != NULL);
> > +	ASSERT(type != XFS_AG_RESV_AGFL);
> >  
> >  	if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len)))
> >  		return -EFSCORRUPTED;
> > @@ -2498,6 +2501,7 @@ __xfs_free_extent_later(
> >  			       GFP_KERNEL | __GFP_NOFAIL);
> >  	xefi->xefi_startblock = bno;
> >  	xefi->xefi_blockcount = (xfs_extlen_t)len;
> > +	xefi->xefi_type = type;
> >  	if (skip_discard)
> >  		xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD;
> >  	if (oinfo) {
> > diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> > index 85ac470be0da..121faf1e11ad 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.h
> > +++ b/fs/xfs/libxfs/xfs_alloc.h
> > @@ -232,7 +232,7 @@ xfs_buf_to_agfl_bno(
> >  
> >  int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
> >  		xfs_filblks_t len, const struct xfs_owner_info *oinfo,
> > -		bool skip_discard);
> > +		enum xfs_ag_resv_type type, bool skip_discard);
> >  
> >  /*
> >   * List of extents to be free "later".
> > @@ -245,6 +245,7 @@ struct xfs_extent_free_item {
> >  	xfs_extlen_t		xefi_blockcount;/* number of blocks in extent */
> >  	struct xfs_perag	*xefi_pag;
> >  	unsigned int		xefi_flags;
> 
> /me is barely back from vacation, starting to process the ~1100 emails
> by taking care of the obvious bugfixes first...
> 
> > +	enum xfs_ag_resv_type	xefi_type;
> 
> I got confused by 'xefi_type' until I remembered that
> XFS_DEFER_OPS_TYPE_AGFL_FREE / XFS_DEFER_OPS_TYPE_FREE are stuffed in
> the xfs_defer_pending structure, not the xefi itself.
> 
> Could this field be named xefi_agresv instead?

Sure.

> The rest of the logic in this patch looks correct and makes things
> easier for the rt modernization patches, so I'll say
> 
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> and change the name on commit, if that's ok?

That's fine.
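
For anyone skimming the thread, here is a minimal userspace sketch of the
idea, not the kernel code: frees are queued as items that carry the
reservation type, and the queue is drained later in a separate step. All
names below (ag_resv_type, extent_free_item, defer_queue,
free_extent_later, finish_deferred_frees) are hypothetical stand-ins that
only mirror the patch, and the field is called agresv on the assumption
that the rename discussed above is applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for enum xfs_ag_resv_type; not the kernel definition. */
enum ag_resv_type {
	AG_RESV_NONE,
	AG_RESV_AGFL,
	AG_RESV_METADATA,
	AG_RESV_RMAPBT,
};

#define EFI_SKIP_DISCARD	(1u << 0)

/* Simplified model of a queued extent free. */
struct extent_free_item {
	struct extent_free_item	*next;
	uint64_t		startblock;
	uint32_t		blockcount;
	unsigned int		flags;
	enum ag_resv_type	agresv;	/* reservation the blocks return to */
};

/* Simplified model of the per-transaction deferred work list. */
struct defer_queue {
	struct extent_free_item	*head;
	struct extent_free_item	*tail;
	unsigned int		count;
};

/* Queue an extent free instead of freeing immediately. Returns 0 or -1. */
int free_extent_later(struct defer_queue *dq, uint64_t bno, uint32_t len,
		      enum ag_resv_type type, bool skip_discard)
{
	struct extent_free_item *xefi;

	/* AGFL blocks go through their own path in the patch. */
	assert(type != AG_RESV_AGFL);
	if (len == 0)
		return -1;

	xefi = calloc(1, sizeof(*xefi));
	if (!xefi)
		return -1;

	xefi->startblock = bno;
	xefi->blockcount = len;
	xefi->agresv = type;
	if (skip_discard)
		xefi->flags |= EFI_SKIP_DISCARD;

	/* Append so frees are processed in the order they were queued. */
	if (dq->tail)
		dq->tail->next = xefi;
	else
		dq->head = xefi;
	dq->tail = xefi;
	dq->count++;
	return 0;
}

/* Drain the queue; returns the total number of blocks processed. */
uint64_t finish_deferred_frees(struct defer_queue *dq)
{
	uint64_t blocks = 0;

	while (dq->head) {
		struct extent_free_item *xefi = dq->head;

		dq->head = xefi->next;
		blocks += xefi->blockcount;	/* the real free would happen here */
		free(xefi);
		dq->count--;
	}
	dq->tail = NULL;
	return blocks;
}
```

The point of the separate drain step is that it runs in a context that can
be restarted, which a direct free in the middle of a btree operation
cannot be.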

-Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 30+ messages
2023-06-27 22:44 [PATCH 0/8 v3] xfs: various fixes for 6.5 Dave Chinner
2023-06-27 22:44 ` [PATCH 1/8] xfs: don't reverse order of items in bulk AIL insertion Dave Chinner
2023-06-28  6:03   ` Christoph Hellwig
2023-06-28  9:55   ` Chandan Babu R
2023-06-28 17:46   ` Darrick J. Wong
2023-06-27 22:44 ` [PATCH 2/8] xfs: use deferred frees for btree block freeing Dave Chinner
2023-06-28 17:46   ` Darrick J. Wong
2023-06-28 22:55     ` Dave Chinner [this message]
2023-06-29  7:52   ` Chandan Babu R
2023-06-27 22:44 ` [PATCH 3/8] xfs: pass alloc flags through to xfs_extent_busy_flush() Dave Chinner
2023-06-29  9:44   ` Chandan Babu R
2023-06-27 22:44 ` [PATCH 4/8] xfs: allow extent free intents to be retried Dave Chinner
2023-06-28 17:48   ` Darrick J. Wong
2023-06-28 22:57     ` Dave Chinner
2023-06-29  9:50   ` Chandan Babu R
2023-06-27 22:44 ` [PATCH 5/8] xfs: don't block in busy flushing when freeing extents Dave Chinner
2023-06-27 22:44 ` [PATCH 6/8] xfs: journal geometry is not properly bounds checked Dave Chinner
2023-06-28  6:08   ` Christoph Hellwig
2023-06-28  6:38     ` Dave Chinner
2023-06-28 17:50   ` Darrick J. Wong
2023-06-27 22:44 ` [PATCH 7/8] xfs: AGF length has never been " Dave Chinner
2023-06-28 17:52   ` Darrick J. Wong
2023-06-29  2:09     ` [PATCH 7/8 V2] " Dave Chinner
2023-06-29 16:35       ` Darrick J. Wong
2023-06-29 22:33         ` Dave Chinner
2023-06-27 22:44 ` [PATCH 8/8] xfs: fix bounds check in xfs_defer_agfl_block() Dave Chinner
2023-06-28  6:09   ` Christoph Hellwig
2023-06-28 17:52   ` Darrick J. Wong
2023-06-29 19:42 ` [RFC PATCH 9/8] xfs: AGI length should be bounds checked Darrick J. Wong
2023-06-29 22:35   ` Dave Chinner