Re: [PATCH 4/5] xfs: don't block in busy flushing when freeing extents

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Chandan Babu R <chandan.babu@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4/5] xfs: don't block in busy flushing when freeing extents
Date: Wed, 21 Jun 2023 08:18:01 +1000	[thread overview]
Message-ID: <ZJIlmbuHIhu5BMG+@dread.disaster.area> (raw)
In-Reply-To: <87a5wumafo.fsf@debian-BULLSEYE-live-builder-AMD64>

On Tue, Jun 20, 2023 at 08:23:33PM +0530, Chandan Babu R wrote:
> On Tue, Jun 20, 2023 at 10:20:20 AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > If the current transaction holds a busy extent and we are trying to
> > allocate a new extent to fix up the free list, we can deadlock if
> > the AG is entirely empty except for the busy extent held by the
> > transaction.
....
> > @@ -577,10 +588,23 @@ xfs_extent_busy_flush(
> >  	DEFINE_WAIT		(wait);
> >  	int			error;
> >  
> > -	error = xfs_log_force(mp, XFS_LOG_SYNC);
> > +	error = xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
> >  	if (error)
> > -		return;
> > +		return error;
> >  
> > +	/* Avoid deadlocks on uncommitted busy extents. */
> > +	if (!list_empty(&tp->t_busy)) {
> > +		if (alloc_flags & XFS_ALLOC_FLAG_TRYFLUSH)
> > +			return 0;
> > +
> > +		if (busy_gen != READ_ONCE(pag->pagb_gen))
> > +			return 0;
> > +
> > +		if (alloc_flags & XFS_ALLOC_FLAG_FREEING)
> > +			return -EAGAIN;
> > +	}
> 
> In the case where a task is freeing an ondisk inode, an ifree transaction can
> invoke __xfs_inobt_free_block() twice; Once to free the inobt's leaf block and
> the next call to free its immediate parent block.
> 
> The first call to __xfs_inobt_free_block() adds the freed extent into the
> transaction's busy list and also into the per-ag rb tree tracking the busy
> extent. Freeing the second inobt block could lead to the following sequence of
> function calls,
> 
> __xfs_free_extent() => xfs_free_extent_fix_freelist() =>
> xfs_alloc_fix_freelist() => xfs_alloc_ag_vextent_size()

Yes, I think you might be right. I checked inode chunks - they are
freed from this path via:

xfs_ifree
  xfs_difree
    xfs_difree_inobt
      xfs_difree_inobt_chunk
        xfs_free_extent_later
	  <queues an XEFI for deferred freeing>

And I didn't think about the inobt blocks themselves because freeing
an inode can require allocation of finobt blocks and hence there's a
transaction reservation for block allocation on finobt enabled
filesystems. i.e. freeing can't proceed unless there is some amount
of free blocks available, and that's why the finobt has an amount of
per-ag space reserved for it.

Hence, for finobt enabled filesystems, I don't think we can ever get
down to a completely empty AG and an AGFL that needs refilling from
the inode path - the metadata reserve doesn't allow the AG to be
completely emptied in the way that is needed for this bug to
manifest.

Yes, I think it is still possible for all the free space to be busy,
and so when online discard is enabled we need to do the busy wait
after the log force to avoid that. However, for non-discard
filesystems the sync log force is all that is needed to resolve busy
extents outside the current transaction, so this wouldn't be an
issue for the current patchset.

I suspect that is why I haven't seen issues on v5 filesystems,
though I also haven't seen issues on v4 filesystems that don't have
the finobt per-ag metadata reservation nor the space reservation at
transaction reservation time. I know that the fstests enospc group
is exercising the busy flush code, but I doubt that it was exercised
through the inode btree block freeing path...

I note that the refcount btree block freeing path also call
xfs_free_extent(). This might be OK, because refcount btree updates
get called from deferred intent processing, and hence the EAGAIN
will trigger a transaction roll and retry correctly.

I suspect, however, that both of these paths should simply call
xfs_free_extent_later() to queue an XEFI for deferred processing,
and that takes the entire extent freeing path out from under the
btree operations. 

I'll look into that. Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2023-06-20 22:18 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-20  0:20 [PATCH 0/5 V2] xfs: various fixes Dave Chinner
2023-06-20  0:20 ` [PATCH 1/5] xfs: don't reverse order of items in bulk AIL insertion Dave Chinner
2023-06-20  0:20 ` [PATCH 2/5] xfs: pass alloc flags through to xfs_extent_busy_flush() Dave Chinner
2023-06-20  0:20 ` [PATCH 3/5] xfs: allow extent free intents to be retried Dave Chinner
2023-06-20  0:20 ` [PATCH 4/5] xfs: don't block in busy flushing when freeing extents Dave Chinner
2023-06-20 14:53   ` Chandan Babu R
2023-06-20 22:18     ` Dave Chinner [this message]
2023-06-22  3:59       ` Chandan Babu R
2023-06-22  5:28         ` Dave Chinner
2023-06-22  5:36           ` Chandan Babu R
2023-06-20  0:20 ` [PATCH 5/5] xfs: journal geometry is not properly bounds checked Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZJIlmbuHIhu5BMG+@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=chandan.babu@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox