From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, chandanrlinux@gmail.com,
wen.gang.wang@oracle.com
Subject: Re: [PATCH 2/3] xfs: allow extent free intents to be retried
Date: Thu, 15 Jun 2023 07:41:50 -0700 [thread overview]
Message-ID: <20230615144150.GO11441@frogsfrogsfrogs> (raw)
In-Reply-To: <ZIqMIgyHBmZu4jxb@dread.disaster.area>
On Thu, Jun 15, 2023 at 01:57:22PM +1000, Dave Chinner wrote:
> On Wed, Jun 14, 2023 at 08:38:37PM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 15, 2023 at 11:42:00AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > Extent freeing neeeds to be able to avoid a busy extent deadlock
> > > when the transaction itself holds the only busy extents in the
> > > allocation group. This may occur if we have an EFI that contains
> > > multiple extents to be freed, and the freeing the second intent
> > > requires the space the first extent free released to expand the
> > > AGFL. If we block on the busy extent at this point, we deadlock.
> > >
> > > We hold a dirty transaction that contains a entire atomic extent
> > > free operations within it, so if we can abort the extent free
> > > operation and commit the progress that we've made, the busy extent
> > > can be resolved by a log force. Hence we can restart the aborted
> > > extent free with a new transaction and continue to make
> > > progress without risking deadlocks.
> > >
> > > To enable this, we need the EFI processing code to be able to handle
> > > an -EAGAIN error to tell it to commit the current transaction and
> > > retry again. This mechanism is already built into the defer ops
> > > processing (used bythe refcount btree modification intents), so
> > > there's relatively little handling we need to add to the EFI code to
> > > enable this.
> > >
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > > fs/xfs/xfs_extfree_item.c | 64 +++++++++++++++++++++++++++++++++++++--
> > > 1 file changed, 61 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> > > index f9e36b810663..3b33d27efdce 100644
> > > --- a/fs/xfs/xfs_extfree_item.c
> > > +++ b/fs/xfs/xfs_extfree_item.c
> > > @@ -336,6 +336,29 @@ xfs_trans_get_efd(
> > > return efdp;
> > > }
> > >
> > > +/*
> > > + * Fill the EFD with all extents from the EFI when we need to roll the
> > > + * transaction and continue with a new EFI.
> > > + */
> > > +static void
> > > +xfs_efd_from_efi(
> > > + struct xfs_efd_log_item *efdp)
> > > +{
> > > + struct xfs_efi_log_item *efip = efdp->efd_efip;
> > > + uint i;
> > > +
> > > + ASSERT(efip->efi_format.efi_nextents > 0);
> > > +
> > > + if (efdp->efd_next_extent == efip->efi_format.efi_nextents)
> > > + return;
> > > +
> > > + for (i = 0; i < efip->efi_format.efi_nextents; i++) {
> > > + efdp->efd_format.efd_extents[i] =
> > > + efip->efi_format.efi_extents[i];
> > > + }
> > > + efdp->efd_next_extent = efip->efi_format.efi_nextents;
> >
> > Odd question -- if we managed to free half the extents mentioned in an
> > EFI before hitting -EAGAIN, then efdp->efd_next_extent should already be
> > half of efip->efi_format.efi_nextents, right?
>
> Yes, on success we normally update the EFD with the extent we just
> completed and move the EFI/EFD cursors forwards.
>
> > I suppose it's duplicative (or maybe just careful) to recopy the entire
> > thing... but maybe that doesn't even really matter since no modern xlog
> > code actually pays attention to what's in the EFD aside from the ID
> > number.
>
> *nod*
>
> On second thoughts, now that you've questioned this behaviour, I
> need to go back and check my assumptions about what the intent
> creation is doing vs the current EFI vs the XEFI we are processing.
> The new EFI we log shouldn't have the extents we've completed in it,
> just the ones we haven't run, and I need to make sure that is
> actually what is happening here.
That shouldn't be happening -- each of the xfs_free_extent_later calls
below adds new incore EFIs to an xfs_defer_pending.dfp_work list and
each xfs_defer_pending itself gets added to xfs_trans.t_dfops. The
defer_capture_and_commit function will turn the xfs_defer_pending into a
new EFI log item with the queued dfp_work items attached.
IOWs, as long as you don't call xfs_free_extent_later on any of the
xefi_startblock/blockcount pairs where xfs_trans_free_extent returned 0,
your assumptions are correct.
The code presented in this patch is correct.
--D
> > > @@ -652,9 +694,25 @@ xfs_efi_item_recover(
> > > fake.xefi_startblock = extp->ext_start;
> > > fake.xefi_blockcount = extp->ext_len;
> > >
> > > - xfs_extent_free_get_group(mp, &fake);
> > > - error = xfs_trans_free_extent(tp, efdp, &fake);
> > > - xfs_extent_free_put_group(&fake);
> > > + if (!requeue_only) {
> > > + xfs_extent_free_get_group(mp, &fake);
> > > + error = xfs_trans_free_extent(tp, efdp, &fake);
> > > + xfs_extent_free_put_group(&fake);
> > > + }
> > > +
> > > + /*
> > > + * If we can't free the extent without potentially deadlocking,
> > > + * requeue the rest of the extents to a new so that they get
> > > + * run again later with a new transaction context.
> > > + */
> > > + if (error == -EAGAIN || requeue_only) {
> > > + xfs_free_extent_later(tp, fake.xefi_startblock,
> > > + fake.xefi_blockcount, &XFS_RMAP_OINFO_ANY_OWNER);
> >
> > Shouldn't we check the return value of xfs_free_extent_later now?
> > I think we already did that above, but since you just plumbed in the
> > extra checks, we ought to use it. :)
>
> Oh, right, my cscope tree needs updating, so I was thinking it is
> still a void function.
>
> > (Also the indenting here isn't the usual two tabs)
>
> I'll fix that too.
>
> Cheers,
>
> Dave.
>
> --
> Dave Chinner
> david@fromorbit.com
next prev parent reply other threads:[~2023-06-15 14:41 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-15 1:41 [PATCH 0/3] xfs: fix xfs_extent_busy_flush() deadlock in EFI processing Dave Chinner
2023-06-15 1:41 ` [PATCH 1/3] xfs: pass alloc flags through to xfs_extent_busy_flush() Dave Chinner
2023-06-15 3:32 ` Darrick J. Wong
2023-06-15 3:48 ` Dave Chinner
2023-06-15 21:57 ` Wengang Wang
2023-06-15 22:14 ` Dave Chinner
2023-06-15 22:31 ` Wengang Wang
2023-06-15 23:09 ` Wengang Wang
2023-06-15 23:33 ` Dave Chinner
2023-06-15 23:51 ` Wengang Wang
2023-06-16 0:17 ` Dave Chinner
2023-06-16 0:42 ` Wengang Wang
2023-06-16 4:27 ` Wengang Wang
2023-06-16 5:04 ` Wengang Wang
2023-06-16 7:36 ` Dave Chinner
2023-06-16 17:43 ` Wengang Wang
2023-06-16 22:29 ` Dave Chinner
2023-06-16 22:53 ` Wengang Wang
2023-06-16 23:14 ` Wengang Wang
2023-06-17 0:47 ` Dave Chinner
2023-06-20 16:56 ` Wengang Wang
2023-06-22 1:15 ` Wengang Wang
2023-06-15 1:42 ` [PATCH 2/3] xfs: allow extent free intents to be retried Dave Chinner
2023-06-15 3:38 ` Darrick J. Wong
2023-06-15 3:57 ` Dave Chinner
2023-06-15 14:41 ` Darrick J. Wong [this message]
2023-06-15 22:21 ` Dave Chinner
2023-06-15 1:42 ` [PATCH 3/3] xfs: don't block in busy flushing when freeing extents Dave Chinner
2023-06-15 3:40 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230615144150.GO11441@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=chandanrlinux@gmail.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
--cc=wen.gang.wang@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox