From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4/8] xfs: allow extent free intents to be retried
Date: Wed, 28 Jun 2023 10:48:36 -0700 [thread overview]
Message-ID: <20230628174836.GV11441@frogsfrogsfrogs> (raw)
In-Reply-To: <20230627224412.2242198-5-david@fromorbit.com>
On Wed, Jun 28, 2023 at 08:44:08AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Extent freeing neeeds to be able to avoid a busy extent deadlock
> when the transaction itself holds the only busy extents in the
> allocation group. This may occur if we have an EFI that contains
> multiple extents to be freed, and the freeing the second intent
> requires the space the first extent free released to expand the
> AGFL. If we block on the busy extent at this point, we deadlock.
>
> We hold a dirty transaction that contains a entire atomic extent
> free operations within it, so if we can abort the extent free
> operation and commit the progress that we've made, the busy extent
> can be resolved by a log force. Hence we can restart the aborted
> extent free with a new transaction and continue to make
> progress without risking deadlocks.
>
> To enable this, we need the EFI processing code to be able to handle
> an -EAGAIN error to tell it to commit the current transaction and
> retry again. This mechanism is already built into the defer ops
> processing (used bythe refcount btree modification intents), so
> there's relatively little handling we need to add to the EFI code to
> enable this.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_extfree_item.c | 73 +++++++++++++++++++++++++++++++++++++--
> 1 file changed, 70 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
> index 79e65bb6b0a2..098420cbd4a0 100644
> --- a/fs/xfs/xfs_extfree_item.c
> +++ b/fs/xfs/xfs_extfree_item.c
> @@ -336,6 +336,34 @@ xfs_trans_get_efd(
> return efdp;
> }
>
> +/*
> + * Fill the EFD with all extents from the EFI when we need to roll the
> + * transaction and continue with a new EFI.
> + *
> + * This simply copies all the extents in the EFI to the EFD rather than make
> + * assumptions about which extents in the EFI have already been processed. We
> + * currently keep the xefi list in the same order as the EFI extent list, but
> + * that may not always be the case. Copying everything avoids leaving a landmine
> + * were we fail to cancel all the extents in an EFI if the xefi list is
> + * processed in a different order to the extents in the EFI.
> + */
> +static void
> +xfs_efd_from_efi(
> + struct xfs_efd_log_item *efdp)
> +{
> + struct xfs_efi_log_item *efip = efdp->efd_efip;
> + uint i;
> +
> + ASSERT(efip->efi_format.efi_nextents > 0);
> + ASSERT(efdp->efd_next_extent < efip->efi_format.efi_nextents);
> +
> + for (i = 0; i < efip->efi_format.efi_nextents; i++) {
> + efdp->efd_format.efd_extents[i] =
> + efip->efi_format.efi_extents[i];
> + }
> + efdp->efd_next_extent = efip->efi_format.efi_nextents;
> +}
> +
> /*
> * Free an extent and log it to the EFD. Note that the transaction is marked
> * dirty regardless of whether the extent free succeeds or fails to support the
> @@ -378,6 +406,17 @@ xfs_trans_free_extent(
> tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE;
> set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
>
> + /*
> + * If we need a new transaction to make progress, the caller will log a
> + * new EFI with the current contents. It will also log an EFD to cancel
> + * the existing EFI, and so we need to copy all the unprocessed extents
> + * in this EFI to the EFD so this works correctly.
> + */
> + if (error == -EAGAIN) {
> + xfs_efd_from_efi(efdp);
> + return error;
> + }
> +
> next_extent = efdp->efd_next_extent;
> ASSERT(next_extent < efdp->efd_format.efd_nextents);
> extp = &(efdp->efd_format.efd_extents[next_extent]);
> @@ -495,6 +534,13 @@ xfs_extent_free_finish_item(
>
> error = xfs_trans_free_extent(tp, EFD_ITEM(done), xefi);
>
> + /*
> + * Don't free the XEFI if we need a new transaction to complete
> + * processing of it.
> + */
> + if (error == -EAGAIN)
> + return error;
> +
> xfs_extent_free_put_group(xefi);
> kmem_cache_free(xfs_extfree_item_cache, xefi);
> return error;
> @@ -620,6 +666,7 @@ xfs_efi_item_recover(
> struct xfs_trans *tp;
> int i;
> int error = 0;
> + bool requeue_only = false;
>
> /*
> * First check the validity of the extents described by the
> @@ -653,9 +700,29 @@ xfs_efi_item_recover(
> fake.xefi_startblock = extp->ext_start;
> fake.xefi_blockcount = extp->ext_len;
>
> - xfs_extent_free_get_group(mp, &fake);
> - error = xfs_trans_free_extent(tp, efdp, &fake);
> - xfs_extent_free_put_group(&fake);
> + if (!requeue_only) {
> + xfs_extent_free_get_group(mp, &fake);
> + error = xfs_trans_free_extent(tp, efdp, &fake);
> + xfs_extent_free_put_group(&fake);
> + }
> +
> + /*
> + * If we can't free the extent without potentially deadlocking,
> + * requeue the rest of the extents to a new so that they get
> + * run again later with a new transaction context.
> + */
> + if (error == -EAGAIN || requeue_only) {
> + error = xfs_free_extent_later(tp, fake.xefi_startblock,
> + fake.xefi_blockcount,
> + &XFS_RMAP_OINFO_ANY_OWNER,
> + fake.xefi_type);
> + if (!error) {
> + requeue_only = true;
> + error = 0;
@error is already zero so this is redundant, right?
If yes then I'll remove the line on commit,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> + continue;
> + }
> + };
> +
> if (error == -EFSCORRUPTED)
> XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
> extp, sizeof(*extp));
> --
> 2.40.1
>
next prev parent reply other threads:[~2023-06-28 17:48 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-27 22:44 [PATCH 0/8 v3] xfs: various fixes for 6.5 Dave Chinner
2023-06-27 22:44 ` [PATCH 1/8] xfs: don't reverse order of items in bulk AIL insertion Dave Chinner
2023-06-28 6:03 ` Christoph Hellwig
2023-06-28 9:55 ` Chandan Babu R
2023-06-28 17:46 ` Darrick J. Wong
2023-06-27 22:44 ` [PATCH 2/8] xfs: use deferred frees for btree block freeing Dave Chinner
2023-06-28 17:46 ` Darrick J. Wong
2023-06-28 22:55 ` Dave Chinner
2023-06-29 7:52 ` Chandan Babu R
2023-06-27 22:44 ` [PATCH 3/8] xfs: pass alloc flags through to xfs_extent_busy_flush() Dave Chinner
2023-06-29 9:44 ` Chandan Babu R
2023-06-27 22:44 ` [PATCH 4/8] xfs: allow extent free intents to be retried Dave Chinner
2023-06-28 17:48 ` Darrick J. Wong [this message]
2023-06-28 22:57 ` Dave Chinner
2023-06-29 9:50 ` Chandan Babu R
2023-06-27 22:44 ` [PATCH 5/8] xfs: don't block in busy flushing when freeing extents Dave Chinner
2023-06-27 22:44 ` [PATCH 6/8] xfs: journal geometry is not properly bounds checked Dave Chinner
2023-06-28 6:08 ` Christoph Hellwig
2023-06-28 6:38 ` Dave Chinner
2023-06-28 17:50 ` Darrick J. Wong
2023-06-27 22:44 ` [PATCH 7/8] xfs: AGF length has never been " Dave Chinner
2023-06-28 17:52 ` Darrick J. Wong
2023-06-29 2:09 ` [PATCH 7/8 V2] " Dave Chinner
2023-06-29 16:35 ` Darrick J. Wong
2023-06-29 22:33 ` Dave Chinner
2023-06-27 22:44 ` [PATCH 8/8] xfs: fix bounds check in xfs_defer_agfl_block() Dave Chinner
2023-06-28 6:09 ` Christoph Hellwig
2023-06-28 17:52 ` Darrick J. Wong
2023-06-29 19:42 ` [RFC PATCH 9/8] xfs: AGI length should be bounds checked Darrick J. Wong
2023-06-29 22:35 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230628174836.GV11441@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox