Re: [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Chandan Babu R <chandanrlinux@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation
Date: Tue, 01 Jun 2021 12:12:36 +0530	[thread overview]
Message-ID: <871r9mytk3.fsf@garuda> (raw)
In-Reply-To: <20210531232820.GT664593@dread.disaster.area>

On 01 Jun 2021 at 04:58, Dave Chinner wrote:
> On Mon, May 31, 2021 at 06:35:40PM +0530, Chandan Babu R wrote:
>> On 31 May 2021 at 18:25, Chandan Babu R wrote:
>> > On 27 May 2021 at 10:21, Dave Chinner wrote:
>> >> From: Dave Chinner <dchinner@redhat.com>
>> >>
>> >> Ever since we moved to freeing of extents by deferred operations,
>> >> we've already freed extents via individual transactions. Hence the
>> >> only limitation of how many extents we can mark for freeing in a
>> >> single xfs_bunmapi() call bound only by how many deferrals we want
>> >> to queue.
>> >>
>> >> That is xfs_bunmapi() doesn't actually do any AG based extent
>> >> freeing, so there's no actually transaction reservation used up by
>> >> calling bunmapi with a large count of extents to be freed. RT
>> >> extents have always been freed directly by bunmapi, but that doesn't
>> >> require modification of large number of blocks as there are no
>> >> btrees to split.
>> >>
>> >> Some callers of xfs_bunmapi assume that the extent count being freed
>> >> is bound by geometry (e.g. directories) and these can ask bunmapi to
>> >> free up to 64 extents in a single call. These functions just work as
>> >> tehy stand, so there's no reason for truncate to have a limit of
>> >> just two extents per bunmapi call anymore.
>> >>
>> >> Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
>> >> that can be deferred in a single loop to match what the directory
>> >> code already uses.
>> >>
>> >> For realtime inodes, where xfs_bunmapi() directly frees extents,
>> >> leave the limit at 2 extents per loop as this is all the space that
>> >> the transaction reservation will cover.
>> >>
>> >> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>> >> ---
>> >>  fs/xfs/xfs_inode.c | 15 ++++++++++++---
>> >>  1 file changed, 12 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> >> index 0369eb22c1bb..db220eaa34b8 100644
>> >> --- a/fs/xfs/xfs_inode.c
>> >> +++ b/fs/xfs/xfs_inode.c
>> >> @@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
>> >>
>> >>  /*
>> >>   * Used in xfs_itruncate_extents().  This is the maximum number of extents
>> >> - * freed from a file in a single transaction.
>> >> + * we will unmap and defer for freeing in a single call to xfs_bunmapi().
>> >> + * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
>> >> + * by transaction reservation size to 2 extents.
>> >>   */
>> >> -#define	XFS_ITRUNC_MAX_EXTENTS	2
>> >> +static inline int
>> >> +xfs_itrunc_max_extents(
>> >> +	struct xfs_inode	*ip)
>> >> +{
>> >> +	if (XFS_IS_REALTIME_INODE(ip))
>> >> +		return 2;
>> >> +	return 64;
>> >> +}
>> >>
>> >>  STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
>> >>  STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
>> >> @@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
>> >>  	while (unmap_len > 0) {
>> >>  		ASSERT(tp->t_firstblock == NULLFSBLOCK);
>> >>  		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
>> >> -				flags, XFS_ITRUNC_MAX_EXTENTS);
>> >> +				flags, xfs_itrunc_max_extents(ip));
>> >>  		if (error)
>> >>  			goto out;
>> >
>> > The list of free extent items at xfs_defer_pending->dfp_work could
>> > now contain XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) entries in the worst case.
>
> Yes, but we do exactly this when freeing a large fragmented directly
> block. That is, we ask xfs_bunmapi to unmap a 64kB range regardless
> of how many extents map that range.
>
> IOWs, the limitation in extent count placed in
> xfs_itruncate_extents() doesn't actually address the underlying
> problem - it's just a band-aid that has been placed over the easy to
> trigger transaction overrun symptom that has always been present in
> the underlying extent freeing code.
>
>> > For a single transaction, xfs_calc_itruncate_reservation() reserves space for
>> > logging only 4 extents (i.e. 4 exts * 2 trees * (2 * max depth - 1) * block
>> > size).
>>
>> ... Sorry, I meant to say "xfs_calc_itruncate_reservation() reserves log space
>> required for freeing 4 extents ..."
>
> My point exactly - the code has always had a mismatch between
> reservations and what we can stuff into an EFI.  EFIs are unbound in
> size, so having a fixed "4 extents per EFI" reservation limit has
> never made any real sense given that unmaping a 64kB directory block
> on a 1kb block size filesystem has a worst case of freeing 64
> extents in a single transaction. As I said above, the limitations
> placed on xfs_itruncate_extents is largely a hack because it doesn't
> address other avenues to the same overruns...
>

Ok. Thanks for the explanation.

> This "4 extents per transaction" reservation makes even less sense
> now that extents are freed by defer ops, not by the transaction that
> unmaps them. We've completely decoupled extent freeing from the
> higher level code that unmaps them, and so now the freeing
> transactions are independent of the high level code that runs the
> freeing operations.
>
> IOWs, having a reservation big enough to free a single extent is all
> we should need now as we can break the extent freeing up into
> individual transactions and still have them complete atomically even
> after a crash. That's where I'm trying to get to here, but it's
> clear that I need to refine the code a bit further such that we only
> allow individual EFIs to queue up and free multiple extents within
> an AG to be freed in the same transaction....

Ok. That should limit the number of CNTBT/BNOBT blocks being logged by a
transaction which is freeing extents.

>
> FWIW, I suspect that the right thing to do here is make use of the
> xfs_defer_finish_one() mechanism for relogging the remaining intents
> in work list while it is processing that list. All we need is for
> xfs_extent_free_finish_item() to return -EAGAIN instead of running
> the free transaction, and we will log the completed EFDs and the
> remaining EFIs to be run and roll the transaction.
>
> Hence we can control exactly when we roll the extent freeing
> transaction (e.g. when the next extent to free is in a different AG
> to the one we've just been freeing extents in) and as a result we
> can define a fixed transaction reservation for extent freeing that
> works in every situation without having to care about arbitrary
> "how many extents can we unmap without overrun" concerns.
>

That sounds perfect.

> That's the goal here - fixed, small reservation for extent freeing
> that is decoupled from and independent of the number of extents that
> need to be freed atomically by the high level operation...
>
> Hence I think that a minor amount of rework will allow the EFI code
> to log large numbers of extents to be freed whilst still processing
> them within single AG free space tree modification reservation
> bounds...


--
chandan

next prev parent reply	other threads:[~2021-06-01  6:42 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-27  4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
2021-05-27  4:51 ` [PATCH 1/6] xfs: btree format inode forks can have zero extents Dave Chinner
2021-05-27  6:15   ` Darrick J. Wong
2021-05-27  4:51 ` [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues Dave Chinner
2021-05-27  6:16   ` Darrick J. Wong
2021-05-27  4:51 ` [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation Dave Chinner
2021-05-31 12:55   ` Chandan Babu R
2021-05-31 13:05     ` Chandan Babu R
2021-05-31 23:28       ` Dave Chinner
2021-06-01  6:42         ` Chandan Babu R [this message]
2021-05-27  4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
2021-05-27  6:38   ` kernel test robot
2021-05-27  6:38     ` kernel test robot
2021-05-27  6:38   ` kernel test robot
2021-05-27  6:38     ` kernel test robot
2021-05-27  7:03   ` kernel test robot
2021-05-27  7:03     ` kernel test robot
2021-05-27  7:03   ` [RFC PATCH] xfs: xfs_allocfree_extent_res can be static kernel test robot
2021-05-27  7:03     ` kernel test robot
2021-06-02 21:37   ` [PATCH 4/6] xfs: add a free space extent change reservation Darrick J. Wong
2021-05-27  4:52 ` [PATCH 5/6] xfs: factor free space tree transaciton reservations Dave Chinner
2021-06-02 21:36   ` Darrick J. Wong
2021-05-27  4:52 ` [PATCH 6/6] xfs: reduce transaction reservation for freeing extents Dave Chinner
2021-05-27  6:19   ` Darrick J. Wong
2021-05-27  8:52     ` Dave Chinner
2021-05-28  0:01       ` Darrick J. Wong
2021-05-28  2:30         ` Dave Chinner
2021-05-28  5:30           ` Darrick J. Wong
2021-05-31 10:02 ` [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Chandan Babu R
2021-05-31 22:41   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871r9mytk3.fsf@garuda \
    --to=chandanrlinux@gmail.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.