From: Chandan Babu R <chandanrlinux@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation
Date: Tue, 01 Jun 2021 12:12:36 +0530 [thread overview]
Message-ID: <871r9mytk3.fsf@garuda> (raw)
In-Reply-To: <20210531232820.GT664593@dread.disaster.area>
On 01 Jun 2021 at 04:58, Dave Chinner wrote:
> On Mon, May 31, 2021 at 06:35:40PM +0530, Chandan Babu R wrote:
>> On 31 May 2021 at 18:25, Chandan Babu R wrote:
>> > On 27 May 2021 at 10:21, Dave Chinner wrote:
>> >> From: Dave Chinner <dchinner@redhat.com>
>> >>
>> >> Ever since we moved to freeing of extents by deferred operations,
>> >> we've already freed extents via individual transactions. Hence the
>> >> only limitation of how many extents we can mark for freeing in a
>> >> single xfs_bunmapi() call bound only by how many deferrals we want
>> >> to queue.
>> >>
>> >> That is xfs_bunmapi() doesn't actually do any AG based extent
>> >> freeing, so there's no actually transaction reservation used up by
>> >> calling bunmapi with a large count of extents to be freed. RT
>> >> extents have always been freed directly by bunmapi, but that doesn't
>> >> require modification of large number of blocks as there are no
>> >> btrees to split.
>> >>
>> >> Some callers of xfs_bunmapi assume that the extent count being freed
>> >> is bound by geometry (e.g. directories) and these can ask bunmapi to
>> >> free up to 64 extents in a single call. These functions just work as
>> >> tehy stand, so there's no reason for truncate to have a limit of
>> >> just two extents per bunmapi call anymore.
>> >>
>> >> Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
>> >> that can be deferred in a single loop to match what the directory
>> >> code already uses.
>> >>
>> >> For realtime inodes, where xfs_bunmapi() directly frees extents,
>> >> leave the limit at 2 extents per loop as this is all the space that
>> >> the transaction reservation will cover.
>> >>
>> >> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>> >> ---
>> >> fs/xfs/xfs_inode.c | 15 ++++++++++++---
>> >> 1 file changed, 12 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> >> index 0369eb22c1bb..db220eaa34b8 100644
>> >> --- a/fs/xfs/xfs_inode.c
>> >> +++ b/fs/xfs/xfs_inode.c
>> >> @@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
>> >>
>> >> /*
>> >> * Used in xfs_itruncate_extents(). This is the maximum number of extents
>> >> - * freed from a file in a single transaction.
>> >> + * we will unmap and defer for freeing in a single call to xfs_bunmapi().
>> >> + * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
>> >> + * by transaction reservation size to 2 extents.
>> >> */
>> >> -#define XFS_ITRUNC_MAX_EXTENTS 2
>> >> +static inline int
>> >> +xfs_itrunc_max_extents(
>> >> + struct xfs_inode *ip)
>> >> +{
>> >> + if (XFS_IS_REALTIME_INODE(ip))
>> >> + return 2;
>> >> + return 64;
>> >> +}
>> >>
>> >> STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
>> >> STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
>> >> @@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
>> >> while (unmap_len > 0) {
>> >> ASSERT(tp->t_firstblock == NULLFSBLOCK);
>> >> error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
>> >> - flags, XFS_ITRUNC_MAX_EXTENTS);
>> >> + flags, xfs_itrunc_max_extents(ip));
>> >> if (error)
>> >> goto out;
>> >
>> > The list of free extent items at xfs_defer_pending->dfp_work could
>> > now contain XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) entries in the worst case.
>
> Yes, but we do exactly this when freeing a large fragmented directly
> block. That is, we ask xfs_bunmapi to unmap a 64kB range regardless
> of how many extents map that range.
>
> IOWs, the limitation in extent count placed in
> xfs_itruncate_extents() doesn't actually address the underlying
> problem - it's just a band-aid that has been placed over the easy to
> trigger transaction overrun symptom that has always been present in
> the underlying extent freeing code.
>
>> > For a single transaction, xfs_calc_itruncate_reservation() reserves space for
>> > logging only 4 extents (i.e. 4 exts * 2 trees * (2 * max depth - 1) * block
>> > size).
>>
>> ... Sorry, I meant to say "xfs_calc_itruncate_reservation() reserves log space
>> required for freeing 4 extents ..."
>
> My point exactly - the code has always had a mismatch between
> reservations and what we can stuff into an EFI. EFIs are unbound in
> size, so having a fixed "4 extents per EFI" reservation limit has
> never made any real sense given that unmaping a 64kB directory block
> on a 1kb block size filesystem has a worst case of freeing 64
> extents in a single transaction. As I said above, the limitations
> placed on xfs_itruncate_extents is largely a hack because it doesn't
> address other avenues to the same overruns...
>
Ok. Thanks for the explanation.
> This "4 extents per transaction" reservation makes even less sense
> now that extents are freed by defer ops, not by the transaction that
> unmaps them. We've completely decoupled extent freeing from the
> higher level code that unmaps them, and so now the freeing
> transactions are independent of the high level code that runs the
> freeing operations.
>
> IOWs, having a reservation big enough to free a single extent is all
> we should need now as we can break the extent freeing up into
> individual transactions and still have them complete atomically even
> after a crash. That's where I'm trying to get to here, but it's
> clear that I need to refine the code a bit further such that we only
> allow individual EFIs to queue up and free multiple extents within
> an AG to be freed in the same transaction....
Ok. That should limit the number of CNTBT/BNOBT blocks being logged by a
transaction which is freeing extents.
>
> FWIW, I suspect that the right thing to do here is make use of the
> xfs_defer_finish_one() mechanism for relogging the remaining intents
> in work list while it is processing that list. All we need is for
> xfs_extent_free_finish_item() to return -EAGAIN instead of running
> the free transaction, and we will log the completed EFDs and the
> remaining EFIs to be run and roll the transaction.
>
> Hence we can control exactly when we roll the extent freeing
> transaction (e.g. when the next extent to free is in a different AG
> to the one we've just been freeing extents in) and as a result we
> can define a fixed transaction reservation for extent freeing that
> works in every situation without having to care about arbitrary
> "how many extents can we unmap without overrun" concerns.
>
That sounds perfect.
> That's the goal here - fixed, small reservation for extent freeing
> that is decoupled from and independent of the number of extents that
> need to be freed atomically by the high level operation...
>
> Hence I think that a minor amount of rework will allow the EFI code
> to log large numbers of extents to be freed whilst still processing
> them within single AG free space tree modification reservation
> bounds...
--
chandan
next prev parent reply other threads:[~2021-06-01 6:42 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-27 4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
2021-05-27 4:51 ` [PATCH 1/6] xfs: btree format inode forks can have zero extents Dave Chinner
2021-05-27 6:15 ` Darrick J. Wong
2021-05-27 4:51 ` [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues Dave Chinner
2021-05-27 6:16 ` Darrick J. Wong
2021-05-27 4:51 ` [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation Dave Chinner
2021-05-31 12:55 ` Chandan Babu R
2021-05-31 13:05 ` Chandan Babu R
2021-05-31 23:28 ` Dave Chinner
2021-06-01 6:42 ` Chandan Babu R [this message]
2021-05-27 4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
2021-05-27 6:38 ` kernel test robot
2021-05-27 6:38 ` kernel test robot
2021-05-27 7:03 ` kernel test robot
2021-05-27 7:03 ` [RFC PATCH] xfs: xfs_allocfree_extent_res can be static kernel test robot
2021-06-02 21:37 ` [PATCH 4/6] xfs: add a free space extent change reservation Darrick J. Wong
2021-05-27 4:52 ` [PATCH 5/6] xfs: factor free space tree transaciton reservations Dave Chinner
2021-06-02 21:36 ` Darrick J. Wong
2021-05-27 4:52 ` [PATCH 6/6] xfs: reduce transaction reservation for freeing extents Dave Chinner
2021-05-27 6:19 ` Darrick J. Wong
2021-05-27 8:52 ` Dave Chinner
2021-05-28 0:01 ` Darrick J. Wong
2021-05-28 2:30 ` Dave Chinner
2021-05-28 5:30 ` Darrick J. Wong
2021-05-31 10:02 ` [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Chandan Babu R
2021-05-31 22:41 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871r9mytk3.fsf@garuda \
--to=chandanrlinux@gmail.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox