From: Chandan Babu R <chandanrlinux@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing
Date: Mon, 31 May 2021 15:32:05 +0530 [thread overview]
Message-ID: <87fsy3uspu.fsf@garuda> (raw)
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>
On 27 May 2021 at 10:21, Dave Chinner wrote:
> Hi folks,
>
> I pulled on a loose thread when I started looking into the 64kB
> directory block size assert failure I was seeing while trying to
> test the bulk page allocation changes.
>
> I posted the first patch in the series separately - it fixed the
> immediate assert failure (5.13-rc1 regression) I was seeing, but in
> fixing that it only then dropped back to the previous assert failure
> that g/538 was triggering with 64kb directory block sizes. This can
> only be reproduced on 5.12, because that's when the error injection
> that g/538 uses was added. So I went looking deeper.
>
> It turns out that xfs_bunmapi() has some code in it to avoid locking
> AGFs in the wrong order and this is what was triggering. Many of the
> xfs_bunmapi() callers can not/do not handle partial unmaps that
> return success, and that's what the directory code is tripping over
> trying to free badly fragmented directory blocks.
>
> This AGF locking order constraint was added to xfs_bunmapu in 2017
> to avoid a deadlock in g/299. Sad thing is that shortly after this,
> we converted xfs-bunmapi to use deferred freeing, so it never
> actually locks AGFs anymore. But the deadlock avoiding landmine
> remained. And xfs_bmap_finish() went away, too, and we now only ever
> put one extent in any EFI we log for deferred freeing.
I did come across a scenario (when executing xfs/538 with 1k fs block size and
64k directory block size) where an EFI item contained three extents:
- Two of those extents belonged to the file whose extents were being freed.
- One more extent was added by xfs_bmap_btree_to_extents().
The corresponding call trace was,
CPU: 3 PID: 1367 Comm: fsstress Not tainted 5.12.0-rc8-next-20210419-chandan #125
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
dump_stack+0x64/0x7c
xfs_defer_add.cold+0x1d/0x22
xfs_bmap_btree_to_extents+0x1f6/0x470
__xfs_bunmapi+0x50a/0xe60
? xfs_trans_alloc_inode+0xbb/0x180
xfs_bunmapi+0x15/0x30
xfs_free_file_space+0x241/0x2c0
xfs_file_fallocate+0x1ca/0x430
? __cond_resched+0x16/0x40
? inode_security+0x22/0x60
? selinux_file_permission+0xe2/0x120
vfs_fallocate+0x146/0x2e0
ioctl_preallocate+0x8f/0xc0
__x64_sys_ioctl+0x62/0xb0
do_syscall_64+0x40/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> That means we now only free one extent per transaction via deferred
> freeing,
With three instances of xfs_extent_free_items associated with one instance of
xfs_defer_pending, xfs_defer_finish_noroll() would,
1. Create an EFI item containing information about the three extents to be
freed.
- The extents in xfs_defer_pending->dfp_work list are sorted based on AG
number.
2. Roll the transaction.
3. The new transaction would,
- Create an EFD item to hold information about the three extents to be
freed.
- Free the three extents in a single transaction.
> and there are no limitations on what order xfs_bunmapi()
> can unmap extents.
I think the sorting of extent items mentioned above is the reason that AG
locks are obtained in increasing AGNO order while freeing extents.
> 64kB directories on a 1kB block size filesystem
> already unmap 64 extents in a single loop, so there's no real
> limitation here.
I think, in the worst case, we can free atmost XFS_EFI_MAX_FAST_EXTENTS
(i.e. 16) extents in a single transaction assuming that they were all added
in a sequence without any non-XFS_DEFER_OPS_TYPE_FREE deferred objects
added in between.
>
> This means that the limitations of how many extents we can unmap per
> loop in xfs_itruncate_extents_flags() goes away for data device
> extents (and will eventually go away for RT devices, too, when
> Darrick's RT EFI stuff gets merged).
>
> This "one data deveice extent free per transaction" change now means
> that all of the transaction reservations that include
> "xfs_bmap_finish" based freeing reservations are wrong. These extent
> frees are now done by deferred freeing, and so they only need a
> single extent free reservation instead of up to 4 (as truncate was
> reserving).
>
> This series fixes the btree fork regression, the bunmapi partial
> unmap regression from 2017, extends xfs_itruncate_extents to unmap
> 64 extents at a time for data device (AG) resident extents, and
> reworks the transaction reservations to use a consistent and correct
> reservation for allocation and freeing extents. The size of some
> transaction reservations drops dramatically as a result.
>
> The first two patches are -rcX candidates, the rest are for the next
> merge cycle....
>
--
chandan
next prev parent reply other threads:[~2021-05-31 10:02 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-27 4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
2021-05-27 4:51 ` [PATCH 1/6] xfs: btree format inode forks can have zero extents Dave Chinner
2021-05-27 6:15 ` Darrick J. Wong
2021-05-27 4:51 ` [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues Dave Chinner
2021-05-27 6:16 ` Darrick J. Wong
2021-05-27 4:51 ` [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation Dave Chinner
2021-05-31 12:55 ` Chandan Babu R
2021-05-31 13:05 ` Chandan Babu R
2021-05-31 23:28 ` Dave Chinner
2021-06-01 6:42 ` Chandan Babu R
2021-05-27 4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
2021-05-27 6:38 ` kernel test robot
2021-05-27 6:38 ` kernel test robot
2021-05-27 7:03 ` kernel test robot
2021-05-27 7:03 ` [RFC PATCH] xfs: xfs_allocfree_extent_res can be static kernel test robot
2021-06-02 21:37 ` [PATCH 4/6] xfs: add a free space extent change reservation Darrick J. Wong
2021-05-27 4:52 ` [PATCH 5/6] xfs: factor free space tree transaciton reservations Dave Chinner
2021-06-02 21:36 ` Darrick J. Wong
2021-05-27 4:52 ` [PATCH 6/6] xfs: reduce transaction reservation for freeing extents Dave Chinner
2021-05-27 6:19 ` Darrick J. Wong
2021-05-27 8:52 ` Dave Chinner
2021-05-28 0:01 ` Darrick J. Wong
2021-05-28 2:30 ` Dave Chinner
2021-05-28 5:30 ` Darrick J. Wong
2021-05-31 10:02 ` Chandan Babu R [this message]
2021-05-31 22:41 ` [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87fsy3uspu.fsf@garuda \
--to=chandanrlinux@gmail.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox