From: Chandan Babu R <chandanrlinux@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing
Date: Mon, 31 May 2021 15:32:05 +0530
Message-ID: <87fsy3uspu.fsf@garuda>
In-Reply-To: <20210527045202.1155628-1-david@fromorbit.com>

On 27 May 2021 at 10:21, Dave Chinner wrote:
> Hi folks,
>
> I pulled on a loose thread when I started looking into the 64kB
> directory block size assert failure I was seeing while trying to
> test the bulk page allocation changes.
>
> I posted the first patch in the series separately - it fixed the
> immediate assert failure (5.13-rc1 regression) I was seeing, but in
> fixing that it only then dropped back to the previous assert failure
> that g/538 was triggering with 64kb directory block sizes. This can
> only be reproduced on 5.12, because that's when the error injection
> that g/538 uses was added. So I went looking deeper.
>
> It turns out that xfs_bunmapi() has some code in it to avoid locking
> AGFs in the wrong order and this is what was triggering. Many of the
> xfs_bunmapi() callers can not/do not handle partial unmaps that
> return success, and that's what the directory code is tripping over
> trying to free badly fragmented directory blocks.
>
> This AGF locking order constraint was added to xfs_bunmapi() in 2017
> to avoid a deadlock in g/299. Sad thing is that shortly after this,
> we converted xfs_bunmapi() to use deferred freeing, so it never
> actually locks AGFs anymore. But the deadlock avoiding landmine
> remained. And xfs_bmap_finish() went away, too, and we now only ever
> put one extent in any EFI we log for deferred freeing.

I did come across a scenario (when executing xfs/538 with 1k fs block size and
64k directory block size) where an EFI item contained three extents:

- Two of those extents belonged to the file whose extents were being freed.
- One more extent was added by xfs_bmap_btree_to_extents().
  The corresponding call trace was,
    CPU: 3 PID: 1367 Comm: fsstress Not tainted 5.12.0-rc8-next-20210419-chandan #125
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    Call Trace:
     dump_stack+0x64/0x7c
     xfs_defer_add.cold+0x1d/0x22
     xfs_bmap_btree_to_extents+0x1f6/0x470
     __xfs_bunmapi+0x50a/0xe60
     ? xfs_trans_alloc_inode+0xbb/0x180
     xfs_bunmapi+0x15/0x30
     xfs_free_file_space+0x241/0x2c0
     xfs_file_fallocate+0x1ca/0x430
     ? __cond_resched+0x16/0x40
     ? inode_security+0x22/0x60
     ? selinux_file_permission+0xe2/0x120
     vfs_fallocate+0x146/0x2e0
     ioctl_preallocate+0x8f/0xc0
     __x64_sys_ioctl+0x62/0xb0
     do_syscall_64+0x40/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae

>
> That means we now only free one extent per transaction via deferred
> freeing,

With three instances of xfs_extent_free_item associated with one instance of
xfs_defer_pending, xfs_defer_finish_noroll() would:
1. Create an EFI item containing information about the three extents to be
   freed.
   - The extents in xfs_defer_pending->dfp_work list are sorted based on AG
     number.
2. Roll the transaction.
3. The new transaction would,
   - Create an EFD item to hold information about the three extents to be
     freed.
   - Free the three extents in a single transaction.

> and there are no limitations on what order xfs_bunmapi()
> can unmap extents.

I think the sorting of extent items mentioned above is the reason that AG
locks are obtained in increasing AGNO order while freeing extents.
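
To illustrate the ordering argument, here is a minimal user-space sketch (not
kernel code). The struct and function names are made up for illustration and
only model the idea: sort the queued extents by AG number, then walk the list
so that each AG is "locked" in increasing order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one queued extent free; not the kernel struct. */
struct extent_item {
	uint32_t agno;	/* allocation group that holds the extent */
	uint32_t agbno;	/* start block within that AG */
	uint32_t len;	/* length in blocks */
};

/* Compare two items by AG number only. */
static int cmp_by_agno(const void *a, const void *b)
{
	const struct extent_item *ea = a;
	const struct extent_item *eb = b;

	return (ea->agno > eb->agno) - (ea->agno < eb->agno);
}

/*
 * Sort the work list by AG number, then walk it and record each AG as it
 * would be locked. Returns the number of distinct AGs visited; locked[]
 * must have room for n entries.
 */
static size_t free_extents_in_ag_order(struct extent_item *items, size_t n,
				       uint32_t *locked)
{
	size_t nlocked = 0;

	qsort(items, n, sizeof(*items), cmp_by_agno);
	for (size_t i = 0; i < n; i++) {
		/* A new lock is only taken when we move on to another AG. */
		if (nlocked == 0 || locked[nlocked - 1] != items[i].agno)
			locked[nlocked++] = items[i].agno;
	}
	return nlocked;
}
```

Because the list is sorted before it is walked, the recorded AG numbers can
only increase, regardless of the order in which the extents were queued.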

> 64kB directories on a 1kB block size filesystem
> already unmap 64 extents in a single loop, so there's no real
> limitation here.

I think, in the worst case, we can free at most XFS_EFI_MAX_FAST_EXTENTS
(i.e. 16) extents in a single transaction assuming that they were all added
in a sequence without any non-XFS_DEFER_OPS_TYPE_FREE deferred objects
added in between.
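
The grouping rule I have in mind can be modelled roughly as follows. This is a
user-space sketch under my own assumptions, not the kernel implementation: the
types, defer_add() and MAX_PENDING are hypothetical, the limit of 16 mirrors
XFS_EFI_MAX_FAST_EXTENTS, and I assume other deferred op types have no
per-item limit.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FAST_EXTENTS 16	/* mirrors XFS_EFI_MAX_FAST_EXTENTS */
#define MAX_PENDING 64		/* arbitrary queue size for this sketch */

enum op_type { OP_FREE, OP_OTHER };

/* One pending deferred op: a type plus the number of work items it holds. */
struct pending {
	enum op_type type;
	unsigned int count;
};

struct defer_queue {
	struct pending p[MAX_PENDING];
	size_t npending;
};

/*
 * Queue one work item. It joins the last pending op only if that op has the
 * same type and, for extent frees, still has room. Otherwise a new pending
 * op is started. Returns the index of the pending op used, or -1 if the
 * queue is full.
 */
static int defer_add(struct defer_queue *q, enum op_type type)
{
	if (q->npending > 0) {
		struct pending *last = &q->p[q->npending - 1];
		int has_room = type != OP_FREE ||
			       last->count < MAX_FAST_EXTENTS;

		if (last->type == type && has_room) {
			last->count++;
			return (int)(q->npending - 1);
		}
	}
	if (q->npending == MAX_PENDING)
		return -1;
	q->p[q->npending].type = type;
	q->p[q->npending].count = 1;
	return (int)q->npending++;
}
```

In this model, 16 back-to-back extent frees end up in a single pending op, the
17th starts a new one, and any other op type queued in between also splits the
run of frees.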

>
> This means that the limitations of how many extents we can unmap per
> loop in xfs_itruncate_extents_flags() goes away for data device
> extents (and will eventually go away for RT devices, too, when
> Darrick's RT EFI stuff gets merged).
>
> This "one data device extent free per transaction" change now means
> that all of the transaction reservations that include
> "xfs_bmap_finish" based freeing reservations are wrong. These extent
> frees are now done by deferred freeing, and so they only need a
> single extent free reservation instead of up to 4 (as truncate was
> reserving).
>
> This series fixes the btree fork regression, the bunmapi partial
> unmap regression from 2017, extends xfs_itruncate_extents to unmap
> 64 extents at a time for data device (AG) resident extents, and
> reworks the transaction reservations to use a consistent and correct
> reservation for allocation and freeing extents. The size of some
> transaction reservations drops dramatically as a result.
>
> The first two patches are -rcX candidates, the rest are for the next
> merge cycle....
>

--
chandan


Thread overview: 30+ messages
2021-05-27  4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
2021-05-27  4:51 ` [PATCH 1/6] xfs: btree format inode forks can have zero extents Dave Chinner
2021-05-27  6:15   ` Darrick J. Wong
2021-05-27  4:51 ` [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues Dave Chinner
2021-05-27  6:16   ` Darrick J. Wong
2021-05-27  4:51 ` [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation Dave Chinner
2021-05-31 12:55   ` Chandan Babu R
2021-05-31 13:05     ` Chandan Babu R
2021-05-31 23:28       ` Dave Chinner
2021-06-01  6:42         ` Chandan Babu R
2021-05-27  4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
2021-05-27  6:38   ` kernel test robot
2021-05-27  6:38     ` kernel test robot
2021-05-27  6:38   ` kernel test robot
2021-05-27  6:38     ` kernel test robot
2021-05-27  7:03   ` kernel test robot
2021-05-27  7:03     ` kernel test robot
2021-05-27  7:03   ` [RFC PATCH] xfs: xfs_allocfree_extent_res can be static kernel test robot
2021-05-27  7:03     ` kernel test robot
2021-06-02 21:37   ` [PATCH 4/6] xfs: add a free space extent change reservation Darrick J. Wong
2021-05-27  4:52 ` [PATCH 5/6] xfs: factor free space tree transaciton reservations Dave Chinner
2021-06-02 21:36   ` Darrick J. Wong
2021-05-27  4:52 ` [PATCH 6/6] xfs: reduce transaction reservation for freeing extents Dave Chinner
2021-05-27  6:19   ` Darrick J. Wong
2021-05-27  8:52     ` Dave Chinner
2021-05-28  0:01       ` Darrick J. Wong
2021-05-28  2:30         ` Dave Chinner
2021-05-28  5:30           ` Darrick J. Wong
2021-05-31 10:02 ` Chandan Babu R [this message]
2021-05-31 22:41   ` [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
