From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Brian Foster <bfoster@redhat.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/6] xfs: actually set aside enough space to handle a bmbt split
Date: Thu, 24 Mar 2022 07:48:56 +1100 [thread overview]
Message-ID: <20220323204856.GV1544202@dread.disaster.area> (raw)
In-Reply-To: <164779461835.550479.15316047141170352189.stgit@magnolia>
On Sun, Mar 20, 2022 at 09:43:38AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> The comment for xfs_alloc_set_aside indicates that we want to set aside
> enough space to handle a bmap btree split. The code, unfortunately,
> hardcodes this to 4.
>
> This is incorrect, since file bmap btrees can be taller than that:
>
> xfs_db> btheight bmapbt -n 4294967295 -b 512
> bmapbt: worst case per 512-byte block: 13 records (leaf) / 13 keyptrs (node)
> level 0: 4294967295 records, 330382100 blocks
> level 1: 330382100 records, 25414008 blocks
> level 2: 25414008 records, 1954924 blocks
> level 3: 1954924 records, 150379 blocks
> level 4: 150379 records, 11568 blocks
> level 5: 11568 records, 890 blocks
> level 6: 890 records, 69 blocks
> level 7: 69 records, 6 blocks
> level 8: 6 records, 1 block
> 9 levels, 357913945 blocks total
>
> Or, for V5 filesystems:
>
> xfs_db> btheight bmapbt -n 4294967295 -b 1024
> bmapbt: worst case per 1024-byte block: 29 records (leaf) / 29 keyptrs (node)
> level 0: 4294967295 records, 148102321 blocks
> level 1: 148102321 records, 5106977 blocks
> level 2: 5106977 records, 176103 blocks
> level 3: 176103 records, 6073 blocks
> level 4: 6073 records, 210 blocks
> level 5: 210 records, 8 blocks
> level 6: 8 records, 1 block
> 7 levels, 153391693 blocks total
>
> Fix this by using the actual bmap btree maxlevel value for the
> set-aside. We subtract one because the root is always in the inode and
> hence never splits.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> ---
> fs/xfs/libxfs/xfs_alloc.c | 7 +++++--
> fs/xfs/libxfs/xfs_sb.c | 2 --
> fs/xfs/xfs_mount.c | 7 +++++++
> 3 files changed, 12 insertions(+), 4 deletions(-)
>
>
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index b0678e96ce61..747b3e45303f 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -107,13 +107,16 @@ xfs_prealloc_blocks(
> * aside a few blocks which will not be reserved in delayed allocation.
> *
> * For each AG, we need to reserve enough blocks to replenish a totally empty
> - * AGFL and 4 more to handle a potential split of the file's bmap btree.
> + * AGFL and enough to handle a potential split of a file's bmap btree.
> */
> unsigned int
> xfs_alloc_set_aside(
> struct xfs_mount *mp)
> {
> - return mp->m_sb.sb_agcount * (XFS_ALLOCBT_AGFL_RESERVE + 4);
> + unsigned int bmbt_splits;
> +
> + bmbt_splits = max(mp->m_bm_maxlevels[0], mp->m_bm_maxlevels[1]) - 1;
> + return mp->m_sb.sb_agcount * (XFS_ALLOCBT_AGFL_RESERVE + bmbt_splits);
> }
So right now I'm trying to understand why this global space set
aside ever needed to take into account the space used by a single
BMBT split. ISTR it was done back in 2006 because the ag selection
code, alloc args and/or xfs_alloc_space_available() didn't take into
account the BMBT space via args->minleft correctly to ensure that
the AGF we select had enough space in it for both the data extent
and the followup BMBT split. Hence the original SET ASIDE (which
wasn't per AG) was just 8 blocks - 4 for the AGFL, 4 for the BMBT.
The transaction reservation takes into account the space needed by
BMBT splits so we don't over-commit global space on user allocation
anymore, args->minleft takes it into account so we don't overcommit
AG space on extent allocation, and we have the reserved block pool
to handle situations were delalloc extents are fragmented more than
the initial indirect block reservation that is taken with the
delalloc extent reservation.
So where/why is this needed anymore?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2022-03-23 20:49 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-20 16:43 [PATCHSET v3 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
2022-03-20 16:43 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
2022-03-23 20:39 ` Dave Chinner
2022-03-24 5:15 ` Darrick J. Wong
2022-03-24 5:58 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 2/6] xfs: actually set aside enough space to handle a bmbt split Darrick J. Wong
2022-03-23 20:48 ` Dave Chinner [this message]
2022-03-24 5:26 ` Darrick J. Wong
2022-03-24 6:00 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 3/6] xfs: don't include bnobt blocks when reserving free block pool Darrick J. Wong
2022-03-21 15:22 ` Brian Foster
2022-03-21 20:42 ` Darrick J. Wong
2022-03-23 20:51 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 4/6] xfs: fix infinite loop " Darrick J. Wong
2022-03-23 21:11 ` Dave Chinner
2022-03-24 5:24 ` Darrick J. Wong
2022-03-24 6:21 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 5/6] xfs: don't report reserved bnobt space as available Darrick J. Wong
2022-03-21 15:22 ` Brian Foster
2022-03-21 20:48 ` Darrick J. Wong
2022-03-23 21:12 ` Dave Chinner
2022-03-20 16:44 ` [PATCH 6/6] xfs: rename "alloc_set_aside" to be more descriptive Darrick J. Wong
2022-03-23 21:21 ` Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2022-03-17 21:20 [PATCHSET v2 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
2022-03-17 21:21 ` [PATCH 2/6] xfs: actually set aside enough space to handle a bmbt split Darrick J. Wong
2022-03-18 12:17 ` Brian Foster
2022-03-18 20:52 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220323204856.GV1544202@dread.disaster.area \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=djwong@kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox