From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Brian Foster <bfoster@redhat.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/6] xfs: actually set aside enough space to handle a bmbt split
Date: Thu, 24 Mar 2022 17:00:26 +1100 [thread overview]
Message-ID: <20220324060026.GC1544202@dread.disaster.area> (raw)
In-Reply-To: <20220324052614.GT8241@magnolia>
On Wed, Mar 23, 2022 at 10:26:14PM -0700, Darrick J. Wong wrote:
> On Thu, Mar 24, 2022 at 07:48:56AM +1100, Dave Chinner wrote:
> > On Sun, Mar 20, 2022 at 09:43:38AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > >
> > > The comment for xfs_alloc_set_aside indicates that we want to set aside
> > > enough space to handle a bmap btree split. The code, unfortunately,
> > > hardcodes this to 4.
> > >
> > > This is incorrect, since file bmap btrees can be taller than that:
> > >
> > > xfs_db> btheight bmapbt -n 4294967295 -b 512
> > > bmapbt: worst case per 512-byte block: 13 records (leaf) / 13 keyptrs (node)
> > > level 0: 4294967295 records, 330382100 blocks
> > > level 1: 330382100 records, 25414008 blocks
> > > level 2: 25414008 records, 1954924 blocks
> > > level 3: 1954924 records, 150379 blocks
> > > level 4: 150379 records, 11568 blocks
> > > level 5: 11568 records, 890 blocks
> > > level 6: 890 records, 69 blocks
> > > level 7: 69 records, 6 blocks
> > > level 8: 6 records, 1 block
> > > 9 levels, 357913945 blocks total
> > >
> > > Or, for V5 filesystems:
> > >
> > > xfs_db> btheight bmapbt -n 4294967295 -b 1024
> > > bmapbt: worst case per 1024-byte block: 29 records (leaf) / 29 keyptrs (node)
> > > level 0: 4294967295 records, 148102321 blocks
> > > level 1: 148102321 records, 5106977 blocks
> > > level 2: 5106977 records, 176103 blocks
> > > level 3: 176103 records, 6073 blocks
> > > level 4: 6073 records, 210 blocks
> > > level 5: 210 records, 8 blocks
> > > level 6: 8 records, 1 block
> > > 7 levels, 153391693 blocks total
> > >
> > > Fix this by using the actual bmap btree maxlevel value for the
> > > set-aside. We subtract one because the root is always in the inode and
> > > hence never splits.
> > >
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > fs/xfs/libxfs/xfs_alloc.c | 7 +++++--
> > > fs/xfs/libxfs/xfs_sb.c | 2 --
> > > fs/xfs/xfs_mount.c | 7 +++++++
> > > 3 files changed, 12 insertions(+), 4 deletions(-)
> > >
> > >
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index b0678e96ce61..747b3e45303f 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > @@ -107,13 +107,16 @@ xfs_prealloc_blocks(
> > > * aside a few blocks which will not be reserved in delayed allocation.
> > > *
> > > * For each AG, we need to reserve enough blocks to replenish a totally empty
> > > - * AGFL and 4 more to handle a potential split of the file's bmap btree.
> > > + * AGFL and enough to handle a potential split of a file's bmap btree.
> > > */
> > > unsigned int
> > > xfs_alloc_set_aside(
> > > struct xfs_mount *mp)
> > > {
> > > - return mp->m_sb.sb_agcount * (XFS_ALLOCBT_AGFL_RESERVE + 4);
> > > + unsigned int bmbt_splits;
> > > +
> > > + bmbt_splits = max(mp->m_bm_maxlevels[0], mp->m_bm_maxlevels[1]) - 1;
> > > + return mp->m_sb.sb_agcount * (XFS_ALLOCBT_AGFL_RESERVE + bmbt_splits);
> > > }
> >
> > So right now I'm trying to understand why this global space set
> > aside ever needed to take into account the space used by a single
> > BMBT split. ISTR it was done back in 2006 because the ag selection
> > code, alloc args and/or xfs_alloc_space_available() didn't take into
> > account the BMBT space via args->minleft correctly to ensure that
> > the AGF we select had enough space in it for both the data extent
> > and the followup BMBT split. Hence the original SET ASIDE (which
> > wasn't per AG) was just 8 blocks - 4 for the AGFL, 4 for the BMBT.
> >
> > The transaction reservation takes into account the space needed by
> > BMBT splits so we don't over-commit global space on user allocation
> > anymore, args->minleft takes it into account so we don't overcommit
> > AG space on extent allocation, and we have the reserved block pool
> > to handle situations were delalloc extents are fragmented more than
> > the initial indirect block reservation that is taken with the
> > delalloc extent reservation.
> >
> > So where/why is this needed anymore?
>
> Beats me. I started by reading the comment and thought "Gee, that's not
> true of bmbts!" :(
Yeah. Time and knowledge are showing through here. I've been running
ENOSPC tests with the BMBT reservation part removed now for several
hours and I've haven't seen any issues as a result removing it. I'll
keep it running though to see if anything falls out, but my initial
impression is that we don't obviously need it anymore.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2022-03-24 6:00 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-20 16:43 [PATCHSET v3 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
2022-03-20 16:43 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
2022-03-23 20:39 ` Dave Chinner
2022-03-24 5:15 ` Darrick J. Wong
2022-03-24 5:58 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 2/6] xfs: actually set aside enough space to handle a bmbt split Darrick J. Wong
2022-03-23 20:48 ` Dave Chinner
2022-03-24 5:26 ` Darrick J. Wong
2022-03-24 6:00 ` Dave Chinner [this message]
2022-03-20 16:43 ` [PATCH 3/6] xfs: don't include bnobt blocks when reserving free block pool Darrick J. Wong
2022-03-21 15:22 ` Brian Foster
2022-03-21 20:42 ` Darrick J. Wong
2022-03-23 20:51 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 4/6] xfs: fix infinite loop " Darrick J. Wong
2022-03-23 21:11 ` Dave Chinner
2022-03-24 5:24 ` Darrick J. Wong
2022-03-24 6:21 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 5/6] xfs: don't report reserved bnobt space as available Darrick J. Wong
2022-03-21 15:22 ` Brian Foster
2022-03-21 20:48 ` Darrick J. Wong
2022-03-23 21:12 ` Dave Chinner
2022-03-20 16:44 ` [PATCH 6/6] xfs: rename "alloc_set_aside" to be more descriptive Darrick J. Wong
2022-03-23 21:21 ` Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2022-03-17 21:20 [PATCHSET v2 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
2022-03-17 21:21 ` [PATCH 2/6] xfs: actually set aside enough space to handle a bmbt split Darrick J. Wong
2022-03-18 12:17 ` Brian Foster
2022-03-18 20:52 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220324060026.GC1544202@dread.disaster.area \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=djwong@kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox