From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>, xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] xfs: don't include bnobt blocks when reserving free block pool
Date: Thu, 17 Mar 2022 08:56:55 -0400 [thread overview]
Message-ID: <YjMwF8Sh231vjjde@bfoster> (raw)
In-Reply-To: <20220317020526.GV3927073@dread.disaster.area>
On Thu, Mar 17, 2022 at 01:05:26PM +1100, Dave Chinner wrote:
> On Wed, Mar 16, 2022 at 11:17:26AM -0700, Darrick J. Wong wrote:
> > On Wed, Mar 16, 2022 at 01:29:01PM -0400, Brian Foster wrote:
> > > On Wed, Mar 16, 2022 at 09:32:16AM -0700, Darrick J. Wong wrote:
> > > > On Wed, Mar 16, 2022 at 07:28:18AM -0400, Brian Foster wrote:
> > > > > On Mon, Mar 14, 2022 at 11:08:47AM -0700, Darrick J. Wong wrote:
> > > Similar deal as above.. I'm more interested in a potential cleanup of
> > > the code that helps prevent this sort of buglet for the next user of
> > > ->m_alloc_set_aside that will (expectedly) have no idea about this
> > > subtle quirk than I am about what's presented in the free space
> > > counters. ISTM that we ought to just ditch ->m_alloc_set_aside, replace
> > > the existing xfs_alloc_set_aside() with an XFS_ALLOC_FS_RESERVED() macro
> > > or something that just does the (agcount << 3) thing, and then define a
> >
> > I'm not sure that the current xfs_alloc_set_aside code is correct.
> > Right now it comes with this comment:
> >
> > "We need to reserve 4 fsbs _per AG_ for the freelist and 4 more to
> > handle a potential split of the file's bmap btree."
> >
> > I think the first part ("4 fsbs _per AG_ for the freelist") is wrong.
> > AFAICT, that part refers to the number of blocks we need to keep free in
> > case we have to replenish a completely empty AGFL. The hardcoded value
> > of 4 seems wrong, since xfs_alloc_min_freelist() is what _fix_freelist
> > uses to decide how big the AGFL needs to be, and it returns 6 on a
> > filesystem that has rmapbt enabled. So I think XFS_ALLOC_AGFL_RESERVE
> > is wrong here and should be replaced with the function call.
>
> Back when I wrote that code (circa 2007, IIRC), that was actually
> correct according to the reservations that were made when freeing
> an extent at ENOSPC.
>
> We needed 4 blocks for the AGFL fixup to always succeed - 2 blocks
> for each BNO and CNT btrees, and, IIRC, the extent free reservation
> was just 4 blocks at that time. Hence the 4+4 value.
>
> However, you are right that rmap also adds another per-ag btree that
> is allocated from the agfl and that set_aside() should be taking
> that into accout. That said, I think that xfs_alloc_min_freelist()
> might even be wrong by just adding 2 blocks to the AGFL for the
> rmapbt.
>
> That is, at ENOSPC the rmapbt can be a *big* btree. It's not like
> the BNO and CNT btrees which are completely empty at that point in
> time; the RMAP tree could be one level below max height, and freeing
> a single block could split a rmap rec and trigger a full height RMAP
> split.
>
> So the minimum free list length in that case is 2 + 2 + MAX_RMAP_HEIGHT.
>
> > I also think the second part ("and 4 more to handle a split of the
> > file's bmap btree") is wrong. If we're really supposed to save enough
> > blocks to handle a bmbt split, then I think this ought to be
> > (mp->m_bm_maxlevels[0] - 1), not 4, right? According to xfs_db, bmap
> > btrees can be 9 levels tall:
>
> Yes, we've changed the BMBT reservations in the years since that
> code was written to handle max height reservations correctly, too.
> So, like the RMAP btree reservation, we probably should be reserving
> MAX_BMAP_HEIGHT in the set-aside calculation.
>
> refcount btree space is handled by the ag_resv code and blocks
> aren't allocated from the AGFL, so I don't think we need to take
> taht into account for xfs_alloc_set_aside().
>
> > So in the end, I think that calculation should become:
> >
> > unsigned int
> > xfs_alloc_set_aside(
> > struct xfs_mount *mp)
> > {
> > unsigned int min-agfl = xfs_alloc_min_freelist(mp, NULL);
> >
> > return mp->m_sb.sb_agcount * (min_agfl + mp->m_bm_maxlevels[0] - 1);
> > }
>
> *nod*, but with the proviso that xfs_alloc_min_freelist() doesn't
> appear to be correct, either....
>
> Also, that's a fixed value for the physical geometry of the
> filesystem, so it should be calculated once at mount time and stored
> in the xfs_mount (and only updated if needed at growfs time)...
>
To my earlier point... please just don't call this fixed mount value
"set_aside" if that's not what it actually is. Rename the field and
helper to something self-descriptive based on whatever fixed components
it's made up of (you could even qualify it as a subcomponent of
set_aside with something like ".._agfl_bmap_set_aside" or whatever) then
reserve the _set_aside() name for the helper that calculates and
documents what the actual/final/dynamic "set aside" value is.
Brian
> > > new xfs_alloc_set_aside() that combines the macro calculation with
> > > ->m_allocbt_blks. Then the whole "set aside" concept is calculated and
> > > documented in one place. Hm?
> >
> > I think I'd rather call the new function xfs_fdblocks_avail() over
> > reusing an existing name, because I fear that zapping an old function
> > and replacing it with a new function with the same name will cause
> > confusion for anyone backporting patches or reading code after an
> > absence.
> >
> > Also the only reason we have a mount variable and a function (instead of
> > a macro) is that Dave asked me to change the codebase away from the
> > XFS_ALLOC_AG_MAX_USABLE/XFS_ALLOC_SET_ASIDE macros as part of merging
> > reflink.
>
> Yeah, macros wrapping a variable or repeated constant calculation
> are bad, and it's something we've been cleaning up for a long
> time...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
next prev parent reply other threads:[~2022-03-17 12:58 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-14 18:08 [PATCH] xfs: don't include bnobt blocks when reserving free block pool Darrick J. Wong
2022-03-16 11:28 ` Brian Foster
2022-03-16 16:32 ` Darrick J. Wong
2022-03-16 17:29 ` Brian Foster
2022-03-16 18:17 ` Darrick J. Wong
2022-03-16 18:48 ` Brian Foster
2022-03-16 19:17 ` Brian Foster
2022-03-16 21:15 ` Darrick J. Wong
2022-03-17 2:19 ` Dave Chinner
2022-03-17 12:53 ` Brian Foster
2022-03-17 2:05 ` Dave Chinner
2022-03-17 12:56 ` Brian Foster [this message]
2022-03-17 15:46 ` Darrick J. Wong
2022-03-17 17:05 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YjMwF8Sh231vjjde@bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox