From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Brian Foster <bfoster@redhat.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant
Date: Wed, 23 Mar 2022 22:15:42 -0700 [thread overview]
Message-ID: <20220324051542.GR8241@magnolia> (raw)
In-Reply-To: <20220323203916.GU1544202@dread.disaster.area>
On Thu, Mar 24, 2022 at 07:39:16AM +1100, Dave Chinner wrote:
> On Sun, Mar 20, 2022 at 09:43:32AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Currently, we use this undocumented macro to encode the minimum number
> > of blocks needed to replenish a completely empty AGFL when an AG is
> > nearly full. This has lead to confusion on the part of the maintainers,
> > so let's document what the value actually means, and move it to
> > xfs_alloc.c since it's not used outside of that module.
>
> Code change looks fine, but....
>
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > ---
> > fs/xfs/libxfs/xfs_alloc.c | 23 ++++++++++++++++++-----
> > fs/xfs/libxfs/xfs_alloc.h | 1 -
> > 2 files changed, 18 insertions(+), 6 deletions(-)
> >
> >
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 353e53b892e6..b0678e96ce61 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -82,6 +82,19 @@ xfs_prealloc_blocks(
> > }
> >
> > /*
> > + * The number of blocks per AG that we withhold from xfs_mod_fdblocks to
> > + * guarantee that we can refill the AGFL prior to allocating space in a nearly
> > + * full AG. We require two blocks per free space btree because free space
> > + * btrees shrink to a single block as the AG fills up, and any allocation can
> > + * cause a btree split. The rmap btree uses a per-AG reservation to withhold
> > + * space from xfs_mod_fdblocks, so we do not account for that here.
> > + */
> > +#define XFS_ALLOCBT_AGFL_RESERVE 4
>
> .... that comment is not correct. this number had nothing to do
> with btree split reservations and is all about preventing
> oversubscription of the AG when the free space trees are completely
> empty. By the time there is enough free space records in the AG for
> the bno/cnt trees to be at risk of a single level root split
> (several hundred free extent records), there is enough free space to
> fully allocate the 4 blocks that the AGFL needs for that split.
>
> That is, the set aside was designed to prevent AG selection in
> xfs_alloc_vextent() having xfs_alloc_fixup_freelist() modifying an
> AG to fix up the AGFL and then fail to fully fill the AGFL because,
> say, there were only 2 blocks free in the AG and the AGFL needed 3.
> Then we try all other AGs and get ENOSPC from all of them, and we
> end up cancelling a dirty transaction and shutting down instead of
> propagating ENOSPC up to the user.
>
> This "not enough blocks to populate the AGFL" problem arose because
> we can allocate extents directly from the AGFL if the free space
> btree is empty, resulting in depletion of the AGFL and no free space
> to re-populate it. Freeing a block will then go back into the btree,
> and so the next allocation attempt might need 2 blocks for the AGFL,
> have one block in the free space tree, and then we fail to fill
> the AGFL completely because we still need one block for the AGFL and
> there's no space available anymore. If all other AGs are like this
> or ENOSPC, then kaboom.
>
> IOWs, I originally added this per-ag set aside so that when the AG
> was almost completely empty and we were down to allocating the last
> blocks from the AG, users couldn't oversubscribe global free space by
> consuming the blocks the AGs required to fill the AGFLs to allow the
> last blocks that users could allocate to be allocated safely.
>
> Hence the set aside of 4 blocks per AG was not to ensure the
> freespace trees could be split, but to ensure every last possible
> block could be allocated from the AG without causing the AG
> selection algorithms to select and modify AGs that could not have
> their AGFL fully fixed up to allocate the blocks that the caller
> needed when near ENOSPC...
Hmmm, thanks for the context! What do you think about this revised
comment?
/*
* The number of blocks per AG that we withhold from xfs_mod_fdblocks to
* guarantee that we can refill the AGFL prior to allocating space in a
* nearly full AG. Although the the space described in the free space
* btrees, the freesp btrees, and the blocks owned by the agfl are all
* added to the ondisk fdblocks, it's a mistake to let the ondisk free
* space in the AG drop so low that the free space btrees cannot refill
* an empty AGFL up to the minimum level. Rather than grind through
* empty AGs until the fs goes down, we subtract this many AG blocks
* from the incore fdblocks to ENOSPC early.
*/
#define XFS_ALLOCBT_AGFL_RESERVE 4
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
next prev parent reply other threads:[~2022-03-24 5:15 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-20 16:43 [PATCHSET v3 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
2022-03-20 16:43 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
2022-03-23 20:39 ` Dave Chinner
2022-03-24 5:15 ` Darrick J. Wong [this message]
2022-03-24 5:58 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 2/6] xfs: actually set aside enough space to handle a bmbt split Darrick J. Wong
2022-03-23 20:48 ` Dave Chinner
2022-03-24 5:26 ` Darrick J. Wong
2022-03-24 6:00 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 3/6] xfs: don't include bnobt blocks when reserving free block pool Darrick J. Wong
2022-03-21 15:22 ` Brian Foster
2022-03-21 20:42 ` Darrick J. Wong
2022-03-23 20:51 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 4/6] xfs: fix infinite loop " Darrick J. Wong
2022-03-23 21:11 ` Dave Chinner
2022-03-24 5:24 ` Darrick J. Wong
2022-03-24 6:21 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 5/6] xfs: don't report reserved bnobt space as available Darrick J. Wong
2022-03-21 15:22 ` Brian Foster
2022-03-21 20:48 ` Darrick J. Wong
2022-03-23 21:12 ` Dave Chinner
2022-03-20 16:44 ` [PATCH 6/6] xfs: rename "alloc_set_aside" to be more descriptive Darrick J. Wong
2022-03-23 21:21 ` Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2022-03-27 16:58 [PATCHSET v4 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
2022-03-27 16:58 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
2022-03-28 1:18 ` Dave Chinner
2022-04-01 5:51 ` Christoph Hellwig
2022-03-17 21:20 [PATCHSET v2 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
2022-03-17 21:21 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
2022-03-18 12:17 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220324051542.GR8241@magnolia \
--to=djwong@kernel.org \
--cc=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox