From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Brian Foster <bfoster@redhat.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant
Date: Thu, 24 Mar 2022 07:39:16 +1100 [thread overview]
Message-ID: <20220323203916.GU1544202@dread.disaster.area> (raw)
In-Reply-To: <164779461278.550479.17511700626088235894.stgit@magnolia>
On Sun, Mar 20, 2022 at 09:43:32AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Currently, we use this undocumented macro to encode the minimum number
> of blocks needed to replenish a completely empty AGFL when an AG is
> nearly full. This has lead to confusion on the part of the maintainers,
> so let's document what the value actually means, and move it to
> xfs_alloc.c since it's not used outside of that module.
Code change looks fine, but....
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> ---
> fs/xfs/libxfs/xfs_alloc.c | 23 ++++++++++++++++++-----
> fs/xfs/libxfs/xfs_alloc.h | 1 -
> 2 files changed, 18 insertions(+), 6 deletions(-)
>
>
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 353e53b892e6..b0678e96ce61 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -82,6 +82,19 @@ xfs_prealloc_blocks(
> }
>
> /*
> + * The number of blocks per AG that we withhold from xfs_mod_fdblocks to
> + * guarantee that we can refill the AGFL prior to allocating space in a nearly
> + * full AG. We require two blocks per free space btree because free space
> + * btrees shrink to a single block as the AG fills up, and any allocation can
> + * cause a btree split. The rmap btree uses a per-AG reservation to withhold
> + * space from xfs_mod_fdblocks, so we do not account for that here.
> + */
> +#define XFS_ALLOCBT_AGFL_RESERVE 4
.... that comment is not correct. this number had nothing to do
with btree split reservations and is all about preventing
oversubscription of the AG when the free space trees are completely
empty. By the time there is enough free space records in the AG for
the bno/cnt trees to be at risk of a single level root split
(several hundred free extent records), there is enough free space to
fully allocate the 4 blocks that the AGFL needs for that split.
That is, the set aside was designed to prevent AG selection in
xfs_alloc_vextent() having xfs_alloc_fixup_freelist() modifying an
AG to fix up the AGFL and then fail to fully fill the AGFL because,
say, there were only 2 blocks free in the AG and the AGFL needed 3.
Then we try all other AGs and get ENOSPC from all of them, and we
end up cancelling a dirty transaction and shutting down instead of
propagating ENOSPC up to the user.
This "not enough blocks to populate the AGFL" problem arose because
we can allocate extents directly from the AGFL if the free space
btree is empty, resulting in depletion of the AGFL and no free space
to re-populate it. Freeing a block will then go back into the btree,
and so the next allocation attempt might need 2 blocks for the AGFL,
have one block in the free space tree, and then we fail to fill
the AGFL completely because we still need one block for the AGFL and
there's no space available anymore. If all other AGs are like this
or ENOSPC, then kaboom.
IOWs, I originally added this per-ag set aside so that when the AG
was almost completely empty and we were down to allocating the last
blocks from the AG, users couldn't oversubscribe global free space by
consuming the blocks the AGs required to fill the AGFLs to allow the
last blocks that users could allocate to be allocated safely.
Hence the set aside of 4 blocks per AG was not to ensure the
freespace trees could be split, but to ensure every last possible
block could be allocated from the AG without causing the AG
selection algorithms to select and modify AGs that could not have
their AGFL fully fixed up to allocate the blocks that the caller
needed when near ENOSPC...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2022-03-23 20:39 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-20 16:43 [PATCHSET v3 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
2022-03-20 16:43 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
2022-03-23 20:39 ` Dave Chinner [this message]
2022-03-24 5:15 ` Darrick J. Wong
2022-03-24 5:58 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 2/6] xfs: actually set aside enough space to handle a bmbt split Darrick J. Wong
2022-03-23 20:48 ` Dave Chinner
2022-03-24 5:26 ` Darrick J. Wong
2022-03-24 6:00 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 3/6] xfs: don't include bnobt blocks when reserving free block pool Darrick J. Wong
2022-03-21 15:22 ` Brian Foster
2022-03-21 20:42 ` Darrick J. Wong
2022-03-23 20:51 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 4/6] xfs: fix infinite loop " Darrick J. Wong
2022-03-23 21:11 ` Dave Chinner
2022-03-24 5:24 ` Darrick J. Wong
2022-03-24 6:21 ` Dave Chinner
2022-03-20 16:43 ` [PATCH 5/6] xfs: don't report reserved bnobt space as available Darrick J. Wong
2022-03-21 15:22 ` Brian Foster
2022-03-21 20:48 ` Darrick J. Wong
2022-03-23 21:12 ` Dave Chinner
2022-03-20 16:44 ` [PATCH 6/6] xfs: rename "alloc_set_aside" to be more descriptive Darrick J. Wong
2022-03-23 21:21 ` Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2022-03-27 16:58 [PATCHSET v4 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
2022-03-27 16:58 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
2022-03-28 1:18 ` Dave Chinner
2022-04-01 5:51 ` Christoph Hellwig
2022-03-17 21:20 [PATCHSET v2 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
2022-03-17 21:21 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
2022-03-18 12:17 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220323203916.GU1544202@dread.disaster.area \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=djwong@kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox