public inbox for linux-xfs@vger.kernel.org
From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com
Subject: Re: [PATCH 3/6] xfs: don't include bnobt blocks when reserving free block pool
Date: Fri, 18 Mar 2022 14:01:00 -0700
Message-ID: <20220318210100.GE8224@magnolia>
In-Reply-To: <YjR4nWL9RXOq1mDi@bfoster>

On Fri, Mar 18, 2022 at 08:18:37AM -0400, Brian Foster wrote:
> On Thu, Mar 17, 2022 at 02:21:12PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > xfs_reserve_blocks controls the size of the user-visible free space
> > reserve pool.  Given the difference between the current and requested
> > pool sizes, it will try to reserve free space from fdblocks.  However,
> > the amount requested from fdblocks is also constrained by the amount of
> > space that we think xfs_mod_fdblocks will give us.  We'll keep trying to
> > reserve space so long as xfs_mod_fdblocks returns ENOSPC.
> > 
> > In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand
> > out the "free space" used by the free space btrees, because some portion
> > of the free space btree blocks is held in reserve for future btree
> > expansion.  Unfortunately, xfs_reserve_blocks' estimation of the number
> > of blocks that it could request from xfs_mod_fdblocks was not updated to
> > account for m_allocbt_blks, so if space is extremely low, the caller
> > hangs.
> > 
> > Fix this by creating a function to estimate the number of blocks that
> > can be reserved from fdblocks, which needs to exclude the set-aside and
> > m_allocbt_blks.
> > 
> > Found by running xfs/306 (which formats a single-AG 20MB filesystem)
> > with an fstests configuration that specifies a 1k blocksize and a
> > specially crafted log size that will consume 7/8 of the space (17920
> > blocks, specifically) in that AG.
> > 
> > Cc: Brian Foster <bfoster@redhat.com>
> > Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation")
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/xfs_fsops.c |    7 +++++--
> >  fs/xfs/xfs_mount.h |   29 +++++++++++++++++++++++++++++
> >  2 files changed, 34 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index 33e26690a8c4..b71799a3acd3 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> > @@ -433,8 +433,11 @@ xfs_reserve_blocks(
> >  	 */
> >  	error = -ENOSPC;
> >  	do {
> > -		free = percpu_counter_sum(&mp->m_fdblocks) -
> > -						mp->m_alloc_set_aside;
> > +		/*
> > +		 * The reservation pool cannot take space that xfs_mod_fdblocks
> > +		 * will not give us.
> > +		 */
> 
> This comment seems unnecessary. I'm not sure what this is telling that
> the code doesn't already..?

Yeah, I'll get rid of it.

> > +		free = xfs_fdblocks_available(mp);
> >  		if (free <= 0)
> >  			break;
> >  
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index 00720a02e761..998b54c3c454 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -479,6 +479,35 @@ extern void	xfs_unmountfs(xfs_mount_t *);
> >   */
> >  #define XFS_FDBLOCKS_BATCH	1024
> >  
> > +/*
> > + * Estimate the amount of space that xfs_mod_fdblocks might give us without
> > + * drawing from the reservation pool.  In other words, estimate the free space
> > + * that is available to userspace.
> > + *
> > + * This quantity is the amount of free space tracked in the on-disk metadata
> > + * minus:
> > + *
> > + * - Delayed allocation reservations
> > + * - Per-AG space reservations to guarantee metadata expansion
> > + * - Userspace-controlled free space reserve pool
> > + *
> > + * - Space reserved to ensure that we can always split a bmap btree
> > + * - Free space btree blocks that are not available for allocation due to
> > + *   per-AG metadata reservations
> > + *
> > + * The first three are captured in the incore fdblocks counter.
> > + */
> 
> Hm. Sometimes I wonder if we overdocument things to our own detriment
> (reading back my own comments at times suggests I'm terrible at this).
> So do we really need to document what other internal reservations are or
> are not taken out of ->m_fdblocks here..? I suspect we already have
> plenty of sufficient documentation for things like perag res colocated
> with the actual code, such that this kind of thing just creates an
> external reference that will probably just bitrot as years go by. Can we
> reduce this down to just explain how/why this helper has to calculate a
> block availability value for blocks that otherwise haven't been
> explicitly allocated out of the in-core free block counters?

Hmm.  I suppose I could reduce the comment at the same time that I split
out the code that computes the amount of free space that isn't
available.

> > +static inline int64_t
> > +xfs_fdblocks_available(
> > +	struct xfs_mount	*mp)
> > +{
> > +	int64_t			free = percpu_counter_sum(&mp->m_fdblocks);
> > +
> > +	free -= mp->m_alloc_set_aside;
> > +	free -= atomic64_read(&mp->m_allocbt_blks);
> > +	return free;
> > +}
> > +
> 
> FWIW the helper seems fine in context, but will this help us avoid the
> duplicate calculation in xfs_mod_fdblocks(), for instance?

It will once I turn that into:


/*
 * Estimate the amount of free space that is not available to userspace
 * and is not explicitly reserved from the incore fdblocks:
 *
 * - Space reserved to ensure that we can always split a bmap btree
 * - Free space btree blocks that are not available for allocation due
 *   to per-AG metadata reservations
 */
static inline uint64_t
xfs_fdblocks_unavailable(
	struct xfs_mount	*mp)
{
	return mp->m_alloc_set_aside + atomic64_read(&mp->m_allocbt_blks);
}

/*
 * Estimate the amount of space that xfs_mod_fdblocks might give us
 * without drawing from any reservation pool.  In other words, estimate
 * the free space that is available to userspace.
 */
static inline int64_t
xfs_fdblocks_available(
	struct xfs_mount	*mp)
{
	return percpu_counter_sum(&mp->m_fdblocks) -
			xfs_fdblocks_unavailable(mp);
}
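
For illustration only, here is a rough userspace model of the two
helpers above next to the pre-patch estimate.  This is a sketch, not
kernel code: struct model_mount and the model_* names are hypothetical
stand-ins, and plain integer fields replace percpu_counter_sum() and
atomic64_read().

```c
#include <stdint.h>

/* Hypothetical stand-in for the xfs_mount fields used by the helpers. */
struct model_mount {
	int64_t		fdblocks;		/* models percpu_counter_sum(&mp->m_fdblocks) */
	uint64_t	alloc_set_aside;	/* models mp->m_alloc_set_aside */
	int64_t		allocbt_blks;		/* models atomic64_read(&mp->m_allocbt_blks) */
};

/* Free space that is neither usable by userspace nor taken out of fdblocks. */
static inline uint64_t
model_fdblocks_unavailable(
	const struct model_mount	*mp)
{
	return mp->alloc_set_aside + (uint64_t)mp->allocbt_blks;
}

/* Estimate of what the allocator would hand out; may be zero or negative. */
static inline int64_t
model_fdblocks_available(
	const struct model_mount	*mp)
{
	return mp->fdblocks - (int64_t)model_fdblocks_unavailable(mp);
}

/* Pre-patch estimate: subtracts only the set-aside and ignores allocbt_blks. */
static inline int64_t
model_old_estimate(
	const struct model_mount	*mp)
{
	return mp->fdblocks - (int64_t)mp->alloc_set_aside;
}
```

With made-up values fdblocks = 12, alloc_set_aside = 8 and
allocbt_blks = 6, the old estimate is 4 but the new one is -2, so the
"free <= 0" check in xfs_reserve_blocks would break out of the loop
rather than keep asking for blocks that xfs_mod_fdblocks will refuse.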

--D

> 
> Brian
> 
> >  extern int	xfs_mod_fdblocks(struct xfs_mount *mp, int64_t delta,
> >  				 bool reserved);
> >  extern int	xfs_mod_frextents(struct xfs_mount *mp, int64_t delta);
> > 
> 
