public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Gao Xiang <hsiangkao@redhat.com>
Cc: linux-xfs@vger.kernel.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Brian Foster <bfoster@redhat.com>,
	Eric Sandeen <sandeen@sandeen.net>,
	Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH v5 4/5] xfs: support shrinking unused space in the last AG
Date: Wed, 20 Jan 2021 11:25:06 -0800	[thread overview]
Message-ID: <20210120192506.GL3134581@magnolia> (raw)
In-Reply-To: <20210118083700.2384277-5-hsiangkao@redhat.com>

On Mon, Jan 18, 2021 at 04:36:59PM +0800, Gao Xiang wrote:
> As the first step of shrinking, this attempts to enable shrinking
> unused space in the last allocation group by fixing up freespace
> btree, agi, agf and adjusting super block and introduce a helper
> xfs_ag_shrink_space() to fixup the last AG.
> 
> This can be all done in one transaction for now, so I think no
> additional protection is needed.
> 
> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ag.c | 88 ++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_ag.h |  2 +
>  fs/xfs/xfs_fsops.c     | 77 ++++++++++++++++++++++++++----------
>  fs/xfs/xfs_trans.c     |  1 -
>  4 files changed, 146 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index 9331f3516afa..04a7c9b20470 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -22,6 +22,8 @@
>  #include "xfs_ag.h"
>  #include "xfs_ag_resv.h"
>  #include "xfs_health.h"
> +#include "xfs_error.h"
> +#include "xfs_bmap.h"
>  
>  static int
>  xfs_get_aghdr_buf(
> @@ -485,6 +487,92 @@ xfs_ag_init_headers(
>  	return error;
>  }
>  
> +int
> +xfs_ag_shrink_space(
> +	struct xfs_mount	*mp,
> +	struct xfs_trans	*tp,
> +	struct aghdr_init_data	*id,
> +	xfs_extlen_t		len)
> +{
> +	struct xfs_alloc_arg	args = {
> +		.tp	= tp,
> +		.mp	= mp,
> +		.type	= XFS_ALLOCTYPE_THIS_BNO,
> +		.minlen = len,
> +		.maxlen = len,
> +		.oinfo	= XFS_RMAP_OINFO_SKIP_UPDATE,
> +		.resv	= XFS_AG_RESV_NONE,
> +		.prod	= 1
> +	};
> +	struct xfs_buf		*agibp, *agfbp;
> +	struct xfs_agi		*agi;
> +	struct xfs_agf		*agf;
> +	int			error, err2;
> +
> +	ASSERT(id->agno == mp->m_sb.sb_agcount - 1);
> +	error = xfs_ialloc_read_agi(mp, tp, id->agno, &agibp);
> +	if (error)
> +		return error;
> +
> +	agi = agibp->b_addr;
> +
> +	error = xfs_alloc_read_agf(mp, tp, id->agno, 0, &agfbp);
> +	if (error)
> +		return error;
> +
> +	agf = agfbp->b_addr;
> +	if (XFS_IS_CORRUPT(mp, agf->agf_length != agi->agi_length))
> +		return -EFSCORRUPTED;
> +
> +	args.fsbno = XFS_AGB_TO_FSB(mp, id->agno,
> +				    be32_to_cpu(agi->agi_length) - len);
> +
> +	/* remove the preallocations before allocation and re-establish then */
> +	error = xfs_ag_resv_free(agibp->b_pag);
> +	if (error)
> +		return error;
> +
> +	/* internal log shouldn't also show up in the free space btrees */
> +	error = xfs_alloc_vextent(&args);
> +	if (!error && args.agbno == NULLAGBLOCK)
> +		error = -ENOSPC;
> +
> +	if (error) {

Aha, now I see why this bit:

	if (!extend && ((tp->t_flags & XFS_TRANS_DIRTY) ||
			!list_empty(&tp->t_dfops)))
		xfs_trans_commit(tp);

is needed below -- we could have refilled the AGFL here but failed the
allocation.  At this point we have a dirty transaction /and/ an error
code.  We need to commit the AGFL refill changes and return to userspace
with that error code, but calling xfs_trans_cancel on the dirty
transaction causes an (unnecessary) shutdown.

What if you rolled the transaction here and passed the new tp and the
error code back to the caller?  The new transaction is clean so it will
cancel without any side effects, and then you can send the ENOSPC up to
userspace.

Granted, you could just as easily commit the transaction here and make
the caller smart enough to know that it no longer has a transaction.  I
wonder if the transaction allocation and disposal ought to be part of
the _ag_grow_space and _ag_shrink_space functions.

Also fwiw I would make sure the transaction is clean before I tried to
re-initialize the per-ag reservation.

> +		err2 = xfs_ag_resv_init(agibp->b_pag, tp);
> +		if (err2)
> +			goto resv_err;
> +		return error;
> +	}
> +
> +	/*
> +	 * if successfully deleted from freespace btrees, need to confirm
> +	 * per-AG reservation works as expected.
> +	 */
> +	be32_add_cpu(&agi->agi_length, -len);
> +	be32_add_cpu(&agf->agf_length, -len);
> +
> +	err2 = xfs_ag_resv_init(agibp->b_pag, tp);
> +	if (err2) {
> +		be32_add_cpu(&agi->agi_length, len);
> +		be32_add_cpu(&agf->agf_length, len);
> +		if (err2 != -ENOSPC)
> +			goto resv_err;

If we've just undone reducing ag[if]_length, don't we need to call
xfs_ag_resv_init here to (try to) recreate the former per-ag
reservations?

Also, the comment above about cleaning the transaction before trying to
reinit the per-ag reservation and returning ENOSPC applies here.

> +
> +		__xfs_bmap_add_free(tp, args.fsbno, len,
> +				    &XFS_RMAP_OINFO_SKIP_UPDATE, true);
> +		return err2;
> +	}
> +	xfs_ialloc_log_agi(tp, agibp, XFS_AGI_LENGTH);
> +	xfs_alloc_log_agf(tp, agfbp, XFS_AGF_LENGTH);
> +	return 0;
> +
> +resv_err:
> +	xfs_warn(mp,
> +"Error %d reserving per-AG metadata reserve pool.", err2);
> +		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> +	return err2;
> +}
> +
>  /*
>   * Extent the AG indicated by the @id by the length passed in
>   */
> diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> index 5166322807e7..f3b5bbfeadce 100644
> --- a/fs/xfs/libxfs/xfs_ag.h
> +++ b/fs/xfs/libxfs/xfs_ag.h
> @@ -24,6 +24,8 @@ struct aghdr_init_data {
>  };
>  
>  int xfs_ag_init_headers(struct xfs_mount *mp, struct aghdr_init_data *id);
> +int xfs_ag_shrink_space(struct xfs_mount *mp, struct xfs_trans *tp,
> +			struct aghdr_init_data *id, xfs_extlen_t len);
>  int xfs_ag_extend_space(struct xfs_mount *mp, struct xfs_trans *tp,
>  			struct aghdr_init_data *id, xfs_extlen_t len);
>  int xfs_ag_get_geometry(struct xfs_mount *mp, xfs_agnumber_t agno,
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index db6ed354c465..2ae4f33b42c9 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -38,7 +38,7 @@ xfs_resizefs_init_new_ags(
>  	struct aghdr_init_data	*id,
>  	xfs_agnumber_t		oagcount,
>  	xfs_agnumber_t		nagcount,
> -	xfs_rfsblock_t		*delta)
> +	int64_t			*delta)
>  {
>  	xfs_rfsblock_t		nb = mp->m_sb.sb_dblocks + *delta;
>  	int			error;
> @@ -76,33 +76,41 @@ xfs_growfs_data_private(
>  	xfs_agnumber_t		nagcount;
>  	xfs_agnumber_t		nagimax = 0;
>  	xfs_rfsblock_t		nb, nb_div, nb_mod;
> -	xfs_rfsblock_t		delta;
> +	int64_t			delta;
>  	xfs_agnumber_t		oagcount;
>  	struct xfs_trans	*tp;
> +	bool			extend;
>  	struct aghdr_init_data	id = {};
>  
>  	nb = in->newblocks;
> -	if (nb < mp->m_sb.sb_dblocks)
> -		return -EINVAL;
> -	if ((error = xfs_sb_validate_fsb_count(&mp->m_sb, nb)))
> +	if (nb == mp->m_sb.sb_dblocks)
> +		return 0;
> +
> +	error = xfs_sb_validate_fsb_count(&mp->m_sb, nb);
> +	if (error)
>  		return error;
> -	error = xfs_buf_read_uncached(mp->m_ddev_targp,
> +
> +	if (nb > mp->m_sb.sb_dblocks) {
> +		error = xfs_buf_read_uncached(mp->m_ddev_targp,
>  				XFS_FSB_TO_BB(mp, nb) - XFS_FSS_TO_BB(mp, 1),
>  				XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
> -	if (error)
> -		return error;
> -	xfs_buf_relse(bp);
> +		if (error)
> +			return error;
> +		xfs_buf_relse(bp);
> +	}
>  
>  	nb_div = nb;
>  	nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks);
>  	nagcount = nb_div + (nb_mod != 0);
>  	if (nb_mod && nb_mod < XFS_MIN_AG_BLOCKS) {
>  		nagcount--;
> -		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
> -		if (nb < mp->m_sb.sb_dblocks)
> +		if (nagcount < 2)
>  			return -EINVAL;
> +		nb = (xfs_rfsblock_t)nagcount * mp->m_sb.sb_agblocks;
>  	}
> +
>  	delta = nb - mp->m_sb.sb_dblocks;
> +	extend = (delta > 0);
>  	oagcount = mp->m_sb.sb_agcount;
>  
>  	/* allocate the new per-ag structures */
> @@ -110,22 +118,34 @@ xfs_growfs_data_private(
>  		error = xfs_initialize_perag(mp, nagcount, &nagimax);
>  		if (error)
>  			return error;
> +	} else if (nagcount != oagcount) {

Nit: nagcount < oagcount ?

> +		/* TODO: shrinking the entire AGs hasn't yet completed */
> +		return -EINVAL;
>  	}
>  
>  	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
> -			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
> +			(extend ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0,
> +			XFS_TRANS_RESERVE, &tp);
>  	if (error)
>  		return error;
>  
> -	error = xfs_resizefs_init_new_ags(mp, &id, oagcount, nagcount, &delta);
> -	if (error)
> -		goto out_trans_cancel;
> -
> +	if (extend) {
> +		error = xfs_resizefs_init_new_ags(mp, &id, oagcount,
> +						  nagcount, &delta);
> +		if (error)
> +			goto out_trans_cancel;
> +	}
>  	xfs_trans_agblocks_delta(tp, id.nfree);
>  
> -	/* If there are new blocks in the old last AG, extend it. */
> +	/* If there are some blocks in the last AG, resize it. */
>  	if (delta) {
> -		error = xfs_ag_extend_space(mp, tp, &id, delta);
> +		if (extend) {
> +			error = xfs_ag_extend_space(mp, tp, &id, delta);
> +		} else {
> +			id.agno = nagcount - 1;
> +			error = xfs_ag_shrink_space(mp, tp, &id, -delta);

This is a little nitpicky, but I wonder if the reorganization of
xfs_growfs_data_private ought to be in a separate preparation patch,
wherein you'd define xfs_ag_shrink_space as a stub that returns
EOPNOSUPP, and make all the necessary adjustments to the caller.

That way, this second patch would concentrate on replacing the
shrink_space stub an actual implementation.

> +		}
> +
>  		if (error)
>  			goto out_trans_cancel;
>  	}
> @@ -137,11 +157,19 @@ xfs_growfs_data_private(
>  	 */
>  	if (nagcount > oagcount)
>  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_AGCOUNT, nagcount - oagcount);
> -	if (nb > mp->m_sb.sb_dblocks)
> +	if (nb != mp->m_sb.sb_dblocks)
>  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
>  				 nb - mp->m_sb.sb_dblocks);
>  	if (id.nfree)
>  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> +
> +	/*
> +	 * update in-core counters (especially sb_fdblocks) now
> +	 * so xfs_validate_sb_write() can pass.
> +	 */
> +	if (xfs_sb_version_haslazysbcount(&mp->m_sb))
> +		xfs_log_sb(tp);

How do we get a failure in xfs_validate_sb_write?  We're changing
fdblocks and dblocks in the same transaction, which means that both
counters should have changed by the number of blocks we took out of
the filesystem, right?

Is the problem that the TRANS_SB_DBLOCKS change above makes the primary
super's sb_dblocks decrease immediately, but since we're in lazycounters
mode we defer updating sb_fdblocks until unmount, so in the meantime
we fail the sb write verifier because fdblocks > dblocks?

Or: is it the general case that we ought to be forcing fdblocks to get
logged here even for fs grow operations?  In which case this (minor)
behavior change probably should go in a separate patch.

--D

> +
>  	xfs_trans_set_sync(tp);
>  	error = xfs_trans_commit(tp);
>  	if (error)
> @@ -157,7 +185,7 @@ xfs_growfs_data_private(
>  	 * If we expanded the last AG, free the per-AG reservation
>  	 * so we can reinitialize it with the new size.
>  	 */
> -	if (delta) {
> +	if (delta > 0) {
>  		struct xfs_perag	*pag;
>  
>  		pag = xfs_perag_get(mp, id.agno);
> @@ -178,7 +206,14 @@ xfs_growfs_data_private(
>  	return error;
>  
>  out_trans_cancel:
> -	xfs_trans_cancel(tp);
> +	/*
> +	 * AGFL fixup can dirty the transaction, so it needs committing anyway.
> +	 */
> +	if (!extend && ((tp->t_flags & XFS_TRANS_DIRTY) ||
> +			!list_empty(&tp->t_dfops)))
> +		xfs_trans_commit(tp);
> +	else
> +		xfs_trans_cancel(tp);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index e72730f85af1..fd2cbf414b80 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -419,7 +419,6 @@ xfs_trans_mod_sb(
>  		tp->t_res_frextents_delta += delta;
>  		break;
>  	case XFS_TRANS_SB_DBLOCKS:
> -		ASSERT(delta > 0);
>  		tp->t_dblocks_delta += delta;
>  		break;
>  	case XFS_TRANS_SB_AGCOUNT:
> -- 
> 2.27.0
> 

  reply	other threads:[~2021-01-20 19:26 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-18  8:36 [PATCH v5 0/5] xfs: support shrinking free space in the last AG Gao Xiang
2021-01-18  8:36 ` [PATCH v5 1/5] xfs: rename `new' to `delta' in xfs_growfs_data_private() Gao Xiang
2021-01-18  8:36 ` [PATCH v5 2/5] xfs: get rid of xfs_growfs_{data,log}_t Gao Xiang
2021-01-18  8:36 ` [PATCH v5 3/5] xfs: hoist out xfs_resizefs_init_new_ags() Gao Xiang
2021-01-18  8:36 ` [PATCH v5 4/5] xfs: support shrinking unused space in the last AG Gao Xiang
2021-01-20 19:25   ` Darrick J. Wong [this message]
2021-01-20 20:22     ` Gao Xiang
2021-01-20 20:31       ` Gao Xiang
2021-01-21  1:51     ` Gao Xiang
2021-01-18  8:37 ` [PATCH v5 5/5] xfs: add error injection for per-AG resv failure when shrinkfs Gao Xiang
2021-01-20 19:25   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210120192506.GL3134581@magnolia \
    --to=djwong@kernel.org \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=hsiangkao@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sandeen@sandeen.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox