From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 06/21] xfs: repair free space btrees
Date: Tue, 3 Jul 2018 19:15:04 -0700
Message-ID: <20180704021504.GA32415@magnolia>
In-Reply-To: <20180627032123.GF19934@dastard>
On Wed, Jun 27, 2018 at 01:21:23PM +1000, Dave Chinner wrote:
> On Sun, Jun 24, 2018 at 12:24:07PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > Rebuild the free space btrees from the gaps in the rmap btree.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> ......
> > +
> > +/* Collect an AGFL block for the not-to-release list. */
> > +static int
> > +xfs_repair_collect_agfl_block(
> > + struct xfs_mount *mp,
> > + xfs_agblock_t bno,
> > + void *priv)
Whoah, I never replied to this. Oops. :(
> /me now gets confused by agfl code (xfs_repair_agfl_...) collecting btree
> blocks, and now the btree code (xfs_repair_collect_agfl... )
> collecting agfl blocks.
>
> The naming/namespace collisions are not that nice. I think this needs
> to be xr_allocbt_collect_agfl_blocks().
> /me idly wonders about consistently renaming everything abt, bnbt, cnbt,
> fibt, ibt, rmbt and rcbt...
Hmm, I'll think about a mass rename. :)
xfs_repair_refcountbt_fiddle_faddle(...);
xrep_rcbt_fiddle_faddle(...);
xrpr_rcbt_fiddle_faddle(...);
xrprrcbt_fiddle_faddle(...);
Yeah, maybe that third one.
>
> > +/*
> > + * Iterate all reverse mappings to find (1) the free extents, (2) the OWN_AG
> > + * extents, (3) the rmapbt blocks, and (4) the AGFL blocks. The free space is
> > + * (1) + (2) - (3) - (4). Figure out if we have enough free space to
> > + * reconstruct the free space btrees. Caller must clean up the input lists
> > + * if something goes wrong.
> > + */
> > +STATIC int
> > +xfs_repair_allocbt_find_freespace(
> > + struct xfs_scrub_context *sc,
> > + struct list_head *free_extents,
> > + struct xfs_repair_extent_list *old_allocbt_blocks)
> > +{
> > + struct xfs_repair_alloc ra;
> > + struct xfs_repair_alloc_extent *rae;
> > + struct xfs_btree_cur *cur;
> > + struct xfs_mount *mp = sc->mp;
> > + xfs_agblock_t agend;
> > + xfs_agblock_t nr_blocks;
> > + int error;
> > +
> > + ra.extlist = free_extents;
> > + ra.btlist = old_allocbt_blocks;
> > + xfs_repair_init_extent_list(&ra.nobtlist);
> > + ra.next_bno = 0;
> > + ra.nr_records = 0;
> > + ra.nr_blocks = 0;
> > + ra.sc = sc;
> > +
> > + /*
> > + * Iterate all the reverse mappings to find gaps in the physical
> > + * mappings, all the OWN_AG blocks, and all the rmapbt extents.
> > + */
> > + cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
> > + error = xfs_rmap_query_all(cur, xfs_repair_alloc_extent_fn, &ra);
> > + if (error)
> > + goto err;
> > + xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > + cur = NULL;
> > +
> > + /* Insert a record for space between the last rmap and EOAG. */
> > + agend = be32_to_cpu(XFS_BUF_TO_AGF(sc->sa.agf_bp)->agf_length);
> > + if (ra.next_bno < agend) {
> > + rae = kmem_alloc(sizeof(struct xfs_repair_alloc_extent),
> > + KM_MAYFAIL);
> > + if (!rae) {
> > + error = -ENOMEM;
> > + goto err;
> > + }
> > + INIT_LIST_HEAD(&rae->list);
> > + rae->bno = ra.next_bno;
> > + rae->len = agend - ra.next_bno;
> > + list_add_tail(&rae->list, free_extents);
> > + ra.nr_records++;
> > + }
> > +
> > + /* Collect all the AGFL blocks. */
> > + error = xfs_agfl_walk(mp, XFS_BUF_TO_AGF(sc->sa.agf_bp),
> > + sc->sa.agfl_bp, xfs_repair_collect_agfl_block, &ra);
> > + if (error)
> > + goto err;
> > +
> > + /* Do we actually have enough space to do this? */
> > + nr_blocks = 2 * xfs_allocbt_calc_size(mp, ra.nr_records);
>
> /* Do we have enough space to rebuild both freespace trees? */
>
> (explains the multiplication by 2)
Yep, will fix.
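For reference, the factor of two is there because we're sizing the rebuild of
*both* free space trees from the same record count. A quick userspace sketch of
the worst-case estimate (the per-block record counts here are made up; the real
code uses xfs_allocbt_calc_size() and the actual btree geometry):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy stand-in for xfs_allocbt_calc_size(): sum the blocks needed at
 * each btree level for nr_records.  Capacities are hypothetical.
 */
static uint64_t fake_allocbt_calc_size(uint64_t nr_records)
{
	const uint64_t leaf_recs = 500;	/* made-up leaf capacity */
	const uint64_t node_keys = 250;	/* made-up node capacity */
	uint64_t level_blocks = (nr_records + leaf_recs - 1) / leaf_recs;
	uint64_t total = level_blocks;

	/* Each interior level indexes the blocks of the level below. */
	while (level_blocks > 1) {
		level_blocks = (level_blocks + node_keys - 1) / node_keys;
		total += level_blocks;
	}
	return total;
}

/* Worst case for rebuilding both the bnobt and the cntbt. */
static uint64_t allocbt_rebuild_blocks(uint64_t nr_records)
{
	return 2 * fake_allocbt_calc_size(nr_records);
}
```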
> > + if (!xfs_repair_ag_has_space(sc->sa.pag, nr_blocks, XFS_AG_RESV_NONE) ||
> > + ra.nr_blocks < nr_blocks) {
> > + error = -ENOSPC;
> > + goto err;
> > + }
> > +
> > + /* Compute the old bnobt/cntbt blocks. */
> > + error = xfs_repair_subtract_extents(sc, old_allocbt_blocks,
> > + &ra.nobtlist);
> > + if (error)
> > + goto err;
> > + xfs_repair_cancel_btree_extents(sc, &ra.nobtlist);
> > + return 0;
> > +
> > +err:
> > + xfs_repair_cancel_btree_extents(sc, &ra.nobtlist);
> > + if (cur)
> > + xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
> > + return error;
>
> Error stacking here can be cleaned up - we don't need an extra stack
> as the cursor is NULL when finished with. Hence it could just be:
>
> /* Compute the old bnobt/cntbt blocks. */
> error = xfs_repair_subtract_extents(sc, old_allocbt_blocks,
> &ra.nobtlist);
> err:
> xfs_repair_cancel_btree_extents(sc, &ra.nobtlist);
> if (cur)
> xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
TBH I've been tempted for years to refactor this thing to take the error
code directly rather than require this XFS_BTREE_{NO,}ERROR business.
There are only two choices, and we nearly always decide using error == 0.
> return error;
> }
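Something like this toy model shows the shape of that refactor: derive the
NOERROR/ERROR behaviour from the error code itself, which is what almost every
caller does by hand today. All names here are made up for illustration:

```c
#include <assert.h>

/* Toy stand-ins for the btree cursor and teardown flags. */
struct toy_cur { int freed_clean; };

#define TOY_NOERROR	0
#define TOY_ERROR	1

/* Models the existing two-flag xfs_btree_del_cursor() interface. */
static void toy_del_cursor(struct toy_cur *cur, int flag)
{
	cur->freed_clean = (flag == TOY_NOERROR);
}

/*
 * Hypothetical replacement: take the error code directly and pick
 * the teardown behaviour from error == 0, like nearly all callers do.
 */
static void toy_del_cursor_err(struct toy_cur *cur, int error)
{
	toy_del_cursor(cur, error == 0 ? TOY_NOERROR : TOY_ERROR);
}
```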
>
>
> > +}
> > +
> > +/*
> > + * Reset the global free block counter and the per-AG counters to make it look
> > + * like this AG has no free space.
> > + */
> > +STATIC int
> > +xfs_repair_allocbt_reset_counters(
> > + struct xfs_scrub_context *sc,
> > + int *log_flags)
> > +{
> > + struct xfs_perag *pag = sc->sa.pag;
> > + struct xfs_agf *agf;
> > + xfs_extlen_t oldf;
> > + xfs_agblock_t rmap_blocks;
> > + int error;
> > +
> > + /*
> > + * Since we're abandoning the old bnobt/cntbt, we have to
> > + * decrease fdblocks by the # of blocks in those trees.
> > + * btreeblks counts the non-root blocks of the free space
> > + * and rmap btrees. Do this before resetting the AGF counters.
> > + */
>
> Comment can use 80 columns.
>
> > + agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
> > + rmap_blocks = be32_to_cpu(agf->agf_rmap_blocks) - 1;
> > + oldf = pag->pagf_btreeblks + 2;
> > + oldf -= rmap_blocks;
>
> Convoluted. The comment really didn't help me understand what oldf
> is accounting.
>
> Ah, rmap_blocks is actually the new btreeblks count. OK.
>
> /*
> * Since we're abandoning the old bnobt/cntbt, we have to decrease
> * fdblocks by the # of blocks in those trees. btreeblks counts the
> * non-root blocks of the free space and rmap btrees. Do this before
> * resetting the AGF counters.
> */
>
> agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
>
> /* rmap_blocks accounts root block, btreeblks doesn't */
> new_btblks = be32_to_cpu(agf->agf_rmap_blocks) - 1;
>
> /* btreeblks doesn't account bno/cnt root blocks */
> to_free = pag->pagf_btreeblks + 2;
>
> /* and don't account for the blocks we aren't freeing */
> to_free -= new_btblks;
Ok, I'll do that.
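To convince myself the arithmetic is right, here's a userspace model of that
accounting with Dave's variable names (the numbers in the test are invented;
the point is only the root-block adjustments):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the fdblocks adjustment.  btreeblks counts the
 * non-root blocks of the bnobt, cntbt, and rmapbt; agf_rmap_blocks
 * counts all rmapbt blocks including the root.
 */
static uint32_t allocbt_blocks_to_free(uint32_t pagf_btreeblks,
				       uint32_t agf_rmap_blocks)
{
	/* rmap_blocks accounts the root block, btreeblks doesn't */
	uint32_t new_btblks = agf_rmap_blocks - 1;

	/* btreeblks doesn't account the bno/cnt root blocks */
	uint32_t to_free = pagf_btreeblks + 2;

	/* and don't account for the rmapbt blocks we aren't freeing */
	return to_free - new_btblks;
}
```

With, say, 3-block bnobt and cntbt (roots included) and a 5-block rmapbt,
btreeblks is 10 and the helper hands back the 8 blocks of the two abandoned
trees.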
>
> > + error = xfs_mod_fdblocks(sc->mp, -(int64_t)oldf, false);
> > + if (error)
> > + return error;
> > +
> > + /* Reset the per-AG info, both incore and ondisk. */
> > + pag->pagf_btreeblks = rmap_blocks;
> > + pag->pagf_freeblks = 0;
> > + pag->pagf_longest = 0;
> > +
> > + agf->agf_btreeblks = cpu_to_be32(pag->pagf_btreeblks);
>
> I'd prefer that you use new_btblks here, too. Easier to see at a
> glance that the on-disk agf is being set to the new value....
Ok.
>
>
> > + agf->agf_freeblks = 0;
> > + agf->agf_longest = 0;
> > + *log_flags |= XFS_AGF_BTREEBLKS | XFS_AGF_LONGEST | XFS_AGF_FREEBLKS;
> > +
> > + return 0;
> > +}
> > +
> > +/* Initialize new bnobt/cntbt roots and implant them into the AGF. */
> > +STATIC int
> > +xfs_repair_allocbt_reset_btrees(
> > + struct xfs_scrub_context *sc,
> > + struct list_head *free_extents,
> > + int *log_flags)
> > +{
> > + struct xfs_owner_info oinfo;
> > + struct xfs_repair_alloc_extent *cached = NULL;
> > + struct xfs_buf *bp;
> > + struct xfs_perag *pag = sc->sa.pag;
> > + struct xfs_mount *mp = sc->mp;
> > + struct xfs_agf *agf;
> > + xfs_fsblock_t bnofsb;
> > + xfs_fsblock_t cntfsb;
> > + int error;
> > +
> > + /* Allocate new bnobt root. */
> > + bnofsb = xfs_repair_allocbt_alloc_block(sc, free_extents, &cached);
> > + if (bnofsb == NULLFSBLOCK)
> > + return -ENOSPC;
>
> Does this happen after the free extent list has been sorted by bno
> order? It really should, that way the new root is as close to the
> AGF as possible, and the new btree blocks will also tend to
> cluster towards the lower AG offsets.
Will do.
> > + /* Allocate new cntbt root. */
> > + cntfsb = xfs_repair_allocbt_alloc_block(sc, free_extents, &cached);
> > + if (cntfsb == NULLFSBLOCK)
> > + return -ENOSPC;
> > +
> > + agf = XFS_BUF_TO_AGF(sc->sa.agf_bp);
> > + /* Initialize new bnobt root. */
> > + error = xfs_repair_init_btblock(sc, bnofsb, &bp, XFS_BTNUM_BNO,
> > + &xfs_allocbt_buf_ops);
> > + if (error)
> > + return error;
> > + agf->agf_roots[XFS_BTNUM_BNOi] =
> > + cpu_to_be32(XFS_FSB_TO_AGBNO(mp, bnofsb));
> > + agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
> > +
> > + /* Initialize new cntbt root. */
> > + error = xfs_repair_init_btblock(sc, cntfsb, &bp, XFS_BTNUM_CNT,
> > + &xfs_allocbt_buf_ops);
> > + if (error)
> > + return error;
> > + agf->agf_roots[XFS_BTNUM_CNTi] =
> > + cpu_to_be32(XFS_FSB_TO_AGBNO(mp, cntfsb));
> > + agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
> > +
> > + /* Add rmap records for the btree roots */
> > + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
> > + error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno,
> > + XFS_FSB_TO_AGBNO(mp, bnofsb), 1, &oinfo);
> > + if (error)
> > + return error;
> > + error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno,
> > + XFS_FSB_TO_AGBNO(mp, cntfsb), 1, &oinfo);
> > + if (error)
> > + return error;
> > +
> > + /* Reset the incore state. */
> > + pag->pagf_levels[XFS_BTNUM_BNOi] = 1;
> > + pag->pagf_levels[XFS_BTNUM_CNTi] = 1;
> > +
> > + *log_flags |= XFS_AGF_ROOTS | XFS_AGF_LEVELS;
> > + return 0;
>
> Rather than duplicating all this init code twice, would factoring it
> make sense? The only difference between the alloc/init of the two
> btrees is the array index that info is stored in....
Yeah, it would.
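Roughly this shape, I think; a toy model of the factored helper, parameterised
by the btree array index (types and names invented for the sketch):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of factoring the duplicated bnobt/cntbt root setup. */
#define TOY_BTNUM_BNOi	0
#define TOY_BTNUM_CNTi	1

struct toy_agf {
	uint32_t roots[2];
	uint32_t levels[2];
};

/* One helper does the alloc/init work; only the index differs. */
static void toy_reset_one_root(struct toy_agf *agf, int btnum,
			       uint32_t root_agbno)
{
	agf->roots[btnum] = root_agbno;
	agf->levels[btnum] = 1;
}

static void toy_reset_btrees(struct toy_agf *agf, uint32_t bno_root,
			     uint32_t cnt_root)
{
	toy_reset_one_root(agf, TOY_BTNUM_BNOi, bno_root);
	toy_reset_one_root(agf, TOY_BTNUM_CNTi, cnt_root);
}
```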
> > +}
> > +
> > +/* Build new free space btrees and dispose of the old one. */
> > +STATIC int
> > +xfs_repair_allocbt_rebuild_trees(
> > + struct xfs_scrub_context *sc,
> > + struct list_head *free_extents,
> > + struct xfs_repair_extent_list *old_allocbt_blocks)
> > +{
> > + struct xfs_owner_info oinfo;
> > + struct xfs_repair_alloc_extent *rae;
> > + struct xfs_repair_alloc_extent *n;
> > + struct xfs_repair_alloc_extent *longest;
> > + int error;
> > +
> > + xfs_rmap_skip_owner_update(&oinfo);
> > +
> > + /*
> > + * Insert the longest free extent in case it's necessary to
> > + * refresh the AGFL with multiple blocks. If there is no longest
> > + * extent, we had exactly the free space we needed; we're done.
> > + */
> > + longest = xfs_repair_allocbt_get_longest(free_extents);
> > + if (!longest)
> > + goto done;
> > + error = xfs_repair_allocbt_free_extent(sc,
> > + XFS_AGB_TO_FSB(sc->mp, sc->sa.agno, longest->bno),
> > + longest->len, &oinfo);
> > + list_del(&longest->list);
> > + kmem_free(longest);
> > + if (error)
> > + return error;
> > +
> > + /* Insert records into the new btrees. */
> > + list_sort(NULL, free_extents, xfs_repair_allocbt_extent_cmp);
>
> Hmmm. I guess list sorting doesn't occur before allocating new root
> blocks. Can this get moved?
Certainly.
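i.e. sort first, then carve the root blocks off the head of the list so the
new btree blocks cluster towards the start of the AG. A userspace sketch with
an array standing in for the linked list (and ignoring zero-length extents):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy free-extent record; the real code uses a linked list. */
struct toy_ext {
	uint32_t bno;
	uint32_t len;
};

/* Sort comparator: ascending AG block number. */
static int toy_ext_cmp(const void *a, const void *b)
{
	const struct toy_ext *ea = a, *eb = b;

	if (ea->bno < eb->bno)
		return -1;
	return ea->bno > eb->bno;
}

/* Take one block from the lowest-addressed free extent. */
static uint32_t toy_alloc_root(struct toy_ext *exts)
{
	uint32_t bno = exts[0].bno;

	exts[0].bno++;
	exts[0].len--;
	return bno;
}
```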
> ....
>
> > +bool
> > +xfs_extent_busy_list_empty(
> > + struct xfs_perag *pag);
>
> One line form for header prototypes, please.
Ok.
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com