linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com,
	allison.henderson@oracle.com
Subject: Re: [PATCH 04/16] xfs: repair the AGF
Date: Fri, 27 Jul 2018 11:19:45 -0700	[thread overview]
Message-ID: <20180727181945.GM30972@magnolia> (raw)
In-Reply-To: <20180727162555.GF22227@bfoster>

On Fri, Jul 27, 2018 at 12:25:56PM -0400, Brian Foster wrote:
> On Fri, Jul 27, 2018 at 09:02:38AM -0700, Darrick J. Wong wrote:
> > On Fri, Jul 27, 2018 at 10:23:48AM -0400, Brian Foster wrote:
> > > On Wed, Jul 25, 2018 at 05:19:55PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Regenerate the AGF from the rmap data.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > 
> > > Mostly seems sane to me. I still need to come up to speed on the broader
> > > xfs_scrub context. A few comments in the meantime..
> > 
> > <nod> Thanks for taking a look at this series. :)
> > 
> > > >  fs/xfs/scrub/agheader_repair.c |  366 ++++++++++++++++++++++++++++++++++++++++
> > > >  fs/xfs/scrub/repair.c          |   27 ++-
> > > >  fs/xfs/scrub/repair.h          |    4 
> > > >  fs/xfs/scrub/scrub.c           |    2 
> > > >  4 files changed, 389 insertions(+), 10 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
> > > > index 1e96621ece3a..938af216cb1c 100644
> > > > --- a/fs/xfs/scrub/agheader_repair.c
> > > > +++ b/fs/xfs/scrub/agheader_repair.c
> > > ...
> > > > @@ -54,3 +61,362 @@ xrep_superblock(
> > > >  	xfs_trans_log_buf(sc->tp, bp, 0, BBTOB(bp->b_length) - 1);
> > > >  	return error;
> > > >  }
> > > ...
> > > > +/* Update all AGF fields which derive from btree contents. */
> > > > +STATIC int
> > > > +xrep_agf_calc_from_btrees(
> > > > +	struct xfs_scrub	*sc,
> > > > +	struct xfs_buf		*agf_bp)
> > > > +{
> > > > +	struct xrep_agf_allocbt	raa = { .sc = sc };
> > > > +	struct xfs_btree_cur	*cur = NULL;
> > > > +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agf_bp);
> > > > +	struct xfs_mount	*mp = sc->mp;
> > > > +	xfs_agblock_t		btreeblks;
> > > > +	xfs_agblock_t		blocks;
> > > > +	int			error;
> > > > +
> > > > +	/* Update the AGF counters from the bnobt. */
> > > > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > > > +			XFS_BTNUM_BNO);
> > > > +	error = xfs_alloc_query_all(cur, xrep_agf_walk_allocbt, &raa);
> > > > +	if (error)
> > > > +		goto err;
> > > > +	error = xfs_btree_count_blocks(cur, &blocks);
> > > > +	if (error)
> > > > +		goto err;
> > > > +	xfs_btree_del_cursor(cur, error);
> > > > +	btreeblks = blocks - 1;
> > > 
> > > Why the -1? We don't count the root or something?
> > 
> > The AGF btreeblks field only counts the number of blocks added to the
> > bno/cnt/rmapbt since they were initialized (each with a single root
> > block).  I find it a little strange not to count the root, but oh well.
> > 
> 
> Got it.
> 
> > > > +	agf->agf_freeblks = cpu_to_be32(raa.freeblks);
> > > > +	agf->agf_longest = cpu_to_be32(raa.longest);
> > > > +
> > > > +	/* Update the AGF counters from the cntbt. */
> > > > +	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno,
> > > > +			XFS_BTNUM_CNT);
> > > > +	error = xfs_btree_count_blocks(cur, &blocks);
> > > > +	if (error)
> > > > +		goto err;
> > > > +	xfs_btree_del_cursor(cur, error);
> > > > +	btreeblks += blocks - 1;
> > > > +
> > > > +	/* Update the AGF counters from the rmapbt. */
> > > > +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.agno);
> > > > +	error = xfs_btree_count_blocks(cur, &blocks);
> > > > +	if (error)
> > > > +		goto err;
> > > > +	xfs_btree_del_cursor(cur, error);
> > > > +	agf->agf_rmap_blocks = cpu_to_be32(blocks);
> > > > +	btreeblks += blocks - 1;
> > > > +
> > > > +	agf->agf_btreeblks = cpu_to_be32(btreeblks);
> > > > +
> > > > +	/* Update the AGF counters from the refcountbt. */
> > > > +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > > +		cur = xfs_refcountbt_init_cursor(mp, sc->tp, agf_bp,
> > > > +				sc->sa.agno, NULL);
> > > 
> > > FYI this fails to compile on for-next (dfops param has been removed).
> > 
> > Yeah, I'm working on a rebase to for-next (once I settle on the locking
> > question in hch's "reduce cow lookups" series).
> > 
> > > > +		error = xfs_btree_count_blocks(cur, &blocks);
> > > > +		if (error)
> > > > +			goto err;
> > > > +		xfs_btree_del_cursor(cur, error);
> > > > +		agf->agf_refcount_blocks = cpu_to_be32(blocks);
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +err:
> > > > +	xfs_btree_del_cursor(cur, error);
> > > > +	return error;
> > > > +}
> > > > +
> ...
> > > > +/* Repair the AGF. v5 filesystems only. */
> > > > +int
> > > > +xrep_agf(
> > > > +	struct xfs_scrub		*sc)
> > > > +{
> > > > +	struct xrep_find_ag_btree	fab[XREP_AGF_MAX] = {
> > > > +		[XREP_AGF_BNOBT] = {
> > > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > > +			.magic = XFS_ABTB_CRC_MAGIC,
> > > > +		},
> > > > +		[XREP_AGF_CNTBT] = {
> > > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > > +			.buf_ops = &xfs_allocbt_buf_ops,
> > > > +			.magic = XFS_ABTC_CRC_MAGIC,
> > > > +		},
> > > > +		[XREP_AGF_RMAPBT] = {
> > > > +			.rmap_owner = XFS_RMAP_OWN_AG,
> > > > +			.buf_ops = &xfs_rmapbt_buf_ops,
> > > > +			.magic = XFS_RMAP_CRC_MAGIC,
> > > > +		},
> > > > +		[XREP_AGF_REFCOUNTBT] = {
> > > > +			.rmap_owner = XFS_RMAP_OWN_REFC,
> > > > +			.buf_ops = &xfs_refcountbt_buf_ops,
> > > > +			.magic = XFS_REFC_CRC_MAGIC,
> > > > +		},
> > > > +		[XREP_AGF_END] = {
> > > > +			.buf_ops = NULL,
> > > > +		},
> > > > +	};
> > > > +	struct xfs_agf			old_agf;
> > > > +	struct xfs_mount		*mp = sc->mp;
> > > > +	struct xfs_buf			*agf_bp;
> > > > +	struct xfs_buf			*agfl_bp;
> > > > +	struct xfs_agf			*agf;
> > > > +	int				error;
> > > > +
> > > > +	/* We require the rmapbt to rebuild anything. */
> > > > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > > +		return -EOPNOTSUPP;
> > > > +
> > > > +	xchk_perag_get(sc->mp, &sc->sa);
> > > > +	error = xfs_trans_read_buf(mp, sc->tp, mp->m_ddev_targp,
> > > > +			XFS_AG_DADDR(mp, sc->sa.agno, XFS_AGF_DADDR(mp)),
> > > > +			XFS_FSS_TO_BB(mp, 1), 0, &agf_bp, NULL);
> > > > +	if (error)
> > > > +		return error;
> > > > +	agf_bp->b_ops = &xfs_agf_buf_ops;
> > > 
> > > Any reason we don't call xfs_read_agf() here? It looks like we use the
> > > similar helper for the agfl below.
> > 
> > We're grabbing the agf buffer without read verifiers so that we can
> > reinitialize it.  Note that scrub tries xfs_read_agf, and if it fails
> > with -EFSCORRUPTED/-EFSBADCRC it marks the agf as corrupt, so it's
> > possible that sc->sa.sa_agf is still null.
> > 
> 
> Ah, makes sense. I missed the NULL b_ops..
> 
> > This probably could have been trans_get_buf though...
> > 
> > > > +	agf = XFS_BUF_TO_AGF(agf_bp);
> > > > +
> > > > +	/*
> > > > +	 * Load the AGFL so that we can screen out OWN_AG blocks that are on
> > > > +	 * the AGFL now; these blocks might have once been part of the
> > > > +	 * bno/cnt/rmap btrees but are not now.  This is a chicken and egg
> > > > +	 * problem: the AGF is corrupt, so we have to trust the AGFL contents
> > > > +	 * because we can't do any serious cross-referencing with any of the
> > > > +	 * btrees rooted in the AGF.  If the AGFL contents are obviously bad
> > > > +	 * then we'll bail out.
> > > > +	 */
> > > > +	error = xfs_alloc_read_agfl(mp, sc->tp, sc->sa.agno, &agfl_bp);
> > > > +	if (error)
> > > > +		return error;
> > > > +
> > > > +	/*
> > > > +	 * Spot-check the AGFL blocks; if they're obviously corrupt then
> > > > +	 * there's nothing we can do but bail out.
> > > > +	 */
> > > 
> > > Why? Can't we reset the agfl, or is that handled elsewhere?
> > 
> > It's handled in xrep_agfl, but userspace will have to call us again to
> > fix the agfl and then call us a third time about the agf repair.
> > 
> 
> Ok, this was one of the things I feel like I don't have enough context
> on wrt to online repair: in general, what dependent structures we expect
> to be consistent in order to repair some other interrelated structure.
> Userspace repair is straightforward in this regard since we slurp the
> whole fs into memory, adjust the global state as we go, then essentially
> regenerate new metadata based on the finalized state.

<nod>

> For online repair, it sounds like we're potentially limited because
> repair of one structure may always depend on some other subset of
> metadata being consistent, right?

Correct.  There's an implicit assumption here (which is coded into a
warning message in xfs_scrub) that if the primary and secondary metadata
(rmapbt usually) are corrupt then the fs may be unrecoverable online...

> If so, is the goal of online repair to essentially "do the best we can
> but otherwise the most severe corruptions may have to always fall back
> to xfs_repair?"

...and so the only recourse is xfs_repair, with the usual caveat that a
severely damaged filesystem might be a total loss.  If the online repair
fails then the fs will be shut down (either due to cancelling a dirty
repair transaction or because xfs_scrub can call FS_IOC_SHUTDOWN after a
failure).

> So in this particular case, we expect a sane agfl and otherwise buzz off
> because 1.) this is a targeted agf repair request and 2.) we have a
> separate request to deal with the agfl. It sounds like the smarts to
> understand how we might have to jump back and forth between them is in
> userspace, so the end-user doesn't necessarily have to understand the
> dependency.

Correct.  Userspace should be running xfs_scrub, which knows in which
order metadata has to be checked, and what dependencies must be
satisfied for repairs to succeed.  While it's possible to invoke it
manually via xfs_io, in practice nobody but xfstests should be using
that method of invocation.

--D

> Brian
> 
> > (xfs_scrub does this, naturally...)
> > 
> > --D
> > 
> > > Brian
> > > 
> > > > +	error = xfs_agfl_walk(sc->mp, XFS_BUF_TO_AGF(agf_bp), agfl_bp,
> > > > +			xrep_agf_check_agfl_block, sc);
> > > > +	if (error)
> > > > +		return error;
> > > > +
> > > > +	/*
> > > > +	 * Find the AGF btree roots.  This is also a chicken-and-egg situation;
> > > > +	 * see the function for more details.
> > > > +	 */
> > > > +	error = xrep_agf_find_btrees(sc, agf_bp, fab, agfl_bp);
> > > > +	if (error)
> > > > +		return error;
> > > > +
> > > > +	/* Start rewriting the header and implant the btrees we found. */
> > > > +	xrep_agf_init_header(sc, agf_bp, &old_agf);
> > > > +	xrep_agf_set_roots(sc, agf, fab);
> > > > +	error = xrep_agf_calc_from_btrees(sc, agf_bp);
> > > > +	if (error)
> > > > +		goto out_revert;
> > > > +
> > > > +	/* Commit the changes and reinitialize incore state. */
> > > > +	return xrep_agf_commit_new(sc, agf_bp);
> > > > +
> > > > +out_revert:
> > > > +	/* Mark the incore AGF state stale and revert the AGF. */
> > > > +	sc->sa.pag->pagf_init = 0;
> > > > +	memcpy(agf, &old_agf, sizeof(old_agf));
> > > > +	return error;
> > > > +}
> > > > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > > > index 85b048b341a0..17cf48564390 100644
> > > > --- a/fs/xfs/scrub/repair.c
> > > > +++ b/fs/xfs/scrub/repair.c
> > > > @@ -128,9 +128,12 @@ xrep_roll_ag_trans(
> > > >  	int			error;
> > > >  
> > > >  	/* Keep the AG header buffers locked so we can keep going. */
> > > > -	xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
> > > > -	xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
> > > > -	xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
> > > > +	if (sc->sa.agi_bp)
> > > > +		xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
> > > > +	if (sc->sa.agf_bp)
> > > > +		xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
> > > > +	if (sc->sa.agfl_bp)
> > > > +		xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
> > > >  
> > > >  	/* Roll the transaction. */
> > > >  	error = xfs_trans_roll(&sc->tp);
> > > > @@ -138,9 +141,12 @@ xrep_roll_ag_trans(
> > > >  		goto out_release;
> > > >  
> > > >  	/* Join AG headers to the new transaction. */
> > > > -	xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
> > > > -	xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
> > > > -	xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
> > > > +	if (sc->sa.agi_bp)
> > > > +		xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
> > > > +	if (sc->sa.agf_bp)
> > > > +		xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
> > > > +	if (sc->sa.agfl_bp)
> > > > +		xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
> > > >  
> > > >  	return 0;
> > > >  
> > > > @@ -150,9 +156,12 @@ xrep_roll_ag_trans(
> > > >  	 * buffers will be released during teardown on our way out
> > > >  	 * of the kernel.
> > > >  	 */
> > > > -	xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
> > > > -	xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
> > > > -	xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
> > > > +	if (sc->sa.agi_bp)
> > > > +		xfs_trans_bhold_release(sc->tp, sc->sa.agi_bp);
> > > > +	if (sc->sa.agf_bp)
> > > > +		xfs_trans_bhold_release(sc->tp, sc->sa.agf_bp);
> > > > +	if (sc->sa.agfl_bp)
> > > > +		xfs_trans_bhold_release(sc->tp, sc->sa.agfl_bp);
> > > >  
> > > >  	return error;
> > > >  }
> > > > diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
> > > > index 5a4e92221916..1d283360b5ab 100644
> > > > --- a/fs/xfs/scrub/repair.h
> > > > +++ b/fs/xfs/scrub/repair.h
> > > > @@ -58,6 +58,8 @@ int xrep_ino_dqattach(struct xfs_scrub *sc);
> > > >  
> > > >  int xrep_probe(struct xfs_scrub *sc);
> > > >  int xrep_superblock(struct xfs_scrub *sc);
> > > > +int xrep_agf(struct xfs_scrub *sc);
> > > > +int xrep_agfl(struct xfs_scrub *sc);
> > > >  
> > > >  #else
> > > >  
> > > > @@ -81,6 +83,8 @@ xrep_calc_ag_resblks(
> > > >  
> > > >  #define xrep_probe			xrep_notsupported
> > > >  #define xrep_superblock			xrep_notsupported
> > > > +#define xrep_agf			xrep_notsupported
> > > > +#define xrep_agfl			xrep_notsupported
> > > >  
> > > >  #endif /* CONFIG_XFS_ONLINE_REPAIR */
> > > >  
> > > > diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
> > > > index 6efb926f3cf8..1e8a17c8e2b9 100644
> > > > --- a/fs/xfs/scrub/scrub.c
> > > > +++ b/fs/xfs/scrub/scrub.c
> > > > @@ -214,7 +214,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
> > > >  		.type	= ST_PERAG,
> > > >  		.setup	= xchk_setup_fs,
> > > >  		.scrub	= xchk_agf,
> > > > -		.repair	= xrep_notsupported,
> > > > +		.repair	= xrep_agf,
> > > >  	},
> > > >  	[XFS_SCRUB_TYPE_AGFL]= {	/* agfl */
> > > >  		.type	= ST_PERAG,
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-07-27 19:42 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-26  0:19 [PATCH v17 00/16] xfs-4.19: online repair support Darrick J. Wong
2018-07-26  0:19 ` [PATCH 01/16] xfs: pass transaction lock while setting up agresv on cyclic metadata Darrick J. Wong
2018-07-27 14:21   ` Brian Foster
2018-07-26  0:19 ` [PATCH 02/16] xfs: move the repair extent list into its own file Darrick J. Wong
2018-07-27 14:21   ` Brian Foster
2018-07-26  0:19 ` [PATCH 03/16] xfs: refactor the xrep_extent_list into xfs_bitmap Darrick J. Wong
2018-07-27 14:21   ` Brian Foster
2018-07-27 15:52     ` Darrick J. Wong
2018-07-26  0:19 ` [PATCH 04/16] xfs: repair the AGF Darrick J. Wong
2018-07-27 14:23   ` Brian Foster
2018-07-27 16:02     ` Darrick J. Wong
2018-07-27 16:25       ` Brian Foster
2018-07-27 18:19         ` Darrick J. Wong [this message]
2018-07-26  0:20 ` [PATCH 05/16] xfs: repair the AGFL Darrick J. Wong
2018-07-26  0:20 ` [PATCH 06/16] xfs: repair the AGI Darrick J. Wong
2018-07-26  0:20 ` [PATCH 07/16] xfs: repair free space btrees Darrick J. Wong
2018-07-26  0:21 ` [PATCH 08/16] xfs: repair inode btrees Darrick J. Wong
2018-07-26  0:21 ` [PATCH 09/16] xfs: repair refcount btrees Darrick J. Wong
2018-07-26  0:21 ` [PATCH 10/16] xfs: repair inode records Darrick J. Wong
2018-07-26  0:21 ` [PATCH 11/16] xfs: zap broken inode forks Darrick J. Wong
2018-07-26  0:21 ` [PATCH 12/16] xfs: repair inode block maps Darrick J. Wong
2018-07-26  0:21 ` [PATCH 13/16] xfs: repair damaged symlinks Darrick J. Wong
2018-07-26  0:21 ` [PATCH 14/16] xfs: repair extended attributes Darrick J. Wong
2018-07-26  0:21 ` [PATCH 15/16] xfs: scrub should set preen if attr leaf has holes Darrick J. Wong
2018-07-26  0:21 ` [PATCH 16/16] xfs: repair quotas Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180727181945.GM30972@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=allison.henderson@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).