Re: [PATCH 04/14] xfs: repair inode btrees

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 04/14] xfs: repair inode btrees
Date: Tue, 5 Jun 2018 20:55:28 -0700	[thread overview]
Message-ID: <20180606035528.GT9437@magnolia> (raw)
In-Reply-To: <20180604034130.GM10363@dastard>

On Mon, Jun 04, 2018 at 01:41:30PM +1000, Dave Chinner wrote:
> On Wed, May 30, 2018 at 12:31:04PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Use the rmapbt to find inode chunks, query the chunks to compute
> > hole and free masks, and with that information rebuild the inobt
> > and finobt.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> [...]
> 
> > +xfs_repair_ialloc_check_free(
> > +	struct xfs_btree_cur	*cur,
> > +	struct xfs_buf		*bp,
> > +	xfs_ino_t		fsino,
> > +	xfs_agino_t		bpino,
> > +	bool			*inuse)
> > +{
> > +	struct xfs_mount	*mp = cur->bc_mp;
> > +	struct xfs_dinode	*dip;
> > +	int			error;
> > +
> > +	/* Will the in-core inode tell us if it's in use? */
> > +	error = xfs_icache_inode_is_allocated(mp, cur->bc_tp, fsino, inuse);
> > +	if (!error)
> > +		return 0;
> > +
> > +	/* Inode uncached or half assembled, read disk buffer */
> > +	dip = xfs_buf_offset(bp, bpino * mp->m_sb.sb_inodesize);
> > +	if (be16_to_cpu(dip->di_magic) != XFS_DINODE_MAGIC)
> > +		return -EFSCORRUPTED;
> 
> Do we hold the buffer locked here? i.e. can we race with someone
> else allocating/freeing/reading the inode?

I think repair should be ok from alloc/free because both of those paths
(xfs_dialloc/xfs_difree) will grab the AGI header, whereas repair locks
all three AG headers and keeps them locked until repairs are complete.
I don't think we have to worry about concurrent reads because the only
field we care about are di_mode/i_mode, which don't change outside of
inode allocation and freeing.

> > +
> > +	if (dip->di_version >= 3 && be64_to_cpu(dip->di_ino) != fsino)
> > +		return -EFSCORRUPTED;
> > +
> > +	*inuse = dip->di_mode != 0;
> > +	return 0;
> > +}
> > +
> > +/* Record extents that belong to inode btrees. */
> > +STATIC int
> > +xfs_repair_ialloc_extent_fn(
> > +	struct xfs_btree_cur		*cur,
> > +	struct xfs_rmap_irec		*rec,
> > +	void				*priv)
> > +{
> > +	struct xfs_imap			imap;
> > +	struct xfs_repair_ialloc	*ri = priv;
> > +	struct xfs_repair_ialloc_extent	*rie;
> > +	struct xfs_dinode		*dip;
> > +	struct xfs_buf			*bp;
> > +	struct xfs_mount		*mp = cur->bc_mp;
> > +	xfs_ino_t			fsino;
> > +	xfs_inofree_t			usedmask;
> > +	xfs_fsblock_t			fsbno;
> > +	xfs_agnumber_t			agno;
> > +	xfs_agblock_t			agbno;
> > +	xfs_agino_t			cdist;
> > +	xfs_agino_t			startino;
> > +	xfs_agino_t			clusterino;
> > +	xfs_agino_t			nr_inodes;
> > +	xfs_agino_t			inoalign;
> > +	xfs_agino_t			agino;
> > +	xfs_agino_t			rmino;
> > +	uint16_t			fillmask;
> > +	bool				inuse;
> > +	int				blks_per_cluster;
> > +	int				usedcount;
> > +	int				error = 0;
> > +
> > +	if (xfs_scrub_should_terminate(ri->sc, &error))
> > +		return error;
> > +
> > +	/* Fragment of the old btrees; dispose of them later. */
> > +	if (rec->rm_owner == XFS_RMAP_OWN_INOBT) {
> > +		fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
> > +				rec->rm_startblock);
> > +		return xfs_repair_collect_btree_extent(ri->sc, &ri->btlist,
> > +				fsbno, rec->rm_blockcount);
> > +	}
> > +
> > +	/* Skip extents which are not owned by this inode and fork. */
> > +	if (rec->rm_owner != XFS_RMAP_OWN_INODES)
> > +		return 0;
> > +
> > +	agno = cur->bc_private.a.agno;
> > +	blks_per_cluster = xfs_icluster_size_fsb(mp);
> > +	nr_inodes = XFS_OFFBNO_TO_AGINO(mp, blks_per_cluster, 0);
> > +
> > +	if (rec->rm_startblock % blks_per_cluster != 0)
> > +		return -EFSCORRUPTED;
> > +
> > +	trace_xfs_repair_ialloc_extent_fn(mp, cur->bc_private.a.agno,
> > +			rec->rm_startblock, rec->rm_blockcount, rec->rm_owner,
> > +			rec->rm_offset, rec->rm_flags);
> > +
> > +	/*
> > +	 * Determine the inode block alignment, and where the block
> > +	 * ought to start if it's aligned properly.  On a sparse inode
> > +	 * system the rmap doesn't have to start on an alignment boundary,
> > +	 * but the record does.  On pre-sparse filesystems, we /must/
> > +	 * start both rmap and inobt on an alignment boundary.
> > +	 */
> > +	inoalign = xfs_ialloc_cluster_alignment(mp);
> > +	agbno = rec->rm_startblock;
> > +	agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
> > +	rmino = XFS_OFFBNO_TO_AGINO(mp, rounddown(agbno, inoalign), 0);
> > +	if (!xfs_sb_version_hassparseinodes(&mp->m_sb) && agino != rmino)
> > +		return -EFSCORRUPTED;
> > +
> > +	/*
> > +	 * For each cluster in this blob of inode, we must calculate the
> > +	 * properly aligned startino of that cluster, then iterate each
> > +	 * cluster to fill in used and filled masks appropriately.  We
> > +	 * then use the (startino, used, filled) information to construct
> > +	 * the appropriate inode records.
> > +	 */
> > +	for (agbno = rec->rm_startblock;
> > +	     agbno < rec->rm_startblock + rec->rm_blockcount;
> > +	     agbno += blks_per_cluster) {
> 
> I see a few problems with indenting and "just over" long lines here.
> Can you factor the loop internals into a separate function to reduce
> that issue? Say xfs_repair_ialloc_process_cluster()?

Ok, done.

> > +		/* The per-AG inum of this inode cluster. */
> > +		agino = XFS_OFFBNO_TO_AGINO(mp, agbno, 0);
> > +
> > +		/* The per-AG inum of the inobt record. */
> > +		startino = rmino +
> > +				rounddown(agino - rmino, XFS_INODES_PER_CHUNK);
> > +		cdist = agino - startino;
> 
> What's "cdist" mean? I can guess at it's meaning, but I don't recall
> seeing the inode number offset into a cluster been refered to as a
> distanced before....

cluster offset?

I wasn't sure of the terminology for the offset of the cluster within a
chunk, in units of ag inodes.

> > +		/* Every inode in this holemask slot is filled. */
> > +		fillmask = xfs_inobt_maskn(
> > +				cdist / XFS_INODES_PER_HOLEMASK_BIT,
> > +				nr_inodes / XFS_INODES_PER_HOLEMASK_BIT);
> > +
> > +		/* Grab the inode cluster buffer. */
> > +		imap.im_blkno = XFS_AGB_TO_DADDR(mp, agno, agbno);
> > +		imap.im_len = XFS_FSB_TO_BB(mp, blks_per_cluster);
> > +		imap.im_boffset = 0;
> > +
> > +		error = xfs_imap_to_bp(mp, cur->bc_tp, &imap,
> > +				&dip, &bp, 0, XFS_IGET_UNTRUSTED);
> > +		if (error)
> > +			return error;
> > +
> > +		usedmask = 0;
> > +		usedcount = 0;
> > +		/* Which inodes within this cluster are free? */
> > +		for (clusterino = 0; clusterino < nr_inodes; clusterino++) {
> > +			fsino = XFS_AGINO_TO_INO(mp, cur->bc_private.a.agno,
> > +					agino + clusterino);
> > +			error = xfs_repair_ialloc_check_free(cur, bp, fsino,
> > +					clusterino, &inuse);
> > +			if (error) {
> > +				xfs_trans_brelse(cur->bc_tp, bp);
> > +				return error;
> > +			}
> > +			if (inuse) {
> > +				usedcount++;
> > +				usedmask |= XFS_INOBT_MASK(cdist + clusterino);
> > +			}
> > +		}
> > +		xfs_trans_brelse(cur->bc_tp, bp);
> > +
> > +		/*
> > +		 * If the last item in the list is our chunk record,
> > +		 * update that.
> > +		 */
> > +		if (!list_empty(&ri->extlist)) {
> > +			rie = list_last_entry(&ri->extlist,
> > +					struct xfs_repair_ialloc_extent, list);
> > +			if (rie->startino + XFS_INODES_PER_CHUNK > startino) {
> > +				rie->freemask &= ~usedmask;
> > +				rie->holemask &= ~fillmask;
> > +				rie->count += nr_inodes;
> > +				rie->usedcount += usedcount;
> > +				continue;
> > +			}
> > +		}
> > +
> > +		/* New inode chunk; add to the list. */
> > +		rie = kmem_alloc(sizeof(struct xfs_repair_ialloc_extent),
> > +				KM_MAYFAIL);
> > +		if (!rie)
> > +			return -ENOMEM;
> > +
> > +		INIT_LIST_HEAD(&rie->list);
> > +		rie->startino = startino;
> > +		rie->freemask = XFS_INOBT_ALL_FREE & ~usedmask;
> > +		rie->holemask = XFS_INOBT_ALL_FREE & ~fillmask;
> > +		rie->count = nr_inodes;
> > +		rie->usedcount = usedcount;
> > +		list_add_tail(&rie->list, &ri->extlist);
> > +		ri->nr_records++;
> > +	}
> > +
> > +	return 0;
> > +}
> 
> [....]
> 
> > +/* Repair both inode btrees. */
> > +int
> > +xfs_repair_iallocbt(
> > +	struct xfs_scrub_context	*sc)
> > +{
> > +	struct xfs_repair_ialloc	ri;
> > +	struct xfs_owner_info		oinfo;
> > +	struct xfs_mount		*mp = sc->mp;
> > +	struct xfs_buf			*bp;
> > +	struct xfs_repair_ialloc_extent	*rie;
> > +	struct xfs_repair_ialloc_extent	*n;
> > +	struct xfs_agi			*agi;
> > +	struct xfs_btree_cur		*cur = NULL;
> > +	struct xfs_perag		*pag;
> > +	xfs_fsblock_t			inofsb;
> > +	xfs_fsblock_t			finofsb;
> > +	xfs_extlen_t			nr_blocks;
> > +	xfs_agino_t			old_count;
> > +	xfs_agino_t			old_freecount;
> > +	xfs_agino_t			freecount;
> > +	unsigned int			count;
> > +	unsigned int			usedcount;
> > +	int				logflags;
> > +	int				error = 0;
> > +
> > +	/* We require the rmapbt to rebuild anything. */
> > +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> > +		return -EOPNOTSUPP;
> 
> This could be factored similarly to the allocbt repair function.

Will do.

> > +
> > +	xfs_scrub_perag_get(sc->mp, &sc->sa);
> > +	pag = sc->sa.pag;
> > +	/* Collect all reverse mappings for inode blocks. */
> > +	xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
> > +	INIT_LIST_HEAD(&ri.extlist);
> > +	xfs_repair_init_extent_list(&ri.btlist);
> > +	ri.nr_records = 0;
> > +	ri.sc = sc;
> > +
> > +	cur = xfs_rmapbt_init_cursor(mp, sc->tp, sc->sa.agf_bp, sc->sa.agno);
> > +	error = xfs_rmap_query_all(cur, xfs_repair_ialloc_extent_fn, &ri);
> > +	if (error)
> > +		goto out;
> > +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > +	cur = NULL;
> > +
> > +	/* Do we actually have enough space to do this? */
> > +	nr_blocks = xfs_iallocbt_calc_size(mp, ri.nr_records);
> > +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > +		nr_blocks *= 2;
> > +	if (!xfs_repair_ag_has_space(pag, nr_blocks, XFS_AG_RESV_NONE)) {
> > +		error = -ENOSPC;
> > +		goto out;
> > +	}
> > +
> > +	/* Invalidate all the inobt/finobt blocks in btlist. */
> > +	error = xfs_repair_invalidate_blocks(sc, &ri.btlist);
> > +	if (error)
> > +		goto out;
> > +
> > +	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
> > +	/* Initialize new btree roots. */
> > +	error = xfs_repair_alloc_ag_block(sc, &oinfo, &inofsb,
> > +			XFS_AG_RESV_NONE);
> > +	if (error)
> > +		goto out;
> > +	error = xfs_repair_init_btblock(sc, inofsb, &bp, XFS_BTNUM_INO,
> > +			&xfs_inobt_buf_ops);
> > +	if (error)
> > +		goto out;
> > +	agi->agi_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, inofsb));
> > +	agi->agi_level = cpu_to_be32(1);
> > +	logflags = XFS_AGI_ROOT | XFS_AGI_LEVEL;
> > +
> > +	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> > +		error = xfs_repair_alloc_ag_block(sc, &oinfo, &finofsb,
> > +				mp->m_inotbt_nores ? XFS_AG_RESV_NONE :
> > +						     XFS_AG_RESV_METADATA);
> > +		if (error)
> > +			goto out;
> > +		error = xfs_repair_init_btblock(sc, finofsb, &bp,
> > +				XFS_BTNUM_FINO, &xfs_inobt_buf_ops);
> > +		if (error)
> > +			goto out;
> > +		agi->agi_free_root = cpu_to_be32(XFS_FSB_TO_AGBNO(mp, finofsb));
> > +		agi->agi_free_level = cpu_to_be32(1);
> > +		logflags |= XFS_AGI_FREE_ROOT | XFS_AGI_FREE_LEVEL;
> > +	}
> > +
> > +	xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, logflags);
> > +	error = xfs_repair_roll_ag_trans(sc);
> > +	if (error)
> > +		goto out;
> > +
> > +	/* Insert records into the new btrees. */
> > +	count = 0;
> > +	usedcount = 0;
> > +	list_sort(NULL, &ri.extlist, xfs_repair_ialloc_extent_cmp);
> > +	list_for_each_entry_safe(rie, n, &ri.extlist, list) {
> > +		count += rie->count;
> > +		usedcount += rie->usedcount;
> > +
> > +		error = xfs_repair_iallocbt_insert_rec(sc, rie);
> > +		if (error)
> > +			goto out;
> > +
> > +		list_del(&rie->list);
> > +		kmem_free(rie);
> > +	}
> > +
> > +
> > +	/* Update the AGI counters. */
> > +	agi = XFS_BUF_TO_AGI(sc->sa.agi_bp);
> > +	old_count = be32_to_cpu(agi->agi_count);
> > +	old_freecount = be32_to_cpu(agi->agi_freecount);
> > +	freecount = count - usedcount;
> > +
> > +	xfs_repair_mod_ino_counts(sc, old_count, count, old_freecount,
> > +			freecount);
> > +
> > +	if (count != old_count) {
> > +		if (sc->sa.pag->pagi_init)
> > +			sc->sa.pag->pagi_count = count;
> > +		agi->agi_count = cpu_to_be32(count);
> > +		xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, XFS_AGI_COUNT);
> > +	}
> > +
> > +	if (freecount != old_freecount) {
> > +		if (sc->sa.pag->pagi_init)
> > +			sc->sa.pag->pagi_freecount = freecount;
> 
> We've read the AGI buffer in at this point, right? so it is
> guaranteed that pagi_init is true, right?

Yeah.

> > +		agi->agi_freecount = cpu_to_be32(freecount);
> > +		xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, XFS_AGI_FREECOUNT);
> > +	}
> > +
> > +	/* Free the old inode btree blocks if they're not in use. */
> > +	return xfs_repair_reap_btree_extents(sc, &ri.btlist, &oinfo,
> > +			XFS_AG_RESV_NONE);
> > +out:
> 
> out_error, perhaps, to distinguish it from the normal function
> return path? (and perhaps apply that to all the previous main reapir
> functions on factoring?)

Ok.

--D

> > +	if (cur)
> > +		xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
> > +	xfs_repair_cancel_btree_extents(sc, &ri.btlist);
> > +	list_for_each_entry_safe(rie, n, &ri.extlist, list) {
> > +		list_del(&rie->list);
> > +		kmem_free(rie);
> > +	}
> > +	return error;
> > +}
> 
> -Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2018-06-06  3:55 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
2018-05-30 19:30 ` [PATCH 01/14] xfs: repair the AGF and AGFL Darrick J. Wong
2018-06-04  1:52   ` Dave Chinner
2018-06-05 23:18     ` Darrick J. Wong
2018-06-06  4:06       ` Dave Chinner
2018-06-06  4:56         ` Darrick J. Wong
2018-06-07  0:31           ` Dave Chinner
2018-06-07  4:42             ` Darrick J. Wong
2018-06-08  0:55               ` Dave Chinner
2018-06-08  1:23                 ` Darrick J. Wong
2018-05-30 19:30 ` [PATCH 02/14] xfs: repair the AGI Darrick J. Wong
2018-06-04  1:56   ` Dave Chinner
2018-06-05 23:54     ` Darrick J. Wong
2018-05-30 19:30 ` [PATCH 03/14] xfs: repair free space btrees Darrick J. Wong
2018-06-04  2:12   ` Dave Chinner
2018-06-06  1:50     ` Darrick J. Wong
2018-06-06  3:34       ` Dave Chinner
2018-06-06  4:01         ` Darrick J. Wong
2018-05-30 19:31 ` [PATCH 04/14] xfs: repair inode btrees Darrick J. Wong
2018-06-04  3:41   ` Dave Chinner
2018-06-06  3:55     ` Darrick J. Wong [this message]
2018-06-06  4:32       ` Dave Chinner
2018-06-06  4:58         ` Darrick J. Wong
2018-05-30 19:31 ` [PATCH 05/14] xfs: repair the rmapbt Darrick J. Wong
2018-05-31  5:42   ` Amir Goldstein
2018-06-06 21:13     ` Darrick J. Wong
2018-05-30 19:31 ` [PATCH 06/14] xfs: repair refcount btrees Darrick J. Wong
2018-05-30 19:31 ` [PATCH 07/14] xfs: repair inode records Darrick J. Wong
2018-05-30 19:31 ` [PATCH 08/14] xfs: zap broken inode forks Darrick J. Wong
2018-05-30 19:31 ` [PATCH 09/14] xfs: repair inode block maps Darrick J. Wong
2018-05-30 19:31 ` [PATCH 10/14] xfs: repair damaged symlinks Darrick J. Wong
2018-05-30 19:31 ` [PATCH 11/14] xfs: repair extended attributes Darrick J. Wong
2018-05-30 19:31 ` [PATCH 12/14] xfs: scrub should set preen if attr leaf has holes Darrick J. Wong
2018-05-30 19:32 ` [PATCH 13/14] xfs: repair quotas Darrick J. Wong
2018-05-30 19:32 ` [PATCH 14/14] xfs: implement live quotacheck as part of quota repair Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180606035528.GT9437@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.