From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 04/22] xfs: add helpers to dispose of old btree blocks after a repair
Date: Wed, 16 May 2018 18:32:32 +1000 [thread overview]
Message-ID: <20180516083232.GS23861@dastard> (raw)
In-Reply-To: <152642364418.1556.17533271717583168316.stgit@magnolia>
On Tue, May 15, 2018 at 03:34:04PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Now that we've plumbed in the ability to construct a list of dead btree
> blocks following a repair, add more helpers to dispose of them. This is
> done by examining the rmapbt -- if the btree was the only owner we can
> free the block, otherwise it's crosslinked and we can only remove the
> rmapbt record.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> fs/xfs/scrub/repair.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++
> fs/xfs/scrub/repair.h | 3 +
> 2 files changed, 202 insertions(+), 1 deletion(-)
>
>
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 8e8ecddd7537..d820e01d1146 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -362,6 +362,173 @@ xfs_repair_init_btblock(
> return 0;
> }
>
> +/* Ensure the freelist is the correct size. */
> +int
> +xfs_repair_fix_freelist(
> + struct xfs_scrub_context *sc,
> + bool can_shrink)
> +{
> + struct xfs_alloc_arg args = {0};
> + int error;
> +
> + args.mp = sc->mp;
> + args.tp = sc->tp;
> + args.agno = sc->sa.agno;
> + args.alignment = 1;
> + args.pag = xfs_perag_get(args.mp, sc->sa.agno);
> +
> + error = xfs_alloc_fix_freelist(&args,
> + can_shrink ? 0 : XFS_ALLOC_FLAG_NOSHRINK);
> + xfs_perag_put(args.pag);
with all these pag lookups, I'm starting to wonder if you should
just add a lookup and store the pag in the scrub context? That'd
save a lot of lookups - the pag structures never go away and i never
really intended them to be used like this in single, very short use
contexts. Grab once per operation, hold the reference across the
entire operation context, then release it....
> + return error;
> +}
> +
> +/*
> + * Put a block back on the AGFL.
> + */
> +STATIC int
> +xfs_repair_put_freelist(
> + struct xfs_scrub_context *sc,
> + xfs_agblock_t agbno)
> +{
> + struct xfs_owner_info oinfo;
> + struct xfs_perag *pag;
> + int error;
> +
> + /* Make sure there's space on the freelist. */
> + error = xfs_repair_fix_freelist(sc, true);
> + if (error)
> + return error;
> + pag = xfs_perag_get(sc->mp, sc->sa.agno);
Because this is how it quickly gets it gets to silly numbers of
lookups. That's two now in this function.
> + if (pag->pagf_flcount == 0) {
> + xfs_perag_put(pag);
> + return -EFSCORRUPTED;
Why is having an empty freelist a problem here? It's an AG thatis
completely out of space, but it isn't corruption? And I don't see
why an empty freelist prevents us from adding a backs back onto the
AGFL?
> + }
> + xfs_perag_put(pag);
> +
> + /*
> + * Since we're "freeing" a lost block onto the AGFL, we have to
> + * create an rmap for the block prior to merging it or else other
> + * parts will break.
> + */
> + xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
> + error = xfs_rmap_alloc(sc->tp, sc->sa.agf_bp, sc->sa.agno, agbno, 1,
> + &oinfo);
> + if (error)
> + return error;
> +
> + /* Put the block on the AGFL. */
> + error = xfs_alloc_put_freelist(sc->tp, sc->sa.agf_bp, sc->sa.agfl_bp,
> + agbno, 0);
There's another pag lookup in here.
> + if (error)
> + return error;
> + xfs_extent_busy_insert(sc->tp, sc->sa.agno, agbno, 1,
> + XFS_EXTENT_BUSY_SKIP_DISCARD);
And another in here, so 4 perag lookups for the same structure in
one simple operation. The code here in the function is fine, but I
really think we need to rethink how we use the perag in our
allocation code...
> +
> + return 0;
> +}
> +
> +/* Dispose of a single metadata block. */
> +STATIC int
> +xfs_repair_dispose_btree_block(
> + struct xfs_scrub_context *sc,
> + xfs_fsblock_t fsbno,
> + struct xfs_owner_info *oinfo,
> + enum xfs_ag_resv_type resv)
> +{
> + struct xfs_btree_cur *cur;
> + struct xfs_buf *agf_bp = NULL;
> + xfs_agnumber_t agno;
> + xfs_agblock_t agbno;
> + bool has_other_rmap;
> + int error;
> +
> + agno = XFS_FSB_TO_AGNO(sc->mp, fsbno);
> + agbno = XFS_FSB_TO_AGBNO(sc->mp, fsbno);
> +
> + if (sc->ip) {
> + /* Repairing per-inode metadata, read in the AGF. */
This would be better with a comment above saying:
/*
* If we are repairing per-inode metadata, we need to read
* in the AGF buffer. Otherwise we can re-use the existing
* AGF buffer we set up for repairing the per-AG btrees.
*/
> + error = xfs_alloc_read_agf(sc->mp, sc->tp, agno, 0, &agf_bp);
> + if (error)
> + return error;
> + if (!agf_bp)
> + return -ENOMEM;
> + } else {
> + /* Repairing per-AG btree, reuse existing AGF buffer. */
> + agf_bp = sc->sa.agf_bp;
> + }
> + cur = xfs_rmapbt_init_cursor(sc->mp, sc->tp, agf_bp, agno);
> +
> + /* Can we find any other rmappings? */
> + error = xfs_rmap_has_other_keys(cur, agbno, 1, oinfo, &has_other_rmap);
> + if (error)
> + goto out_cur;
> + xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> +
> + /*
> + * If there are other rmappings, this block is cross linked and must
> + * not be freed. Remove the reverse mapping and move on. Otherwise,
Why do we just remove the reverse mapping if the block cannot be
freed? I have my suspicions that this is removing cross-links one by
one until there's only one reference left to the extent, but then I
ask "how do we know which one is the correct mapping"?
i.e. this comment raised more questions about the algorithm for
dealing with cross-linked blocks - which doesn't appear to be
explained anywhere - than it answers....
> + * we were the only owner of the block, so free the extent, which will
> + * also remove the rmap.
> + */
> + if (has_other_rmap)
> + error = xfs_rmap_free(sc->tp, agf_bp, agno, agbno, 1, oinfo);
> + else if (resv == XFS_AG_RESV_AGFL)
> + error = xfs_repair_put_freelist(sc, agbno);
> + else
> + error = xfs_free_extent(sc->tp, fsbno, 1, oinfo, resv);
> + if (agf_bp != sc->sa.agf_bp)
> + xfs_trans_brelse(sc->tp, agf_bp);
> + if (error)
> + return error;
> +
> + if (sc->ip)
> + return xfs_trans_roll_inode(&sc->tp, sc->ip);
> + return xfs_repair_roll_ag_trans(sc);
> +
> +out_cur:
> + xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
> + if (agf_bp != sc->sa.agf_bp)
> + xfs_trans_brelse(sc->tp, agf_bp);
> + return error;
> +}
> +
> +/*
> + * Dispose of a given metadata extent.
> + *
> + * If the rmapbt says the extent has multiple owners, we simply remove the
> + * rmap associated with this owner hoping that we'll eventually disentangle
> + * the crosslinked metadata. Otherwise, there's one owner, so call the
> + * regular free code to remove the rmap and free the extent. Any existing
> + * buffers for the blocks in the extent must have been _binval'd previously.
Ok, so there's a little more detail about cross-linked files. Seems
my suspicions of "remove and hope" were close to the mark :P
Perhaps we need a document that describes the various algorithms we
use for resolving these problems, so they can be discussed and
improved without having to troll through the code to understand?
> + */
> +STATIC int
> +xfs_repair_dispose_btree_extent(
> + struct xfs_scrub_context *sc,
> + xfs_fsblock_t fsbno,
> + xfs_extlen_t len,
> + struct xfs_owner_info *oinfo,
> + enum xfs_ag_resv_type resv)
> +{
> + struct xfs_mount *mp = sc->mp;
> + int error = 0;
> +
> + ASSERT(xfs_sb_version_hasrmapbt(&mp->m_sb));
> + ASSERT(sc->ip != NULL || XFS_FSB_TO_AGNO(mp, fsbno) == sc->sa.agno);
> +
> + trace_xfs_repair_dispose_btree_extent(mp, XFS_FSB_TO_AGNO(mp, fsbno),
> + XFS_FSB_TO_AGBNO(mp, fsbno), len);
> +
> + for (; len > 0; len--, fsbno++) {
> + error = xfs_repair_dispose_btree_block(sc, fsbno, oinfo, resv);
> + if (error)
> + return error;
So why do we do this one block at a time, rather than freeing it
as an entire extent in one go? And if there is only a single caller,
why not just open code the loop in the caller?
> +/*
> + * Invalidate buffers for per-AG btree blocks we're dumping. We assume that
> + * exlist points only to metadata blocks.
> + */
> +int
> +xfs_repair_invalidate_blocks(
> + struct xfs_scrub_context *sc,
> + struct xfs_repair_extent_list *exlist)
> +{
> + struct xfs_repair_extent *rex;
> + struct xfs_repair_extent *n;
> + struct xfs_buf *bp;
> + xfs_agnumber_t agno;
> + xfs_agblock_t agbno;
> + xfs_agblock_t i;
> +
> + for_each_xfs_repair_extent_safe(rex, n, exlist) {
> + agno = XFS_FSB_TO_AGNO(sc->mp, rex->fsbno);
> + agbno = XFS_FSB_TO_AGBNO(sc->mp, rex->fsbno);
> + for (i = 0; i < rex->len; i++) {
> + bp = xfs_btree_get_bufs(sc->mp, sc->tp, agno,
> + agbno + i, 0);
> + xfs_trans_binval(sc->tp, bp);
> + }
Again, this is doing things by single blocks. We do have multi-block
metadata (inodes, directory blocks, remote attrs) that, if it
is already in memory, needs to be treated as multi-block extents. If
we don't do that, we'll cause aliasing problems in the buffer cache
(see _xfs_buf_obj_cmp()) and it's all downhill from there.
That's why I get worried when I see assumptions that we can process
contiguous metadata ranges in single block buffers....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2018-05-16 8:32 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-15 22:33 [PATCH v15.1 00/22] xfs-4.18: online repair support Darrick J. Wong
2018-05-15 22:33 ` [PATCH 01/22] xfs: add helpers to deal with transaction allocation and rolling Darrick J. Wong
2018-05-16 6:51 ` Dave Chinner
2018-05-16 16:46 ` Darrick J. Wong
2018-05-16 21:19 ` Dave Chinner
2018-05-16 16:48 ` Allison Henderson
2018-05-18 3:49 ` [PATCH v2 " Darrick J. Wong
2018-05-15 22:33 ` [PATCH 02/22] xfs: add helpers to allocate and initialize fresh btree roots Darrick J. Wong
2018-05-16 7:07 ` Dave Chinner
2018-05-16 17:15 ` Darrick J. Wong
2018-05-16 17:00 ` Allison Henderson
2018-05-15 22:33 ` [PATCH 03/22] xfs: add helpers to collect and sift btree block pointers during repair Darrick J. Wong
2018-05-16 7:56 ` Dave Chinner
2018-05-16 17:34 ` Allison Henderson
2018-05-16 18:06 ` Darrick J. Wong
2018-05-16 21:23 ` Dave Chinner
2018-05-16 21:33 ` Allison Henderson
2018-05-16 18:01 ` Darrick J. Wong
2018-05-16 21:32 ` Dave Chinner
2018-05-16 22:05 ` Darrick J. Wong
2018-05-17 0:41 ` Dave Chinner
2018-05-17 5:05 ` Darrick J. Wong
2018-05-18 3:51 ` [PATCH v2 " Darrick J. Wong
2018-05-29 3:10 ` Dave Chinner
2018-05-29 15:28 ` Darrick J. Wong
2018-05-15 22:34 ` [PATCH 04/22] xfs: add helpers to dispose of old btree blocks after a repair Darrick J. Wong
2018-05-16 8:32 ` Dave Chinner [this message]
2018-05-16 18:02 ` Allison Henderson
2018-05-16 19:34 ` Darrick J. Wong
2018-05-16 22:32 ` Dave Chinner
2018-05-16 23:18 ` Darrick J. Wong
2018-05-17 5:58 ` Darrick J. Wong
2018-05-18 3:53 ` [PATCH v2 " Darrick J. Wong
2018-05-29 3:14 ` Dave Chinner
2018-05-29 18:01 ` Darrick J. Wong
2018-05-15 22:34 ` [PATCH 05/22] xfs: recover AG btree roots from rmap data Darrick J. Wong
2018-05-16 8:51 ` Dave Chinner
2018-05-16 18:37 ` Darrick J. Wong
2018-05-16 19:18 ` Allison Henderson
2018-05-16 22:36 ` Dave Chinner
2018-05-17 5:53 ` Darrick J. Wong
2018-05-18 3:54 ` [PATCH v2 " Darrick J. Wong
2018-05-29 3:16 ` Dave Chinner
2018-05-15 22:34 ` [PATCH 06/22] xfs: add a repair helper to reset superblock counters Darrick J. Wong
2018-05-16 21:29 ` Allison Henderson
2018-05-18 3:56 ` Darrick J. Wong
2018-05-18 3:56 ` [PATCH v2 " Darrick J. Wong
2018-05-29 3:28 ` Dave Chinner
2018-05-29 22:07 ` Darrick J. Wong
2018-05-29 22:24 ` Dave Chinner
2018-05-29 22:43 ` Darrick J. Wong
2018-05-30 1:23 ` Dave Chinner
2018-05-30 3:22 ` Darrick J. Wong
2018-05-15 22:34 ` [PATCH 07/22] xfs: add helpers to attach quotas to inodes Darrick J. Wong
2018-05-16 22:21 ` Allison Henderson
2018-05-18 3:58 ` [PATCH v2 " Darrick J. Wong
2018-05-29 3:29 ` Dave Chinner
2018-05-15 22:34 ` [PATCH 08/22] xfs: repair superblocks Darrick J. Wong
2018-05-16 22:55 ` Allison Henderson
2018-05-29 3:42 ` Dave Chinner
2018-05-15 22:34 ` [PATCH 09/22] xfs: repair the AGF and AGFL Darrick J. Wong
2018-05-15 22:34 ` [PATCH 10/22] xfs: repair the AGI Darrick J. Wong
2018-05-15 22:34 ` [PATCH 11/22] xfs: repair free space btrees Darrick J. Wong
2018-05-15 22:34 ` [PATCH 12/22] xfs: repair inode btrees Darrick J. Wong
2018-05-15 22:35 ` [PATCH 13/22] xfs: repair the rmapbt Darrick J. Wong
2018-05-15 22:35 ` [PATCH 14/22] xfs: repair refcount btrees Darrick J. Wong
2018-05-15 22:35 ` [PATCH 15/22] xfs: repair inode records Darrick J. Wong
2018-05-15 22:35 ` [PATCH 16/22] xfs: zap broken inode forks Darrick J. Wong
2018-05-15 22:35 ` [PATCH 17/22] xfs: repair inode block maps Darrick J. Wong
2018-05-15 22:35 ` [PATCH 18/22] xfs: repair damaged symlinks Darrick J. Wong
2018-05-15 22:35 ` [PATCH 19/22] xfs: repair extended attributes Darrick J. Wong
2018-05-15 22:35 ` [PATCH 20/22] xfs: scrub should set preen if attr leaf has holes Darrick J. Wong
2018-05-15 22:35 ` [PATCH 21/22] xfs: repair quotas Darrick J. Wong
2018-05-15 22:36 ` [PATCH 22/22] xfs: implement live quotacheck as part of quota repair Darrick J. Wong
2018-05-18 3:47 ` [PATCH 0.5/22] xfs: grab the per-ag structure whenever relevant Darrick J. Wong
2018-05-30 6:44 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180516083232.GS23861@dastard \
--to=david@fromorbit.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).