From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 14/21] xfs: zap broken inode forks
Date: Wed, 4 Jul 2018 12:07:06 +1000 [thread overview]
Message-ID: <20180704020706.GL2234@dastard> (raw)
In-Reply-To: <152986829769.3155.762174417583296955.stgit@magnolia>
On Sun, Jun 24, 2018 at 12:24:57PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Determine if inode fork damage is responsible for the inode being unable
> to pass the ifork verifiers in xfs_iget and zap the fork contents if
> this is true. Once this is done the fork will be empty but we'll be
> able to construct an in-core inode, and a subsequent call to the inode
> fork repair ioctl will search the rmapbt to rebuild the records that
> were in the fork.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> fs/xfs/libxfs/xfs_attr_leaf.c | 32 ++-
> fs/xfs/libxfs/xfs_attr_leaf.h | 2
> fs/xfs/libxfs/xfs_bmap.c | 21 ++
> fs/xfs/libxfs/xfs_bmap.h | 2
> fs/xfs/scrub/inode_repair.c | 399 +++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 437 insertions(+), 19 deletions(-)
>
>
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index b3c19339e1b5..f6c458104934 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -894,23 +894,16 @@ xfs_attr_shortform_allfit(
> return xfs_attr_shortform_bytesfit(dp, bytes);
> }
>
> -/* Verify the consistency of an inline attribute fork. */
> +/* Verify the consistency of a raw inline attribute fork. */
> xfs_failaddr_t
> -xfs_attr_shortform_verify(
> - struct xfs_inode *ip)
> +xfs_attr_shortform_verify_struct(
> + struct xfs_attr_shortform *sfp,
> + size_t size)
The internal structure checking functions in the directory code
use the naming convention xfs_dir3_<type>_check(). I think we should use
the same here. i.e. xfs_attr_sf_check().
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index b7f094e19bab..b1254e6c17b5 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -6223,18 +6223,16 @@ xfs_bmap_finish_one(
> return error;
> }
>
> -/* Check that an inode's extent does not have invalid flags or bad ranges. */
> +/* Check that an extent does not have invalid flags or bad ranges. */
> xfs_failaddr_t
> -xfs_bmap_validate_extent(
> - struct xfs_inode *ip,
> +xfs_bmbt_validate_extent(
xfs_bmbt_ prefixes should only appear in xfs_bmap_btree.c, not
xfs_bmap.c....
So either it needs to get moved or renamed to something like
xfs_bmap_validate_irec()?
> diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c
> index 4ac43c1b1eb0..b941f21d7667 100644
> --- a/fs/xfs/scrub/inode_repair.c
> +++ b/fs/xfs/scrub/inode_repair.c
> @@ -22,11 +22,15 @@
> #include "xfs_ialloc.h"
> #include "xfs_da_format.h"
> #include "xfs_reflink.h"
> +#include "xfs_alloc.h"
> #include "xfs_rmap.h"
> +#include "xfs_rmap_btree.h"
> #include "xfs_bmap.h"
> +#include "xfs_bmap_btree.h"
> #include "xfs_bmap_util.h"
> #include "xfs_dir2.h"
> #include "xfs_quota_defs.h"
> +#include "xfs_attr_leaf.h"
> #include "scrub/xfs_scrub.h"
> #include "scrub/scrub.h"
> #include "scrub/common.h"
> @@ -113,7 +117,8 @@ xfs_repair_inode_mode(
> STATIC void
> xfs_repair_inode_flags(
> struct xfs_scrub_context *sc,
> - struct xfs_dinode *dip)
> + struct xfs_dinode *dip,
> + bool is_rt_file)
> {
> struct xfs_mount *mp = sc->mp;
> uint64_t flags2;
> @@ -132,6 +137,10 @@ xfs_repair_inode_flags(
> flags2 &= ~XFS_DIFLAG2_REFLINK;
> if (flags2 & XFS_DIFLAG2_REFLINK)
> flags2 &= ~XFS_DIFLAG2_DAX;
> + if (is_rt_file)
> + flags |= XFS_DIFLAG_REALTIME;
> + else
> + flags &= ~XFS_DIFLAG_REALTIME;
This needs to be done first. i.e. before we check things like rt vs
reflink flags.
> @@ -210,17 +219,402 @@ xfs_repair_inode_extsize_hints(
> }
> }
>
> +struct xfs_repair_inode_fork_counters {
> + struct xfs_scrub_context *sc;
> + xfs_rfsblock_t data_blocks;
> + xfs_rfsblock_t rt_blocks;
inode is either data or rt, not both. Why do you need two separate
counters? Oh, bmbt blocks are always data, right? Coment, perhaps?
> + xfs_rfsblock_t attr_blocks;
> + xfs_extnum_t data_extents;
> + xfs_extnum_t rt_extents;
but bmbt blocks are not data extents, so only one counter here?
> +/* Count extents and blocks for a given inode from all rmap data. */
> +STATIC int
> +xfs_repair_inode_count_rmaps(
> + struct xfs_repair_inode_fork_counters *rifc)
> +{
> + xfs_agnumber_t agno;
> + int error;
> +
> + if (!xfs_sb_version_hasrmapbt(&rifc->sc->mp->m_sb) ||
> + xfs_sb_version_hasrealtime(&rifc->sc->mp->m_sb))
> + return -EOPNOTSUPP;
> +
> + /* XXX: find rt blocks too */
Ok, needs a comment up front that realtime repair isn't supported,
rather than hiding it down here.
> + for (agno = 0; agno < rifc->sc->mp->m_sb.sb_agcount; agno++) {
> + error = xfs_repair_inode_count_ag_rmaps(rifc, agno);
> + if (error)
> + return error;
> + }
/* rt not supported yet */
ASSERT(rifc->rt_extents == 0);
> + /* Can't have extents on both the rt and the data device. */
> + if (rifc->data_extents && rifc->rt_extents)
> + return -EFSCORRUPTED;
> +
> + return 0;
> +}
> +
> +/* Figure out if we need to zap this extents format fork. */
> +STATIC bool
> +xfs_repair_inode_core_check_extents_fork(
Urk. Not sure what this function is supposed to be doing from the
name. xrep_ifork_extent_check()? And then the next function
becomes xrep_ifork_btree_check()?
Also document the return values.
> + struct xfs_scrub_context *sc,
> + struct xfs_dinode *dip,
> + int dfork_size,
> + int whichfork)
> +{
> + struct xfs_bmbt_irec new;
> + struct xfs_bmbt_rec *dp;
> + bool isrt;
> + int i;
> + int nex;
> + int fork_size;
> +
> + nex = XFS_DFORK_NEXTENTS(dip, whichfork);
> + fork_size = nex * sizeof(struct xfs_bmbt_rec);
> + if (fork_size < 0 || fork_size > dfork_size)
> + return true;
Check nex against dip->di_nextents?
> + dp = (struct xfs_bmbt_rec *)XFS_DFORK_PTR(dip, whichfork);
> +
> + isrt = dip->di_flags & cpu_to_be16(XFS_DIFLAG_REALTIME);
> + for (i = 0; i < nex; i++, dp++) {
> + xfs_failaddr_t fa;
> +
> + xfs_bmbt_disk_get_all(dp, &new);
> + fa = xfs_bmbt_validate_extent(sc->mp, isrt, whichfork, &new);
> + if (fa)
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/* Figure out if we need to zap this btree format fork. */
> +STATIC bool
> +xfs_repair_inode_core_check_btree_fork(
> + struct xfs_scrub_context *sc,
> + struct xfs_dinode *dip,
> + int dfork_size,
> + int whichfork)
> +{
> + struct xfs_bmdr_block *dfp;
> + int nrecs;
> + int level;
> +
> + if (XFS_DFORK_NEXTENTS(dip, whichfork) <=
> + dfork_size / sizeof(struct xfs_bmbt_irec))
> + return true;
check against dip->di_nextents?
> + dfp = (struct xfs_bmdr_block *)XFS_DFORK_PTR(dip, whichfork);
> + nrecs = be16_to_cpu(dfp->bb_numrecs);
> + level = be16_to_cpu(dfp->bb_level);
> +
> + if (nrecs == 0 || XFS_BMDR_SPACE_CALC(nrecs) > dfork_size)
> + return true;
> + if (level == 0 || level > XFS_BTREE_MAXLEVELS)
> + return true;
Should this visit the bmbt blocks to check the level is actually
correct?
> + return false;
> +}
> +
> +/*
> + * Check the data fork for things that will fail the ifork verifiers or the
> + * ifork formatters.
> + */
> +STATIC bool
> +xfs_repair_inode_core_check_data_fork(
xrep_ifork_check_data()
> + struct xfs_scrub_context *sc,
> + struct xfs_dinode *dip,
> + uint16_t mode)
> +{
> + uint64_t size;
> + int dfork_size;
> +
> + size = be64_to_cpu(dip->di_size);
> + switch (mode & S_IFMT) {
> + case S_IFIFO:
> + case S_IFCHR:
> + case S_IFBLK:
> + case S_IFSOCK:
> + if (XFS_DFORK_FORMAT(dip, XFS_DATA_FORK) != XFS_DINODE_FMT_DEV)
> + return true;
> + break;
> + case S_IFREG:
> + case S_IFLNK:
> + case S_IFDIR:
> + switch (XFS_DFORK_FORMAT(dip, XFS_DATA_FORK)) {
> + case XFS_DINODE_FMT_LOCAL:
local format is not valid for S_IFREG.
> + case XFS_DINODE_FMT_EXTENTS:
> + case XFS_DINODE_FMT_BTREE:
> + break;
> + default:
> + return true;
> + }
> + break;
> + default:
> + return true;
> + }
> + dfork_size = XFS_DFORK_SIZE(dip, sc->mp, XFS_DATA_FORK);
> + switch (XFS_DFORK_FORMAT(dip, XFS_DATA_FORK)) {
> + case XFS_DINODE_FMT_DEV:
> + break;
> + case XFS_DINODE_FMT_LOCAL:
> + if (size > dfork_size)
> + return true;
> + break;
> + case XFS_DINODE_FMT_EXTENTS:
> + if (xfs_repair_inode_core_check_extents_fork(sc, dip,
> + dfork_size, XFS_DATA_FORK))
> + return true;
> + break;
> + case XFS_DINODE_FMT_BTREE:
> + if (xfs_repair_inode_core_check_btree_fork(sc, dip,
> + dfork_size, XFS_DATA_FORK))
> + return true;
> + break;
> + default:
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/* Reset the data fork to something sane. */
> +STATIC void
> +xfs_repair_inode_core_zap_data_fork(
xrep_ifork_zap_data()
(you get the idea :P)
> + struct xfs_scrub_context *sc,
> + struct xfs_dinode *dip,
> + uint16_t mode,
> + struct xfs_repair_inode_fork_counters *rifc)
(structure names can change, too :)
> +{
> + char *p;
> + const struct xfs_dir_ops *ops;
> + struct xfs_dir2_sf_hdr *sfp;
> + int i8count;
> +
> + /* Special files always get reset to DEV */
> + switch (mode & S_IFMT) {
> + case S_IFIFO:
> + case S_IFCHR:
> + case S_IFBLK:
> + case S_IFSOCK:
> + dip->di_format = XFS_DINODE_FMT_DEV;
> + dip->di_size = 0;
> + return;
> + }
> +
> + /*
> + * If we have data extents, reset to an empty map and hope the user
> + * will run the bmapbtd checker next.
> + */
> + if (rifc->data_extents || rifc->rt_extents || S_ISREG(mode)) {
> + dip->di_format = XFS_DINODE_FMT_EXTENTS;
> + dip->di_nextents = 0;
> + return;
> + }
Does the userspace tool run the bmapbtd checker next?
> + /* Otherwise, reset the local format to the minimum. */
> + switch (mode & S_IFMT) {
> + case S_IFLNK:
> + /* Blow out symlink; now it points to root dir */
> + dip->di_format = XFS_DINODE_FMT_LOCAL;
> + dip->di_size = cpu_to_be64(1);
> + p = XFS_DFORK_PTR(dip, XFS_DATA_FORK);
> + *p = '/';
Maybe factor this so zero length symlinks can be set directly to the
same thing? FWIW, would making it point at '.' be better than '/' so
it always points within the same filesystem?
> + break;
> + case S_IFDIR:
> + /*
> + * Blow out dir, make it point to the root. In the
> + * future the direction repair will reconstruct this
> + * dir for us.
> + */
s/the direction//
> + dip->di_format = XFS_DINODE_FMT_LOCAL;
> + i8count = sc->mp->m_sb.sb_rootino > XFS_DIR2_MAX_SHORT_INUM;
> + ops = xfs_dir_get_ops(sc->mp, NULL);
> + sfp = (struct xfs_dir2_sf_hdr *)XFS_DFORK_PTR(dip,
> + XFS_DATA_FORK);
> + sfp->count = 0;
> + sfp->i8count = i8count;
> + ops->sf_put_parent_ino(sfp, sc->mp->m_sb.sb_rootino);
> + dip->di_size = cpu_to_be64(xfs_dir2_sf_hdr_size(i8count));
What happens now if this dir has an ancestor still pointing at it?
Haven't we just screwed the directory structure? How does this
interact with the dentry cache, esp. w.r.t. disconnected dentries
(filehandle lookups)?
> +/*
> + * Check the attr fork for things that will fail the ifork verifiers or the
> + * ifork formatters.
> + */
> +STATIC bool
> +xfs_repair_inode_core_check_attr_fork(
> + struct xfs_scrub_context *sc,
> + struct xfs_dinode *dip)
> +{
> + struct xfs_attr_shortform *sfp;
> + int size;
> +
> + if (XFS_DFORK_BOFF(dip) == 0)
> + return dip->di_aformat != XFS_DINODE_FMT_EXTENTS ||
> + dip->di_anextents != 0;
> +
> + size = XFS_DFORK_SIZE(dip, sc->mp, XFS_ATTR_FORK);
> + switch (XFS_DFORK_FORMAT(dip, XFS_ATTR_FORK)) {
> + case XFS_DINODE_FMT_LOCAL:
> + sfp = (struct xfs_attr_shortform *)XFS_DFORK_PTR(dip,
> + XFS_ATTR_FORK);
As a side note, we should make XFS_DFORK_PTR() return a void * so we
don't need casts like this.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2018-07-04 2:07 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-24 19:23 [PATCH v16 00/21] xfs-4.19: online repair support Darrick J. Wong
2018-06-24 19:23 ` [PATCH 01/21] xfs: don't assume a left rmap when allocating a new rmap Darrick J. Wong
2018-06-27 0:54 ` Dave Chinner
2018-06-28 21:11 ` Allison Henderson
2018-06-29 14:39 ` Darrick J. Wong
2018-06-24 19:23 ` [PATCH 02/21] xfs: add helper to decide if an inode has allocated cow blocks Darrick J. Wong
2018-06-27 1:02 ` Dave Chinner
2018-06-28 21:12 ` Allison Henderson
2018-06-24 19:23 ` [PATCH 03/21] xfs: refactor part of xfs_free_eofblocks Darrick J. Wong
2018-06-28 21:13 ` Allison Henderson
2018-06-24 19:23 ` [PATCH 04/21] xfs: repair the AGF and AGFL Darrick J. Wong
2018-06-27 2:19 ` Dave Chinner
2018-06-27 16:44 ` Allison Henderson
2018-06-27 23:37 ` Dave Chinner
2018-06-29 15:14 ` Darrick J. Wong
2018-06-28 17:25 ` Allison Henderson
2018-06-29 15:08 ` Darrick J. Wong
2018-06-28 21:14 ` Allison Henderson
2018-06-28 23:21 ` Dave Chinner
2018-06-29 1:35 ` Allison Henderson
2018-06-29 14:55 ` Darrick J. Wong
2018-06-24 19:24 ` [PATCH 05/21] xfs: repair the AGI Darrick J. Wong
2018-06-27 2:22 ` Dave Chinner
2018-06-28 21:15 ` Allison Henderson
2018-06-24 19:24 ` [PATCH 06/21] xfs: repair free space btrees Darrick J. Wong
2018-06-27 3:21 ` Dave Chinner
2018-07-04 2:15 ` Darrick J. Wong
2018-07-04 2:25 ` Dave Chinner
2018-06-30 17:36 ` Allison Henderson
2018-06-24 19:24 ` [PATCH 07/21] xfs: repair inode btrees Darrick J. Wong
2018-06-28 0:55 ` Dave Chinner
2018-07-04 2:22 ` Darrick J. Wong
2018-06-30 17:36 ` Allison Henderson
2018-06-30 18:30 ` Darrick J. Wong
2018-07-01 0:45 ` Allison Henderson
2018-06-24 19:24 ` [PATCH 08/21] xfs: defer iput on certain inodes while scrub / repair are running Darrick J. Wong
2018-06-28 23:37 ` Dave Chinner
2018-06-29 14:49 ` Darrick J. Wong
2018-06-24 19:24 ` [PATCH 09/21] xfs: finish our set of inode get/put tracepoints for scrub Darrick J. Wong
2018-06-24 19:24 ` [PATCH 10/21] xfs: introduce online scrub freeze Darrick J. Wong
2018-06-24 19:24 ` [PATCH 11/21] xfs: repair the rmapbt Darrick J. Wong
2018-07-03 5:32 ` Dave Chinner
2018-07-03 23:59 ` Darrick J. Wong
2018-07-04 8:44 ` Carlos Maiolino
2018-07-04 18:40 ` Darrick J. Wong
2018-07-04 23:21 ` Dave Chinner
2018-07-05 3:48 ` Darrick J. Wong
2018-07-05 7:03 ` Dave Chinner
2018-07-06 0:47 ` Darrick J. Wong
2018-07-06 1:08 ` Dave Chinner
2018-06-24 19:24 ` [PATCH 12/21] xfs: repair refcount btrees Darrick J. Wong
2018-07-03 5:50 ` Dave Chinner
2018-07-04 2:23 ` Darrick J. Wong
2018-06-24 19:24 ` [PATCH 13/21] xfs: repair inode records Darrick J. Wong
2018-07-03 6:17 ` Dave Chinner
2018-07-04 0:16 ` Darrick J. Wong
2018-07-04 1:03 ` Dave Chinner
2018-07-04 1:30 ` Darrick J. Wong
2018-06-24 19:24 ` [PATCH 14/21] xfs: zap broken inode forks Darrick J. Wong
2018-07-04 2:07 ` Dave Chinner [this message]
2018-07-04 3:26 ` Darrick J. Wong
2018-06-24 19:25 ` [PATCH 15/21] xfs: repair inode block maps Darrick J. Wong
2018-07-04 3:00 ` Dave Chinner
2018-07-04 3:41 ` Darrick J. Wong
2018-06-24 19:25 ` [PATCH 16/21] xfs: repair damaged symlinks Darrick J. Wong
2018-07-04 5:45 ` Dave Chinner
2018-07-04 18:45 ` Darrick J. Wong
2018-06-24 19:25 ` [PATCH 17/21] xfs: repair extended attributes Darrick J. Wong
2018-07-06 1:03 ` Dave Chinner
2018-07-06 3:10 ` Darrick J. Wong
2018-06-24 19:25 ` [PATCH 18/21] xfs: scrub should set preen if attr leaf has holes Darrick J. Wong
2018-06-29 2:52 ` Dave Chinner
2018-06-24 19:25 ` [PATCH 19/21] xfs: repair quotas Darrick J. Wong
2018-07-06 1:50 ` Dave Chinner
2018-07-06 3:16 ` Darrick J. Wong
2018-06-24 19:25 ` [PATCH 20/21] xfs: implement live quotacheck as part of quota repair Darrick J. Wong
2018-06-24 19:25 ` [PATCH 21/21] xfs: add online scrub/repair for superblock counters Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180704020706.GL2234@dastard \
--to=david@fromorbit.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).