From: "Darrick J. Wong" <djwong@kernel.org>
To: Brian Foster <bfoster@redhat.com>
Cc: sandeen@sandeen.net, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 09/10] xfs_repair: add a testing hook for NEEDSREPAIR
Date: Tue, 9 Feb 2021 11:59:20 -0800 [thread overview]
Message-ID: <20210209195920.GZ7193@magnolia> (raw)
In-Reply-To: <20210209185939.GK14273@bfoster>
On Tue, Feb 09, 2021 at 01:59:39PM -0500, Brian Foster wrote:
> On Tue, Feb 09, 2021 at 10:17:38AM -0800, Darrick J. Wong wrote:
> > On Tue, Feb 09, 2021 at 12:21:31PM -0500, Brian Foster wrote:
> > > On Mon, Feb 08, 2021 at 08:10:55PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > >
> > > > Simulate a crash when anyone calls force_needsrepair. This is a debug
> > > > knob so that we can test that the kernel won't mount after setting
> > > > needsrepair and that a re-run of xfs_repair will clear the flag.
> > > >
> > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > > ---
> > >
> > > Can't we just use db to manually set the bit on the superblock?
> >
> > No, because the fstest uses this debug knob to simulate the following:
> >
> > 1) sysadmin issues 'xfs_admin -O inobtcount /dev/sda1'
> > 2) xfs_repair flips on INOBTCOUNT and NEEDSREPAIR
> > 3) system goes down and repair never completes
> > 4) verify that we can't mount
> > 5) verify that repair clears NEEDSREPAIR and gives us a clean fs
> > 6) verify that mount works again
> >
>
> Ok, but that seems like circular reasoning.
I'm sorry, but I don't see how this is circular logic?
The test needs to show that NEEDSREPAIR is turned on during phase 1 (or
2) when we apply an upgrade, and it needs to induce some kind of early
exit so that the needsrepair clearing code after phase 7 does not run.
If we set NEEDSREPAIR with xfs_db before running repair then we have no
way to detect if the inobtcount upgrade doesn't set needsrepair.
If we don't have a debugging knob to stop repair before it reaches phase
7, we're not really testing a genuine early-repair-exit scenario. Yes,
we can use xfs_db to manually set the flag after repair returns, but
that doesn't fill the testing gap above.
> It wouldn't be quite the
> same as a simulated repair failure, but ISTM that if we set the bit
> manually, we can still verify steps 4, 5 and 6 as is (with the caveat
> that the repair invocation performs a feature upgrade). I'm not sure how
> important it really is to verify that a feature upgrade sequence sets
> the bit if it happens to fail provided we have independent tests that 1.
> verify the needsrepair bit works as expected and 2. verify the feature
> upgrades work appropriately, since that is the primary functionality.
>
> I wanted to think about that a little more before replying, but I also
> just realized something odd when digging into the debug code:
>
> # ./repair/xfs_repair -c needsrepair=1 /dev/test/scratch
> Phase 1 - find and verify superblock...
> Marking filesystem in need of repair.
> writing modified primary superblock
> Phase 2 - using internal log
> - zero log...
> ERROR: The filesystem has valuable metadata changes in a log which needs to
> ...
> # mount /dev/test/scratch /mnt/
> mount: /mnt: wrong fs type, bad option, bad superblock on /dev/mapper/test-scratch, missing codepage or helper program, or other error.
> #
>
> It looks like we can set a feature upgrade bit on the superblock before
> we've examined the log and potentially discovered that it's dirty (phase
> 2). If the log is recoverable, that puts the user in a bit of a bind..
Heh, funny that I was thinking that the upgrades shouldn't really be
happening in phase 1 anyway--
I've (separately) started working on a patch to make it so that you can
add reflink and finobt to a filesystem. Those upgrades require somewhat
more intensive checks of the filesystem (such as checking free space in
each AG), so I ended up dumping them into phase 2, since the xfs_mount
and buffer cache aren't fully initialized until after phase 1.
So, yeah, the upgrade code should move to phase2() after log zeroing and
before the AG scan.
--D
> Brian
>
> > and the other scenario is:
> >
> > 1) fuzz a directory entry in such a way that repair will decide to
> > blow out the dirent and rebuild the directory later
> > 2) sysadmin issues 'xfs_repair /dev/sda1'
> > 2) xfs_repair flips on NEEDSREPAIR at the same time it corrupts the
> > dirent to trigger the rebuild later
> > 3) system goes down and repair never completes
> > 4) verify that we can't mount
> > 5) verify that repair clears NEEDSREPAIR and gives us a clean fs
> > 6) verify that mount works again
> >
> > Both cases reflect what I think are the most likely failure scenarios,
> > hence the knob needs to be in xfs_repair to prevent it from running to
> > completion.
> >
> > (And yes, I've been recently very bad at sending fstests out for review
> > the past few months; I will get that done by this afternoon.)
> >
> > --D
> >
> > > Brian
> > >
> > > > repair/globals.c | 1 +
> > > > repair/globals.h | 2 ++
> > > > repair/phase1.c | 5 +++++
> > > > repair/xfs_repair.c | 7 +++++++
> > > > 4 files changed, 15 insertions(+)
> > > >
> > > >
> > > > diff --git a/repair/globals.c b/repair/globals.c
> > > > index 699a96ee..b0e23864 100644
> > > > --- a/repair/globals.c
> > > > +++ b/repair/globals.c
> > > > @@ -40,6 +40,7 @@ int dangerously; /* live dangerously ... fix ro mount */
> > > > int isa_file;
> > > > int zap_log;
> > > > int dumpcore; /* abort, not exit on fatal errs */
> > > > +bool abort_after_force_needsrepair;
> > > > int force_geo; /* can set geo on low confidence info */
> > > > int assume_xfs; /* assume we have an xfs fs */
> > > > char *log_name; /* Name of log device */
> > > > diff --git a/repair/globals.h b/repair/globals.h
> > > > index 043b3e8e..9fa73b2c 100644
> > > > --- a/repair/globals.h
> > > > +++ b/repair/globals.h
> > > > @@ -82,6 +82,8 @@ extern int isa_file;
> > > > extern int zap_log;
> > > > extern int dumpcore; /* abort, not exit on fatal errs */
> > > > extern int force_geo; /* can set geo on low confidence info */
> > > > +/* Abort after forcing NEEDSREPAIR to test its functionality */
> > > > +extern bool abort_after_force_needsrepair;
> > > > extern int assume_xfs; /* assume we have an xfs fs */
> > > > extern char *log_name; /* Name of log device */
> > > > extern int log_spec; /* Log dev specified as option */
> > > > diff --git a/repair/phase1.c b/repair/phase1.c
> > > > index b26d25f8..57f72cd0 100644
> > > > --- a/repair/phase1.c
> > > > +++ b/repair/phase1.c
> > > > @@ -170,5 +170,10 @@ _("Cannot disable lazy-counters on V5 fs\n"));
> > > > */
> > > > sb_ifree = sb_icount = sb_fdblocks = sb_frextents = 0;
> > > >
> > > > + /* Simulate a crash after setting needsrepair. */
> > > > + if (primary_sb_modified && add_needsrepair &&
> > > > + abort_after_force_needsrepair)
> > > > + exit(55);
> > > > +
> > > > free(sb);
> > > > }
> > > > diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> > > > index ee377e8a..ae7106a6 100644
> > > > --- a/repair/xfs_repair.c
> > > > +++ b/repair/xfs_repair.c
> > > > @@ -44,6 +44,7 @@ enum o_opt_nums {
> > > > BLOAD_LEAF_SLACK,
> > > > BLOAD_NODE_SLACK,
> > > > NOQUOTA,
> > > > + FORCE_NEEDSREPAIR_ABORT,
> > > > O_MAX_OPTS,
> > > > };
> > > >
> > > > @@ -57,6 +58,7 @@ static char *o_opts[] = {
> > > > [BLOAD_LEAF_SLACK] = "debug_bload_leaf_slack",
> > > > [BLOAD_NODE_SLACK] = "debug_bload_node_slack",
> > > > [NOQUOTA] = "noquota",
> > > > + [FORCE_NEEDSREPAIR_ABORT] = "debug_force_needsrepair_abort",
> > > > [O_MAX_OPTS] = NULL,
> > > > };
> > > >
> > > > @@ -282,6 +284,9 @@ process_args(int argc, char **argv)
> > > > _("-o debug_bload_node_slack requires a parameter\n"));
> > > > bload_node_slack = (int)strtol(val, NULL, 0);
> > > > break;
> > > > + case FORCE_NEEDSREPAIR_ABORT:
> > > > + abort_after_force_needsrepair = true;
> > > > + break;
> > > > case NOQUOTA:
> > > > quotacheck_skip();
> > > > break;
> > > > @@ -795,6 +800,8 @@ force_needsrepair(
> > > > error = -libxfs_bwrite(bp);
> > > > if (error)
> > > > do_log(_("couldn't force needsrepair, err=%d\n"), error);
> > > > + if (abort_after_force_needsrepair)
> > > > + exit(55);
> > > > }
> > > > if (bp)
> > > > libxfs_buf_relse(bp);
> > > >
> > >
> >
>
next prev parent reply other threads:[~2021-02-09 21:37 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-09 4:10 [PATCHSET v4 00/10] xfs: add the ability to flag a fs for repair Darrick J. Wong
2021-02-09 4:10 ` [PATCH 01/10] xfs_admin: clean up string quoting Darrick J. Wong
2021-02-09 9:07 ` Christoph Hellwig
2021-02-09 4:10 ` [PATCH 02/10] xfs_admin: support filesystems with realtime devices Darrick J. Wong
2021-02-09 9:08 ` Christoph Hellwig
2021-02-09 17:19 ` Brian Foster
2021-02-09 4:10 ` [PATCH 03/10] xfs_db: support the needsrepair feature flag in the version command Darrick J. Wong
2021-02-09 9:09 ` Christoph Hellwig
2021-02-09 17:15 ` Darrick J. Wong
2021-02-09 17:19 ` Brian Foster
2021-02-09 4:10 ` [PATCH 04/10] xfs_repair: fix unmount error message to have a newline Darrick J. Wong
2021-02-09 9:09 ` Christoph Hellwig
2021-02-09 4:10 ` [PATCH 05/10] xfs_repair: clear quota CHKD flags on the incore superblock too Darrick J. Wong
2021-02-09 9:10 ` Christoph Hellwig
2021-02-09 17:20 ` Brian Foster
2021-02-09 17:46 ` Darrick J. Wong
2021-02-09 4:10 ` [PATCH 06/10] xfs_repair: clear the needsrepair flag Darrick J. Wong
2021-02-09 9:12 ` Christoph Hellwig
2021-02-09 17:20 ` Brian Foster
2021-02-09 18:01 ` Darrick J. Wong
2021-02-09 4:10 ` [PATCH 07/10] xfs_repair: set NEEDSREPAIR when we deliberately corrupt directories Darrick J. Wong
2021-02-09 9:13 ` Christoph Hellwig
2021-02-09 18:45 ` Darrick J. Wong
2021-02-09 17:20 ` Brian Foster
2021-02-09 18:35 ` Darrick J. Wong
2021-02-09 19:14 ` Brian Foster
2021-02-09 19:43 ` Darrick J. Wong
2021-02-10 20:19 ` Eric Sandeen
2021-02-09 4:10 ` [PATCH 08/10] xfs_repair: allow setting the needsrepair flag Darrick J. Wong
2021-02-09 9:15 ` Christoph Hellwig
2021-02-09 14:41 ` Eric Sandeen
2021-02-09 16:47 ` Darrick J. Wong
2021-02-10 20:44 ` Eric Sandeen
2021-02-09 17:21 ` Brian Foster
2021-02-09 18:10 ` Darrick J. Wong
2021-02-10 20:26 ` Eric Sandeen
2021-02-09 4:10 ` [PATCH 09/10] xfs_repair: add a testing hook for NEEDSREPAIR Darrick J. Wong
2021-02-09 9:16 ` Christoph Hellwig
2021-02-09 17:21 ` Brian Foster
2021-02-09 18:17 ` Darrick J. Wong
2021-02-09 18:59 ` Brian Foster
2021-02-09 19:59 ` Darrick J. Wong [this message]
2021-02-09 20:32 ` Brian Foster
2021-02-10 21:41 ` Eric Sandeen
2021-02-11 1:30 ` Darrick J. Wong
2021-02-09 4:11 ` [PATCH 10/10] xfs_admin: support adding features to V5 filesystems Darrick J. Wong
2021-02-09 9:18 ` Christoph Hellwig
2021-02-09 17:22 ` Brian Foster
2021-02-09 18:22 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210209195920.GZ7193@magnolia \
--to=djwong@kernel.org \
--cc=bfoster@redhat.com \
--cc=linux-xfs@vger.kernel.org \
--cc=sandeen@sandeen.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox