From: "Darrick J. Wong" <djwong@kernel.org>
To: Allison Henderson <allison.henderson@oracle.com>
Cc: Catherine Hoang <catherine.hoang@oracle.com>,
"david@fromorbit.com" <david@fromorbit.com>,
"willy@infradead.org" <willy@infradead.org>,
"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
Chandan Babu <chandan.babu@oracle.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"hch@infradead.org" <hch@infradead.org>
Subject: Re: [PATCH 03/14] xfs: document the testing plan for online fsck
Date: Tue, 17 Jan 2023 18:38:47 -0800
Message-ID: <Y8dbt1g7SS6P3kKA@magnolia>
In-Reply-To: <77b0b494dc2a78c14805c2d9300f839ec25f0330.camel@oracle.com>
On Wed, Jan 18, 2023 at 12:03:17AM +0000, Allison Henderson wrote:
> On Fri, 2022-12-30 at 14:10 -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Start the third chapter of the online fsck design documentation.
> > This
> > covers the testing plan to make sure that both online and offline
> > fsck
> > can detect arbitrary problems and correct them without making things
> > worse.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > .../filesystems/xfs-online-fsck-design.rst | 187
> > ++++++++++++++++++++
> > 1 file changed, 187 insertions(+)
> >
> >
> > diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst
> > b/Documentation/filesystems/xfs-online-fsck-design.rst
> > index a03a7b9f0250..d630b6bdbe4a 100644
> > --- a/Documentation/filesystems/xfs-online-fsck-design.rst
> > +++ b/Documentation/filesystems/xfs-online-fsck-design.rst
> > @@ -563,3 +563,190 @@ functionality.
> > Many of these risks are inherent to software programming.
> > Despite this, it is hoped that this new functionality will prove
> > useful in
> > reducing unexpected downtime.
> > +
> > +3. Testing Plan
> > +===============
> > +
> > +As stated before, fsck tools have three main goals:
> > +
> > +1. Detect inconsistencies in the metadata;
> > +
> > +2. Eliminate those inconsistencies; and
> > +
> > +3. Minimize further loss of data.
> > +
> > +Demonstrations of correct operation are necessary to build users'
> > confidence
> > +that the software behaves within expectations.
> > +Unfortunately, it was not really feasible to perform regular
> > exhaustive testing
> > +of every aspect of a fsck tool until the introduction of low-cost
> > virtual
> > +machines with high-IOPS storage.
> > +With ample hardware availability in mind, the testing strategy for
> > the online
> > +fsck project involves differential analysis against the existing
> > fsck tools and
> > +systematic testing of every attribute of every type of metadata
> > object.
> > +Testing can be split into four major categories, as discussed below.
> > +
> > +Integrated Testing with fstests
> > +-------------------------------
> > +
> > +The primary goal of any free software QA effort is to make testing
> > as
> > +inexpensive and widespread as possible to maximize the scaling
> > advantages of
> > +community.
> > +In other words, testing should maximize the breadth of filesystem
> > configuration
> > +scenarios and hardware setups.
> > +This improves code quality by enabling the authors of online fsck to
> > find and
> > +fix bugs early, and helps developers of new features to find
> > integration
> > +issues earlier in their development effort.
> > +
> > +The Linux filesystem community shares a common QA testing suite,
> > +`fstests <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/>`_, for
> > +functional and regression testing.
> > +Even before development work began on online fsck, fstests (when run
> > on XFS)
> > +would run both the ``xfs_check`` and ``xfs_repair -n`` commands on
> > the test and
> > +scratch filesystems between each test.
> > +This provides a level of assurance that the kernel and the fsck
> > tools stay in
> > +alignment about what constitutes consistent metadata.
> > +During development of the online checking code, fstests was modified
> > to run
> > +``xfs_scrub -n`` between each test to ensure that the new checking
> > code
> > +produces the same results as the two existing fsck tools.
> > +
> > +To start development of online repair, fstests was modified to run
> > +``xfs_repair`` to rebuild the filesystem's metadata indices between
> > tests.
> > +This ensures that offline repair does not crash, leave a corrupt
> > filesystem
> > +after it exits, or trigger complaints from the online check.
> > +This also established a baseline for what can and cannot be repaired
> > offline.
> > +To complete the first phase of development of online repair, fstests
> > was
> > +modified to be able to run ``xfs_scrub`` in a "force rebuild" mode.
> > +This enables a comparison of the effectiveness of online repair as
> > compared to
> > +the existing offline repair tools.
> > +
> > +General Fuzz Testing of Metadata Blocks
> > +---------------------------------------
> > +
> > +XFS benefits greatly from having a very robust debugging tool,
> > ``xfs_db``.
> > +
> > +Before development of online fsck even began, a set of fstests were
> > created
> > +to test the rather common fault that entire metadata blocks get
> > corrupted.
> > +This required the creation of fstests library code that can create a
> > filesystem
> > +containing every possible type of metadata object.
> > +Next, individual test cases were created to create a test
> > filesystem, identify
> > +a single block of a specific type of metadata object, trash it with
> > the
> > +existing ``blocktrash`` command in ``xfs_db``, and test the reaction
> > of a
> > +particular metadata validation strategy.
> > +
> > +This earlier test suite enabled XFS developers to test the ability
> > of the
> > +in-kernel validation functions and the ability of the offline fsck
> > tool to
> > +detect and eliminate the inconsistent metadata.
> > +This part of the test suite was extended to cover online fsck in
> > exactly the
> > +same manner.
> > +
> > +In other words, for a given fstests filesystem configuration:
> > +
> > +* For each metadata object existing on the filesystem:
> > +
> > + * Write garbage to it
> > +
> > + * Test the reactions of:
> > +
> > + 1. The kernel verifiers to stop obviously bad metadata
> > + 2. Offline repair (``xfs_repair``) to detect and fix
> > + 3. Online repair (``xfs_scrub``) to detect and fix
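For readers who want to see the shape of the "write garbage, then test the reaction" loop quoted above, here is a minimal sketch. It is not taken from fstests or ``xfs_db``: the block layout, the helper names (``make_block``, ``trash_block``, ``verify_block``), and the use of ``zlib.crc32`` are all assumptions made for illustration. XFS v5 metadata uses CRC32c plus structural checks, so the checksum here only stands in for "a verifier that stops obviously bad metadata".

```python
import random
import zlib

BLOCK_SIZE = 4096  # assumed toy block size, not read from any real filesystem


def make_block(payload: bytes) -> bytes:
    """Build a toy metadata block: a 4-byte CRC followed by a zero-padded body."""
    body = payload.ljust(BLOCK_SIZE - 4, b"\0")
    return zlib.crc32(body).to_bytes(4, "big") + body


def verify_block(block: bytes) -> bool:
    """Toy verifier: accept the block only if the stored CRC matches the body."""
    if len(block) != BLOCK_SIZE:
        return False
    return int.from_bytes(block[:4], "big") == zlib.crc32(block[4:])


def trash_block(block: bytes, rng: random.Random, nbits: int = 16) -> bytes:
    """Flip `nbits` distinct, randomly chosen bits anywhere in the block."""
    buf = bytearray(block)
    for pos in rng.sample(range(len(buf) * 8), nbits):
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)


rng = random.Random(0)  # fixed seed so a failing case can be replayed
good = make_block(b"toy metadata record")
bad = trash_block(good, rng)
print("pristine block accepted:", verify_block(good))
print("trashed block accepted: ", verify_block(bad))
```

The fixed seed matters more than it looks: when a repair tool mishandles a trashed block, being able to regenerate the exact same damage is what makes the failure debuggable.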
> > +
> > +Targeted Fuzz Testing of Metadata Records
> > +-----------------------------------------
> > +
> > +A quick conversation with the other XFS developers revealed that the
> > existing
> > +test infrastructure could be extended to provide
>
> "The testing plan for ofsck includes extending the existing test
> infrastructure to provide..."
>
> Took me a moment to notice we're not talking about history any more....
Ah. Sorry about that. The sentence now reads:
"The testing plan for online fsck includes extending the existing fs
testing infrastructure to provide a much more powerful facility:
targeted fuzz testing of every metadata field of every metadata object
in the filesystem."
> > a much more powerful
> > +facility: targeted fuzz testing of every metadata field of every
> > metadata
> > +object in the filesystem.
> > +``xfs_db`` can modify every field of every metadata structure in
> > every
> > +block in the filesystem to simulate the effects of memory corruption
> > and
> > +software bugs.
> > +Given that fstests already contains the ability to create a
> > filesystem
> > +containing every metadata format known to the filesystem, ``xfs_db``
> > can be
> > +used to perform exhaustive fuzz testing!
> > +
> > +For a given fstests filesystem configuration:
> > +
> > +* For each metadata object existing on the filesystem...
> > +
> > + * For each record inside that metadata object...
> > +
> > + * For each field inside that record...
> > +
> > + * For each conceivable type of transformation that can be
> > applied to a bit field...
> > +
> > + 1. Clear all bits
> > + 2. Set all bits
> > + 3. Toggle the most significant bit
> > + 4. Toggle the middle bit
> > + 5. Toggle the least significant bit
> > + 6. Add a small quantity
> > + 7. Subtract a small quantity
> > + 8. Randomize the contents
> > +
> > + * ...test the reactions of:
> > +
> > + 1. The kernel verifiers to stop obviously bad metadata
> > + 2. Offline checking (``xfs_repair -n``)
> > + 3. Offline repair (``xfs_repair``)
> > + 4. Online checking (``xfs_scrub -n``)
> > + 5. Online repair (``xfs_scrub``)
> > + 6. Both repair tools (``xfs_scrub`` and then
> > ``xfs_repair`` if online repair doesn't succeed)
> I like the indented bullet list format tho
Thanks! I'm pleased that ... whatever renders this stuff ... actually
supports nested lists.
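
In case it helps reviewers picture the eight transformations in the quoted list, here is a rough sketch of them applied to a single fixed-width field. Treat everything in it as an assumption: the function name ``fuzz_field``, the verb labels, and the default "small quantity" of 1 are chosen for this example and are not copied from ``xfs_db``.

```python
import random


def fuzz_field(value: int, width: int, verb: str,
               rng: random.Random, delta: int = 1) -> int:
    """Apply one transformation to a `width`-bit unsigned field value."""
    mask = (1 << width) - 1
    transforms = {
        "zeroes": lambda: 0,                                # 1. clear all bits
        "ones": lambda: mask,                               # 2. set all bits
        "firstbit": lambda: value ^ (1 << (width - 1)),     # 3. toggle the MSB
        "middlebit": lambda: value ^ (1 << (width // 2)),   # 4. toggle the middle bit
        "lastbit": lambda: value ^ 1,                       # 5. toggle the LSB
        "add": lambda: (value + delta) & mask,              # 6. add, wrapping at width
        "sub": lambda: (value - delta) & mask,              # 7. subtract, wrapping at width
        "random": lambda: rng.getrandbits(width),           # 8. randomize the contents
    }
    if verb not in transforms:
        raise ValueError(f"unknown fuzz verb: {verb}")
    return transforms[verb]()


rng = random.Random(42)
for verb in ("zeroes", "ones", "firstbit", "middlebit",
             "lastbit", "add", "sub", "random"):
    print(f"{verb:>9}: {fuzz_field(0x1234, 16, verb, rng):#06x}")
```

Masking the add and subtract results keeps the fuzzed value inside the field, so a field that is already zero wraps to all ones on subtraction instead of spilling into its neighbour.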
> > +
> > +This is quite the combinatoric explosion!
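To put a number on that explosion, here is a small sketch that enumerates the test cases for a deliberately tiny, made-up layout. The ``layout`` dictionary, its object and field names, and the ``enumerate_cases`` helper are hypothetical; only the eight transformations and six reactions come from the lists quoted above.

```python
from itertools import product

TRANSFORMS = ["zeroes", "ones", "firstbit", "middlebit",
              "lastbit", "add", "sub", "random"]
REACTIONS = ["kernel verifiers", "offline check", "offline repair",
             "online check", "online repair", "online then offline repair"]


def enumerate_cases(layout):
    """Yield one (object, record index, field, transform, reaction) per test case."""
    for obj, records in layout.items():
        for rec_idx, fields in enumerate(records):
            for field, verb, reaction in product(fields, TRANSFORMS, REACTIONS):
                yield (obj, rec_idx, field, verb, reaction)


# Toy layout: each metadata object maps to a list of records,
# and each record is a list of field names.
layout = {
    "agf": [["magicnum", "length", "freeblks"]],
    "bnobt": [["startblock", "blockcount"], ["startblock", "blockcount"]],
}

cases = list(enumerate_cases(layout))
print(len(cases))  # 7 fields * 8 transforms * 6 reactions = 336
```

Seven fields already produce 336 cases, and the count grows linearly in each of the nested dimensions, so a filesystem populated with every metadata format multiplies quickly.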
> > +
> > +Fortunately, having this much test coverage makes it easy for XFS
> > developers to
> > +check the responses of XFS' fsck tools.
> > +Since the introduction of the fuzz testing framework, these tests
> > have been
> > +used to discover incorrect repair code and missing functionality for
> > entire
> > +classes of metadata objects in ``xfs_repair``.
> > +The enhanced testing was used to finalize the deprecation of
> > ``xfs_check`` by
> > +confirming that ``xfs_repair`` could detect at least as many
> > corruptions as
> > +the older tool.
> > +
> > +These tests have been very valuable for ``xfs_scrub`` in the same
> > ways -- they
> > +allow the online fsck developers to compare online fsck against
> > offline fsck,
> > +and they enable XFS developers to find deficiencies in the code
> > base.
> > +
> > +Proposed patchsets include
> > +`general fuzzer improvements
> > +<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzzer-improvements>`_,
> > +`fuzzing baselines
> > +<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_,
> > +and `improvements in fuzz testing comprehensiveness
> > +<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=more-fuzz-testing>`_.
> > +
> > +Stress Testing
> > +--------------
> > +
> > +A unique requirement to online fsck is the ability to operate on a
> > filesystem
> > +concurrently with regular workloads.
> > +Although it is of course impossible to run ``xfs_scrub`` with *zero*
> > observable
> > +impact on the running system, the online repair code should never
> > introduce
> > +inconsistencies into the filesystem metadata, and regular workloads
> > should
> > +never notice resource starvation.
> > +To verify that these conditions are being met, fstests has been
> > enhanced in
> > +the following ways:
> > +
> > +* For each scrub item type, create a test to exercise checking that
> > item type
> > + while running ``fsstress``.
> > +* For each scrub item type, create a test to exercise repairing that
> > item type
> > + while running ``fsstress``.
> > +* Race ``fsstress`` and ``xfs_scrub -n`` to ensure that checking the
> > whole
> > + filesystem doesn't cause problems.
> > +* Race ``fsstress`` and ``xfs_scrub`` in force-rebuild mode to
> > ensure that
> > + force-repairing the whole filesystem doesn't cause problems.
> > +* Race ``xfs_scrub`` in check and force-repair mode against
> > ``fsstress`` while
> > + freezing and thawing the filesystem.
> > +* Race ``xfs_scrub`` in check and force-repair mode against
> > ``fsstress`` while
> > + remounting the filesystem read-only and read-write.
> > +* The same, but running ``fsx`` instead of ``fsstress``. (Not done
> > yet?)
> > +
> > +Success is defined by the ability to run all of these tests without
> > observing
> > +any unexpected filesystem shutdowns due to corrupted metadata,
> > kernel hang
> > +check warnings, or any other sort of mischief.
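As a toy analogue of the "race a workload against the checker" tests quoted above, the sketch below runs a churning workload and a consistency check concurrently and reports any failures. It does not invoke ``fsstress`` or ``xfs_scrub``; the ``ToyFs`` class, its counters, and the ``race`` helper are invented for illustration, with a lock standing in for whatever serialization the real code relies on.

```python
import threading


class ToyFs:
    """Stand-in for a filesystem: free + used must always equal total."""

    def __init__(self, total: int):
        self.lock = threading.Lock()
        self.total = total
        self.free = total
        self.used = 0

    def churn(self):
        """Workload: allocate one block, or release everything when full."""
        with self.lock:
            if self.free:
                self.free -= 1
                self.used += 1
            else:
                self.free, self.used = self.total, 0

    def check(self):
        """Checker: raise if the counters are inconsistent."""
        with self.lock:
            if self.free + self.used != self.total:
                raise AssertionError("free + used != total")


def race(workers, iterations: int = 1000):
    """Run each callable `iterations` times in its own thread; return exceptions raised."""
    errors = []

    def loop(fn):
        try:
            for _ in range(iterations):
                fn()
        except Exception as exc:  # record the failure instead of killing the thread silently
            errors.append(exc)

    threads = [threading.Thread(target=loop, args=(fn,)) for fn in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


fs = ToyFs(total=64)
errors = race([fs.churn, fs.check])
print("race finished with", len(errors), "error(s)")
```

Success here mirrors the definition in the quoted text: the race completes, every thread joins (no hang), and the checker never observes an inconsistent state.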
>
> Seems reasonable. Other than the one nit, I think this section reads
> pretty well.
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Woo!
--D
> Allison
> > +
> > +Proposed patchsets include `general stress testing
> > +<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
> > +and the `evolution of existing per-function stress testing
> > +<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
> >
>