public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: zlang@redhat.com, hch@lst.de, fstests@vger.kernel.org,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 11/23] common/xfs: find loop devices for non-blockdevs passed to _prepare_for_eio_shutdown
Date: Wed, 22 Jan 2025 16:21:46 +1100	[thread overview]
Message-ID: <Z5CAaq-SN96RSvfZ@dread.disaster.area> (raw)
In-Reply-To: <20250122040542.GV1611770@frogsfrogsfrogs>

On Tue, Jan 21, 2025 at 08:05:42PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 21, 2025 at 03:37:17PM +1100, Dave Chinner wrote:
> > On Thu, Jan 16, 2025 at 03:28:02PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > xfs/336 does this somewhat sketchy thing where it mdrestores into a
> > > regular file, and then does this to validate the restored metadata:
> > > 
> > > SCRATCH_DEV=$TEST_DIR/image _scratch_mount
> > 
> > That's a canonical example of what is called "stepping on a
> > landmine".
> 
> 60% of fstests is written in bash, all of it is a friggin land mine
> because bash totally lets us do variable substitution at any time, and
> any time you make a change you have to exhaustively test the whole mess
> to make sure nothing broke...

Yes, I know, which is why the moment I saw xfs/336 I called it out -
it has never run on my machines, ever...

> (Yeah, I hate bash)

Not a great fan of it myself. But it's no worse than other scripting
languages that use JIT based syntax checking from the "if it wasn't
run it ain't tested" perspective.

> > We validate that the SCRATCH_DEV is a block device at the start of
> > check and each section it reads and runs (via common/config), and
> > then make the assumption in all the infrastructure that SCRATCH_DEV
> > always points to a valid block device.
> 
> We do?

fstests configurations for block based filesystems have always been
based on block devices and mount points, not image files. 

Yes, you can pass an image file to XFS utilities and they will do
the right thing, but not all filesystems or all the infrastructure
in fstests can handle an image file masquerading as a device.

I certainly didn't expect it.....

> Can you point me to the sentence in doc/ that says this
> explicitly? 

fstests is woefully under-documented - especially when it comes to
configuration constraints and behaviours - so I doubt it is actually
specified anywhere. AFAIA it has never been raised in discussion
for a long time (not since we added network filesystem support a
long time ago, IIRC)

However, the code is pretty explicit - common/config is responsible
for setting up and validating the runtime config before any test can
run. All test and scratch devices are passed through this
validation:

_check_device()
{
        local name=$1
        local dev_needed=$2
        local dev=$3

        if [ -z "$dev" ]; then
                if [ "$dev_needed" == "required" ]; then
                        _fatal "common/config: $name is required but not defined!"
                fi
                return 0
        fi

        if [ -b "$dev" ] || ( echo $dev | grep -qE ":|//" ); then
                # block device or a network url
                return 0
	fi

	case "$FSTYP" in
        9p|fuse|tmpfs|virtiofs|afs)
	.....
	*)
                _fatal "common/config: $name ($dev) is not a block device or a network filesystem"
        esac
}
....

Basically, it says that all the test and scratch devices (including
the external ones) must be either a block device, a network URL, or
a string that the specific filesystem under test must recognise and
accept (e.g. a directory for overlay filesystems). Otherwise fstests
will fail to run with an explicit error message that says:

	<device> is not a block device or network filesystem

Nowhere in this config validation process does fstests consider
image files as a valid device configuration for a block based
filesystem.

If we need to do stuff with a image files, we have the
infrastructure to create loop devices and then operate directly on
that dynamic loop device(s) (e.g. _mkfs_dev, _mount, _unmount,
_check_xfs_filesystem, etc) that are created.

> There's nothing I can find in the any docs and
> _try_scratch_mount does not check SCRATCH_DEV is a bdev for XFS.

That's because it's validated before we start running tests and the
assumption is that nobody is screwing with SCRATCH_DEV in a way
that makes it behave vastly differently.

Consider what it means to have to do runtime checking of the
device validity in common code before we do anything with the
device. We'd have to sprinkle _check_device calls -everywhere-.

We'd also have to check logdev and rtdev variables if USE_EXTERNAL
is set, too.

That's not a viable development strategy, nor is it a maintainable
solution to the issue at hand. It's far simpler to fix one test not
to use this trick than it is to declare "nobody can trust TEST_DEV
or SCRATCH_DEV to be a valid block device" and have to handle that
everywhere those variables are used...

> That needs to be documented.

Sure.

> > > Fix this by detecting non-bdevs and finding (we hope) the loop device
> > > that was created to handle the mount. 
> > 
> > What loop device? xfs/336 doesn't use loop devices at all.
> > 
> > Oh, this is assuming that mount will silently do a loopback mount
> > when passed a file rather than a block device. IOWs, it's relying on
> > some third party to do the loop device creation and hence allow it
> > to be mounted.
> > 
> > IOWs, this change is addressing a landmine by adding another
> > landmine.
> 
> Some would say that mount adding the ability to set up a loop dev was
> itself *avoiding* a landmine from 90s era util-linux.

True.

But in the case of fstests we explicitly create loop
devices so that we don't have to play whacky games to find the
random loop device that mount magically creates when you pass it a
file.

Making all the image file and loop device usage consistent across
all of fstests was part of the infrastructure changes in my initial
check-parallel patchset. This was necessary because killing tests
with ctrl-c would randomly leave dangling mounts and loop devices
because many tests did not have _cleanup routines to tear down
mounts that auto-created loop devices or clean up loop
devices they created themselves properly.

Part of those changes was fixing up the mess in some XFS tests
where that mixed loop device and image file based operations
interchangably. I didn't notice x/336 because it wasn't
running on my test system and so didn't attempt to fix it at the
same time...

> > I really think that xfs/336 needs to be fixed - one off test hacks
> > like this, while they may work, only make modifying and maintaining
> > the fstests infrastructure that much harder....
> 
> Yeah, it'll get cleaned up for the rtrmap fstests merge.

Thanks!

-Dave.

-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2025-01-22  5:21 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-16 23:21 [PATCHBOMB] fstests: fix check-parallel and get all the xfs 6.13 changes merged Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 1/7] fstests: random fixes for v2025.01.12 Darrick J. Wong
2025-01-16 23:25   ` [PATCH 01/23] generic/476: fix fsstress process management Darrick J. Wong
2025-01-21  3:03     ` Dave Chinner
2025-01-16 23:25   ` [PATCH 02/23] metadump: make non-local function variables more obvious Darrick J. Wong
2025-01-21  3:06     ` Dave Chinner
2025-01-16 23:25   ` [PATCH 03/23] metadump: fix cleanup for v1 metadump testing Darrick J. Wong
2025-01-21  3:07     ` Dave Chinner
2025-01-16 23:26   ` [PATCH 04/23] generic/482: _run_fsstress needs the test filesystem Darrick J. Wong
2025-01-21  3:12     ` Dave Chinner
2025-01-22  3:27       ` Darrick J. Wong
2025-01-16 23:26   ` [PATCH 05/23] generic/019: don't fail if fio crashes while shutting down Darrick J. Wong
2025-01-21  3:12     ` Dave Chinner
2025-01-16 23:26   ` [PATCH 06/23] fuzzy: do not set _FSSTRESS_PID when exercising fsx Darrick J. Wong
2025-01-21  3:13     ` Dave Chinner
2025-01-16 23:27   ` [PATCH 07/23] common/rc: create a wrapper for the su command Darrick J. Wong
2025-01-16 23:27   ` [PATCH 08/23] common: fix pkill by running test program in a separate session Darrick J. Wong
2025-01-21  3:28     ` Dave Chinner
2025-01-22  4:24       ` Darrick J. Wong
2025-01-22  6:08         ` Dave Chinner
2025-01-22  7:05           ` Darrick J. Wong
2025-01-22  9:42             ` Dave Chinner
2025-01-22 21:46               ` Darrick J. Wong
2025-01-23  1:16                 ` Dave Chinner
2025-01-28  4:34                   ` Dave Chinner
2025-01-28  7:23                     ` Darrick J. Wong
2025-01-28 20:39                       ` Dave Chinner
2025-01-29  3:13                         ` Darrick J. Wong
2025-01-29  6:06                           ` Dave Chinner
2025-01-29  7:33                             ` Darrick J. Wong
2025-01-16 23:27   ` [PATCH 09/23] unmount: resume logging of stdout and stderr for filtering Darrick J. Wong
2025-01-21  3:52     ` Dave Chinner
2025-01-22  3:29       ` Darrick J. Wong
2025-01-16 23:27   ` [PATCH 10/23] mkfs: don't hardcode log size Darrick J. Wong
2025-01-21  3:58     ` Dave Chinner
2025-01-21 12:44       ` Theodore Ts'o
2025-01-21 22:05         ` Dave Chinner
2025-01-22  3:40           ` Darrick J. Wong
2025-01-22  3:36         ` Darrick J. Wong
2025-01-22  3:30       ` Darrick J. Wong
2025-01-16 23:28   ` [PATCH 11/23] common/xfs: find loop devices for non-blockdevs passed to _prepare_for_eio_shutdown Darrick J. Wong
2025-01-21  4:37     ` Dave Chinner
2025-01-22  4:05       ` Darrick J. Wong
2025-01-22  5:21         ` Dave Chinner [this message]
2025-01-16 23:28   ` [PATCH 12/23] preamble: fix missing _kill_fsstress Darrick J. Wong
2025-01-21  4:37     ` Dave Chinner
2025-01-16 23:28   ` [PATCH 13/23] generic/650: revert SOAK DURATION changes Darrick J. Wong
2025-01-21  4:57     ` Dave Chinner
2025-01-21 13:00       ` Theodore Ts'o
2025-01-21 22:15         ` Dave Chinner
2025-01-22  3:51           ` Darrick J. Wong
2025-01-22  4:08           ` Theodore Ts'o
2025-01-22  6:01             ` Dave Chinner
2025-01-22  7:02               ` Darrick J. Wong
2025-01-22  3:49       ` Darrick J. Wong
2025-01-22  4:12         ` Dave Chinner
2025-01-22  4:37           ` Darrick J. Wong
2025-01-16 23:28   ` [PATCH 14/23] generic/032: fix pinned mount failure Darrick J. Wong
2025-01-21  5:03     ` Dave Chinner
2025-01-22  4:08       ` Darrick J. Wong
2025-01-22  4:19         ` Dave Chinner
2025-01-16 23:29   ` [PATCH 15/23] fuzzy: stop __stress_scrub_fsx_loop if fsx fails Darrick J. Wong
2025-01-16 23:29   ` [PATCH 16/23] fuzzy: don't use readarray for xfsfind output Darrick J. Wong
2025-01-16 23:29   ` [PATCH 17/23] fuzzy: always stop the scrub fsstress loop on error Darrick J. Wong
2025-01-16 23:29   ` [PATCH 18/23] fuzzy: port fsx and fsstress loop to use --duration Darrick J. Wong
2025-01-16 23:30   ` [PATCH 19/23] common/rc: don't copy fsstress to $TEST_DIR Darrick J. Wong
2025-01-21  5:05     ` Dave Chinner
2025-01-22  3:52       ` Darrick J. Wong
2025-01-16 23:30   ` [PATCH 20/23] fix _require_scratch_duperemove ordering Darrick J. Wong
2025-01-16 23:30   ` [PATCH 21/23] fsstress: fix a memory leak Darrick J. Wong
2025-01-16 23:30   ` [PATCH 22/23] fsx: fix leaked log file pointer Darrick J. Wong
2025-01-16 23:31   ` [PATCH 23/23] build: initialize stack variables to zero by default Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 2/7] fstests: fix logwrites zeroing Darrick J. Wong
2025-01-16 23:31   ` [PATCH 1/3] logwrites: warn if we don't think read after discard returns zeroes Darrick J. Wong
2025-01-16 23:31   ` [PATCH 2/3] logwrites: use BLKZEROOUT if it's available Darrick J. Wong
2025-01-16 23:31   ` [PATCH 3/3] logwrites: only use BLKDISCARD if we know discard zeroes data Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 3/7] fstests: enable metadir Darrick J. Wong
2025-01-16 23:32   ` [PATCH 01/11] various: fix finding metadata inode numbers when metadir is enabled Darrick J. Wong
2025-01-16 23:32   ` [PATCH 02/11] xfs/{030,033,178}: forcibly disable metadata directory trees Darrick J. Wong
2025-01-16 23:32   ` [PATCH 03/11] common/repair: patch up repair sb inode value complaints Darrick J. Wong
2025-01-16 23:32   ` [PATCH 04/11] xfs/206: update for metadata directory support Darrick J. Wong
2025-01-16 23:33   ` [PATCH 05/11] xfs/{050,144,153,299,330}: update quota reports to handle metadir trees Darrick J. Wong
2025-01-16 23:33   ` [PATCH 06/11] xfs/509: adjust inumbers accounting for metadata directories Darrick J. Wong
2025-01-16 23:33   ` [PATCH 07/11] xfs: create fuzz tests " Darrick J. Wong
2025-01-16 23:34   ` [PATCH 08/11] xfs/163: bigger fs for metadir Darrick J. Wong
2025-01-16 23:34   ` [PATCH 09/11] xfs/122: disable this test for any codebase that knows about metadir Darrick J. Wong
2025-01-16 23:34   ` [PATCH 10/11] scrub: race metapath online fsck with fsstress Darrick J. Wong
2025-01-16 23:34   ` [PATCH 11/11] xfs: test metapath repairs Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 4/7] fstests: make protofiles less janky Darrick J. Wong
2025-01-16 23:35   ` [PATCH 1/1] fstests: test mkfs.xfs protofiles with xattr support Darrick J. Wong
2025-03-02 13:15     ` Zorro Lang
2025-03-04 17:42       ` Darrick J. Wong
2025-03-04 18:00         ` Zorro Lang
2025-03-04 18:21           ` Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 5/7] fstests: shard the realtime section Darrick J. Wong
2025-01-16 23:35   ` [PATCH 01/14] common/populate: refactor caching of metadumps to a helper Darrick J. Wong
2025-01-16 23:35   ` [PATCH 02/14] common/{fuzzy,populate}: use _scratch_xfs_mdrestore Darrick J. Wong
2025-01-16 23:35   ` [PATCH 03/14] fuzzy: stress data and rt sections of xfs filesystems equally Darrick J. Wong
2025-01-16 23:36   ` [PATCH 04/14] common/ext4: reformat external logs during mdrestore operations Darrick J. Wong
2025-01-16 23:36   ` [PATCH 05/14] common/populate: use metadump v2 format by default for fs metadata snapshots Darrick J. Wong
2025-01-16 23:36   ` [PATCH 06/14] punch-alternating: detect xfs realtime files with large allocation units Darrick J. Wong
2025-01-16 23:36   ` [PATCH 07/14] xfs/206: update mkfs filtering for rt groups feature Darrick J. Wong
2025-01-16 23:37   ` [PATCH 08/14] common: pass the realtime device to xfs_db when possible Darrick J. Wong
2025-01-16 23:37   ` [PATCH 09/14] xfs/185: update for rtgroups Darrick J. Wong
2025-01-16 23:37   ` [PATCH 10/14] xfs/449: update test to know about xfs_db -R Darrick J. Wong
2025-01-16 23:37   ` [PATCH 11/14] xfs/271,xfs/556: fix tests to deal with rtgroups output in bmap/fsmap commands Darrick J. Wong
2025-01-16 23:38   ` [PATCH 12/14] common/xfs: capture realtime devices during metadump/mdrestore Darrick J. Wong
2025-01-16 23:38   ` [PATCH 13/14] common/fuzzy: adapt the scrub stress tests to support rtgroups Darrick J. Wong
2025-01-16 23:38   ` [PATCH 14/14] xfs: fix fuzz tests of rtgroups bitmap and summary files Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 6/7] fstests: store quota files in the metadir Darrick J. Wong
2025-01-16 23:38   ` [PATCH 1/4] xfs: update tests for " Darrick J. Wong
2025-01-16 23:39   ` [PATCH 2/4] xfs: test persistent quota flags Darrick J. Wong
2025-01-16 23:39   ` [PATCH 3/4] xfs: fix quota detection in fuzz tests Darrick J. Wong
2025-01-16 23:39   ` [PATCH 4/4] xfs: fix tests for persistent qflags Darrick J. Wong
2025-01-16 23:25 ` [PATCHSET v6.2 7/7] fstests: enable quota for realtime volumes Darrick J. Wong
2025-01-16 23:40   ` [PATCH 1/3] common: enable testing of realtime quota when supported Darrick J. Wong
2025-01-16 23:40   ` [PATCH 2/3] xfs: fix quota tests to adapt to realtime quota Darrick J. Wong
2025-01-16 23:40   ` [PATCH 3/3] xfs: regression testing of quota on the realtime device Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5CAaq-SN96RSvfZ@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=fstests@vger.kernel.org \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    --cc=zlang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox