public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: zlang@redhat.com, hch@lst.de, fstests@vger.kernel.org,
	linux-xfs@vger.kernel.org
Subject: Re: [PATCH 13/23] generic/650: revert SOAK DURATION changes
Date: Tue, 21 Jan 2025 20:37:32 -0800	[thread overview]
Message-ID: <20250122043732.GY1611770@frogsfrogsfrogs> (raw)
In-Reply-To: <Z5BwGzOfPaSzXyQ3@dread.disaster.area>

On Wed, Jan 22, 2025 at 03:12:11PM +1100, Dave Chinner wrote:
> On Tue, Jan 21, 2025 at 07:49:44PM -0800, Darrick J. Wong wrote:
> > On Tue, Jan 21, 2025 at 03:57:23PM +1100, Dave Chinner wrote:
> > > On Thu, Jan 16, 2025 at 03:28:33PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Prior to commit 8973af00ec21, in the absence of an explicit
> > > > SOAK_DURATION, this test would run 2500 fsstress operations each of ten
> > > > times through the loop body.  On the author's machines, this kept the
> > > > runtime to about 30s total.  Oddly, this was changed to 30s per loop
> > > > body with no specific justification in the middle of an fsstress process
> > > > management change.
> > > 
> > > I'm pretty sure that was because when you run g/650 on a machine
> > > with 64p, the number of ops performed on the filesystem is
> > > nr_cpus * 2500 * nr_loops.
> > 
> > Where does that happen?
> > 
> > Oh, heh.  -n is the number of ops *per process*.
> 
> Yeah, I just noticed another case of this:
> 
> Ten slowest tests - runtime in seconds:
> generic/750 559
> generic/311 486
> .....
> 
> generic/750 does:
> 
> nr_cpus=$((LOAD_FACTOR * 4))
> nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
> 
> So the actual load factor increase is exponential:
> 
> Load factor	nr_cpus		nr_ops		total ops
> 1		4		100k		400k
> 2		8		200k		1.6M
> 3		12		300k		3.6M
> 4		16		400k		6.4M
> 
> and so on.
> 
> I suspect that there are other similar cpu scaling issues
> lurking across the many fsstress tests...
> 
> > > > On the author's machine, this explodes the runtime from ~30s to 420s.
> > > > Put things back the way they were.
> > > 
> > > Yeah, OK, that's exactly waht keep_running() does - duration
> > > overrides nr_ops.
> > > 
> > > Ok, so keeping or reverting the change will simply make different
> > > people unhappy because of the excessive runtime the test has at
> > > either ends of the CPU count spectrum - what's the best way to go
> > > about providing the desired min(nr_ops, max loop time) behaviour?
> > > Do we simply cap the maximum process count to keep the number of ops
> > > down to something reasonable (e.g. 16), or something else?
> > 
> > How about running fsstress with --duration=3 if SOAK_DURATION isn't set?
> > That should keep the runtime to 30 seconds or so even on larger
> > machines:
> > 
> > if [ -n "$SOAK_DURATION" ]; then
> > 	test "$SOAK_DURATION" -lt 10 && SOAK_DURATION=10
> > 	fsstress_args+=(--duration="$((SOAK_DURATION / 10))")
> > else
> > 	# run for 3s per iteration max for a default runtime of ~30s.
> > 	fsstress_args+=(--duration=3)
> > fi
> 
> Yeah, that works for me.
> 
> As a rainy day project, perhaps we should look to convert all the
> fsstress invocations to be time bound rather than running a specific
> number of ops. i.e. hard code nr_ops=<some huge number> in
> _run_fstress_bg() and the tests only need to define parallelism and
> runtime.

I /think/ the only ones that do that are generic/1220 generic/476
generic/642 generic/750.  I could drop the nr_cpus term from the nr_ops
calculation.

> This would make the test runtimes more deterministic across machines
> with vastly different capabilities and and largely make "test xyz is
> slow on my test machine" reports largely go away.
> 
> Thoughts?

I'm fine with _run_fsstress injecting --duration=30 if no other duration
argument is passed in.

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

  reply	other threads:[~2025-01-22  4:37 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-16 23:21 [PATCHBOMB] fstests: fix check-parallel and get all the xfs 6.13 changes merged Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 1/7] fstests: random fixes for v2025.01.12 Darrick J. Wong
2025-01-16 23:25   ` [PATCH 01/23] generic/476: fix fsstress process management Darrick J. Wong
2025-01-21  3:03     ` Dave Chinner
2025-01-16 23:25   ` [PATCH 02/23] metadump: make non-local function variables more obvious Darrick J. Wong
2025-01-21  3:06     ` Dave Chinner
2025-01-16 23:25   ` [PATCH 03/23] metadump: fix cleanup for v1 metadump testing Darrick J. Wong
2025-01-21  3:07     ` Dave Chinner
2025-01-16 23:26   ` [PATCH 04/23] generic/482: _run_fsstress needs the test filesystem Darrick J. Wong
2025-01-21  3:12     ` Dave Chinner
2025-01-22  3:27       ` Darrick J. Wong
2025-01-16 23:26   ` [PATCH 05/23] generic/019: don't fail if fio crashes while shutting down Darrick J. Wong
2025-01-21  3:12     ` Dave Chinner
2025-01-16 23:26   ` [PATCH 06/23] fuzzy: do not set _FSSTRESS_PID when exercising fsx Darrick J. Wong
2025-01-21  3:13     ` Dave Chinner
2025-01-16 23:27   ` [PATCH 07/23] common/rc: create a wrapper for the su command Darrick J. Wong
2025-01-16 23:27   ` [PATCH 08/23] common: fix pkill by running test program in a separate session Darrick J. Wong
2025-01-21  3:28     ` Dave Chinner
2025-01-22  4:24       ` Darrick J. Wong
2025-01-22  6:08         ` Dave Chinner
2025-01-22  7:05           ` Darrick J. Wong
2025-01-22  9:42             ` Dave Chinner
2025-01-22 21:46               ` Darrick J. Wong
2025-01-23  1:16                 ` Dave Chinner
2025-01-28  4:34                   ` Dave Chinner
2025-01-28  7:23                     ` Darrick J. Wong
2025-01-28 20:39                       ` Dave Chinner
2025-01-29  3:13                         ` Darrick J. Wong
2025-01-29  6:06                           ` Dave Chinner
2025-01-29  7:33                             ` Darrick J. Wong
2025-01-16 23:27   ` [PATCH 09/23] unmount: resume logging of stdout and stderr for filtering Darrick J. Wong
2025-01-21  3:52     ` Dave Chinner
2025-01-22  3:29       ` Darrick J. Wong
2025-01-16 23:27   ` [PATCH 10/23] mkfs: don't hardcode log size Darrick J. Wong
2025-01-21  3:58     ` Dave Chinner
2025-01-21 12:44       ` Theodore Ts'o
2025-01-21 22:05         ` Dave Chinner
2025-01-22  3:40           ` Darrick J. Wong
2025-01-22  3:36         ` Darrick J. Wong
2025-01-22  3:30       ` Darrick J. Wong
2025-01-16 23:28   ` [PATCH 11/23] common/xfs: find loop devices for non-blockdevs passed to _prepare_for_eio_shutdown Darrick J. Wong
2025-01-21  4:37     ` Dave Chinner
2025-01-22  4:05       ` Darrick J. Wong
2025-01-22  5:21         ` Dave Chinner
2025-01-16 23:28   ` [PATCH 12/23] preamble: fix missing _kill_fsstress Darrick J. Wong
2025-01-21  4:37     ` Dave Chinner
2025-01-16 23:28   ` [PATCH 13/23] generic/650: revert SOAK DURATION changes Darrick J. Wong
2025-01-21  4:57     ` Dave Chinner
2025-01-21 13:00       ` Theodore Ts'o
2025-01-21 22:15         ` Dave Chinner
2025-01-22  3:51           ` Darrick J. Wong
2025-01-22  4:08           ` Theodore Ts'o
2025-01-22  6:01             ` Dave Chinner
2025-01-22  7:02               ` Darrick J. Wong
2025-01-22  3:49       ` Darrick J. Wong
2025-01-22  4:12         ` Dave Chinner
2025-01-22  4:37           ` Darrick J. Wong [this message]
2025-01-16 23:28   ` [PATCH 14/23] generic/032: fix pinned mount failure Darrick J. Wong
2025-01-21  5:03     ` Dave Chinner
2025-01-22  4:08       ` Darrick J. Wong
2025-01-22  4:19         ` Dave Chinner
2025-01-16 23:29   ` [PATCH 15/23] fuzzy: stop __stress_scrub_fsx_loop if fsx fails Darrick J. Wong
2025-01-16 23:29   ` [PATCH 16/23] fuzzy: don't use readarray for xfsfind output Darrick J. Wong
2025-01-16 23:29   ` [PATCH 17/23] fuzzy: always stop the scrub fsstress loop on error Darrick J. Wong
2025-01-16 23:29   ` [PATCH 18/23] fuzzy: port fsx and fsstress loop to use --duration Darrick J. Wong
2025-01-16 23:30   ` [PATCH 19/23] common/rc: don't copy fsstress to $TEST_DIR Darrick J. Wong
2025-01-21  5:05     ` Dave Chinner
2025-01-22  3:52       ` Darrick J. Wong
2025-01-16 23:30   ` [PATCH 20/23] fix _require_scratch_duperemove ordering Darrick J. Wong
2025-01-16 23:30   ` [PATCH 21/23] fsstress: fix a memory leak Darrick J. Wong
2025-01-16 23:30   ` [PATCH 22/23] fsx: fix leaked log file pointer Darrick J. Wong
2025-01-16 23:31   ` [PATCH 23/23] build: initialize stack variables to zero by default Darrick J. Wong
2025-01-16 23:23 ` [PATCHSET 2/7] fstests: fix logwrites zeroing Darrick J. Wong
2025-01-16 23:31   ` [PATCH 1/3] logwrites: warn if we don't think read after discard returns zeroes Darrick J. Wong
2025-01-16 23:31   ` [PATCH 2/3] logwrites: use BLKZEROOUT if it's available Darrick J. Wong
2025-01-16 23:31   ` [PATCH 3/3] logwrites: only use BLKDISCARD if we know discard zeroes data Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 3/7] fstests: enable metadir Darrick J. Wong
2025-01-16 23:32   ` [PATCH 01/11] various: fix finding metadata inode numbers when metadir is enabled Darrick J. Wong
2025-01-16 23:32   ` [PATCH 02/11] xfs/{030,033,178}: forcibly disable metadata directory trees Darrick J. Wong
2025-01-16 23:32   ` [PATCH 03/11] common/repair: patch up repair sb inode value complaints Darrick J. Wong
2025-01-16 23:32   ` [PATCH 04/11] xfs/206: update for metadata directory support Darrick J. Wong
2025-01-16 23:33   ` [PATCH 05/11] xfs/{050,144,153,299,330}: update quota reports to handle metadir trees Darrick J. Wong
2025-01-16 23:33   ` [PATCH 06/11] xfs/509: adjust inumbers accounting for metadata directories Darrick J. Wong
2025-01-16 23:33   ` [PATCH 07/11] xfs: create fuzz tests " Darrick J. Wong
2025-01-16 23:34   ` [PATCH 08/11] xfs/163: bigger fs for metadir Darrick J. Wong
2025-01-16 23:34   ` [PATCH 09/11] xfs/122: disable this test for any codebase that knows about metadir Darrick J. Wong
2025-01-16 23:34   ` [PATCH 10/11] scrub: race metapath online fsck with fsstress Darrick J. Wong
2025-01-16 23:34   ` [PATCH 11/11] xfs: test metapath repairs Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 4/7] fstests: make protofiles less janky Darrick J. Wong
2025-01-16 23:35   ` [PATCH 1/1] fstests: test mkfs.xfs protofiles with xattr support Darrick J. Wong
2025-03-02 13:15     ` Zorro Lang
2025-03-04 17:42       ` Darrick J. Wong
2025-03-04 18:00         ` Zorro Lang
2025-03-04 18:21           ` Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 5/7] fstests: shard the realtime section Darrick J. Wong
2025-01-16 23:35   ` [PATCH 01/14] common/populate: refactor caching of metadumps to a helper Darrick J. Wong
2025-01-16 23:35   ` [PATCH 02/14] common/{fuzzy,populate}: use _scratch_xfs_mdrestore Darrick J. Wong
2025-01-16 23:35   ` [PATCH 03/14] fuzzy: stress data and rt sections of xfs filesystems equally Darrick J. Wong
2025-01-16 23:36   ` [PATCH 04/14] common/ext4: reformat external logs during mdrestore operations Darrick J. Wong
2025-01-16 23:36   ` [PATCH 05/14] common/populate: use metadump v2 format by default for fs metadata snapshots Darrick J. Wong
2025-01-16 23:36   ` [PATCH 06/14] punch-alternating: detect xfs realtime files with large allocation units Darrick J. Wong
2025-01-16 23:36   ` [PATCH 07/14] xfs/206: update mkfs filtering for rt groups feature Darrick J. Wong
2025-01-16 23:37   ` [PATCH 08/14] common: pass the realtime device to xfs_db when possible Darrick J. Wong
2025-01-16 23:37   ` [PATCH 09/14] xfs/185: update for rtgroups Darrick J. Wong
2025-01-16 23:37   ` [PATCH 10/14] xfs/449: update test to know about xfs_db -R Darrick J. Wong
2025-01-16 23:37   ` [PATCH 11/14] xfs/271,xfs/556: fix tests to deal with rtgroups output in bmap/fsmap commands Darrick J. Wong
2025-01-16 23:38   ` [PATCH 12/14] common/xfs: capture realtime devices during metadump/mdrestore Darrick J. Wong
2025-01-16 23:38   ` [PATCH 13/14] common/fuzzy: adapt the scrub stress tests to support rtgroups Darrick J. Wong
2025-01-16 23:38   ` [PATCH 14/14] xfs: fix fuzz tests of rtgroups bitmap and summary files Darrick J. Wong
2025-01-16 23:24 ` [PATCHSET v6.2 6/7] fstests: store quota files in the metadir Darrick J. Wong
2025-01-16 23:38   ` [PATCH 1/4] xfs: update tests for " Darrick J. Wong
2025-01-16 23:39   ` [PATCH 2/4] xfs: test persistent quota flags Darrick J. Wong
2025-01-16 23:39   ` [PATCH 3/4] xfs: fix quota detection in fuzz tests Darrick J. Wong
2025-01-16 23:39   ` [PATCH 4/4] xfs: fix tests for persistent qflags Darrick J. Wong
2025-01-16 23:25 ` [PATCHSET v6.2 7/7] fstests: enable quota for realtime volumes Darrick J. Wong
2025-01-16 23:40   ` [PATCH 1/3] common: enable testing of realtime quota when supported Darrick J. Wong
2025-01-16 23:40   ` [PATCH 2/3] xfs: fix quota tests to adapt to realtime quota Darrick J. Wong
2025-01-16 23:40   ` [PATCH 3/3] xfs: regression testing of quota on the realtime device Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250122043732.GY1611770@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=fstests@vger.kernel.org \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    --cc=zlang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox