public inbox for linux-ext4@vger.kernel.org
* [PATCHSET] fstests: more random fixes for v2025.11.04
@ 2025-11-10 18:26 Darrick J. Wong
  2025-11-10 18:26 ` [PATCH 1/7] common: leave any breadcrumbs when _link_out_file_named can't find the output file Darrick J. Wong
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-10 18:26 UTC (permalink / raw)
  To: djwong, zlang; +Cc: fstests, linux-ext4

Hi all,

Here's the usual odd fixes for fstests.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=random-fixes

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=random-fixes

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=random-fixes
---
Commits in this patchset:
 * common: leave any breadcrumbs when _link_out_file_named can't find the output file
 * generic/778: fix severe performance problems
 * generic/778: fix background loop control with sentinel files
 * generic/019: skip test when there is no journal
 * xfs/837: fix test to work with pre-metadir quota mount options
 * generic/774: reduce file size
 * generic/774: turn off lfsr
---
 common/rc                  |    1 
 tests/generic/019          |    1 
 tests/generic/774          |    6 ---
 tests/generic/778          |   95 ++++++++++++++++++++++++++++++++------------
 tests/xfs/837              |   28 +++++++++----
 tests/xfs/837.cfg          |    1 
 tests/xfs/837.out.default  |    0 
 tests/xfs/837.out.oldquota |    6 +++
 8 files changed, 100 insertions(+), 38 deletions(-)
 create mode 100644 tests/xfs/837.cfg
 rename tests/xfs/{837.out => 837.out.default} (100%)
 create mode 100644 tests/xfs/837.out.oldquota


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/7] common: leave any breadcrumbs when _link_out_file_named can't find the output file
  2025-11-10 18:26 [PATCHSET] fstests: more random fixes for v2025.11.04 Darrick J. Wong
@ 2025-11-10 18:26 ` Darrick J. Wong
  2025-11-11  9:13   ` Christoph Hellwig
  2025-11-10 18:26 ` [PATCH 2/7] generic/778: fix severe performance problems Darrick J. Wong
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-10 18:26 UTC (permalink / raw)
  To: djwong, zlang; +Cc: fstests, linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

_link_out_file_named is an obnoxiously complicated helper involving a
perl script embedded inside a bash subshell that does ... a lookup of
some sort involving comparing the comma-separated list in its second
argument against a comma-separated list in a config file that then maps
to an output file suffix.  I don't know what it really does.  The .cfg
file format is undocumented except for the perl script.

This is really irritating every time I have to touch any of these tests
with flexible golden outputs, and I frequently screw up the mapping.
The helper is not very helpful when you do this, because it doesn't even
try to tell you *which* suffix it found, let alone how it got there.

Fix this up so that the .full file gets some diagnostics, even if the
stdout text is "no qualified output".
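
As a rough illustration of what the helper is doing, here is the same
feature-to-suffix lookup sketched in plain shell.  The lookup_suffix
function and the one-line "suffix: feature,list" cfg format are
assumptions for the sketch, not the real fstests code:

```shell
# Hypothetical sketch of the .cfg lookup: each line maps a golden
# output suffix to a comma-separated feature list, and the first line
# whose feature list matches the requested one wins.
lookup_suffix() {
	local cfgfile="$1" features="$2" line suffix
	while IFS= read -r line; do
		suffix="${line%%:*}"
		# strip spaces from the feature list before comparing
		if [ "$(echo "${line#*:}" | tr -d ' ')" = "$features" ]; then
			echo "$suffix"
			return 0
		fi
	done < "$cfgfile"
	echo "default"
}

cfg=$(mktemp)
printf 'oldquota: oldquota\n' > "$cfg"
lookup_suffix "$cfg" oldquota    # prints: oldquota
lookup_suffix "$cfg" metadir     # no match, falls back to: default
```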

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 common/rc |    1 +
 1 file changed, 1 insertion(+)


diff --git a/common/rc b/common/rc
index a10ac17746a3ca..cb91c00d24fb19 100644
--- a/common/rc
+++ b/common/rc
@@ -3825,6 +3825,7 @@ _link_out_file_named()
 		' <$seqfull.cfg)
 	rm -f $1 || _fail "_link_out_file_named: failed to remove existing output file"
 	ln -fs $(basename $1).$suffix $1 || _fail "$(basename $1).$suffix: could not setup output file"
+	test -r $1 || _fail "$(basename $1).$suffix: output file for feature set \"$2\" not found"
 }
 
 _link_out_file()



* [PATCH 2/7] generic/778: fix severe performance problems
  2025-11-10 18:26 [PATCHSET] fstests: more random fixes for v2025.11.04 Darrick J. Wong
  2025-11-10 18:26 ` [PATCH 1/7] common: leave any breadcrumbs when _link_out_file_named can't find the output file Darrick J. Wong
@ 2025-11-10 18:26 ` Darrick J. Wong
  2025-11-11  9:27   ` Christoph Hellwig
  2025-11-13 11:53   ` Ojaswin Mujoo
  2025-11-10 18:26 ` [PATCH 3/7] generic/778: fix background loop control with sentinel files Darrick J. Wong
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-10 18:26 UTC (permalink / raw)
  To: djwong, zlang; +Cc: fstests, linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

This test takes 4800s to run, which is horrible.  AFAICT it starts out
by timing how much data can be written atomically to a new file in 0.2
seconds, then scales up the file size by 3x.  On not very fast storage,
this can result in file_size being set to ~250MB on a 4k fsblock
filesystem.  That's about 64,000 blocks.

The next thing this test does is try to create a file of that size
(250MB) of alternating written and unwritten blocks.  For some reason,
it sets up this file by invoking xfs_io 64,000 times to write small
amounts of data, which takes 3+ minutes on the author's system because
exec overhead is pretty high when you do that.

As a result, one loop through the test takes almost 4 minutes.  The test
loops 20 times, so it runs for 80 minutes(!!) which is a really long
time.

So the first thing we do is observe that the giant slow loop is being
run as a single thread on an empty filesystem.  Most of the time the
allocator generates a mostly physically contiguous file.  We could
fallocate the whole file instead of fallocating one block every other
time through the loop.  This halves the setup time.

Next, we can also stuff the remaining pwrite commands into a bash array
and only invoke xfs_io once every 128x through the loop.  This amortizes
the xfs_io startup time, which reduces the test loop runtime to about 20
seconds.
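
The batching trick can be sketched on its own; note that the threshold
counts array elements, and each pwrite contributes two (-c plus its
argument string), so a threshold of 128 flushes every 64 writes.
Everything below is an illustrative stand-in, not the test's code:

```shell
# Accumulate xfs_io subcommands in a bash array and flush them with a
# single invocation per batch, paying one exec per batch rather than
# one exec per write.
BATCH=128
log=$(mktemp)
cmds=()
flush() {
	[ "${#cmds[@]}" -gt 0 ] || return 0
	# a real caller would run: $XFS_IO_PROG "${cmds[@]}" "$file"
	echo "xfs_io ${cmds[*]} \$file" >> "$log"
	cmds=()
}
for ((i = 0; i < 300; i++)); do
	cmds+=(-c "pwrite $((i * 4096)) 4096")
	if [ "${#cmds[@]}" -ge "$BATCH" ]; then
		flush
	fi
done
flush
wc -l < "$log"    # 300 writes collapse into 5 xfs_io invocations
```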

Finally, replace the 20x loop with a _soak_loop_running 5x loop because
5 seems like enough.  Anyone who wants more can set TIME_FACTOR or
SOAK_DURATION to get more intensive testing.  On my system this cuts the
runtime to 75 seconds.

Cc: <fstests@vger.kernel.org> # v2025.10.20
Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 tests/generic/778 |   59 ++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 40 insertions(+), 19 deletions(-)


diff --git a/tests/generic/778 b/tests/generic/778
index 8cb1d8d4cad45d..7cfabc3a47a521 100755
--- a/tests/generic/778
+++ b/tests/generic/778
@@ -42,22 +42,28 @@ atomic_write_loop() {
 		# Due to sudden shutdown this can produce errors so just
 		# redirect them to seqres.full
 		$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
-		echo "Written to offset: $off" >> $tmp.aw
-		off=$((off + $size))
+		echo "Written to offset: $((off + size))" >> $tmp.aw
+		off=$((off + size))
 	done
 }
 
 start_atomic_write_and_shutdown() {
 	atomic_write_loop &
 	awloop_pid=$!
+	local max_loops=100
 
 	local i=0
-	# Wait for at least first write to be recorded or 10s
-	while [ ! -f "$tmp.aw" -a $i -le 50 ]; do i=$((i + 1)); sleep 0.2; done
+	# Wait for at least first write to be recorded or too much time passes
+	while [ ! -f "$tmp.aw" -a $i -le $max_loops ]; do
+		i=$((i + 1))
+		sleep 0.2
+	done
 
-	if [[ $i -gt 50 ]]
+	cat $tmp.aw >> $seqres.full
+
+	if [[ $i -gt $max_loops ]]
 	then
-		_fail "atomic write process took too long to start"
+		_notrun "atomic write process took too long to start"
 	fi
 
 	echo >> $seqres.full
@@ -113,21 +119,34 @@ create_mixed_mappings() {
 	local off=0
 	local operations=("W" "U")
 
+	test $size_bytes -eq 0 && return
+
+	# fallocate the whole file once because preallocating single blocks
+	# with individual xfs_io invocations is really slow and the allocator
+	# usually gives out consecutive blocks anyway
+	$XFS_IO_PROG -f -c "falloc 0 $size_bytes" $file
+
+	local cmds=()
 	for ((i=0; i<$((size_bytes / blksz )); i++)); do
-		index=$(($i % ${#operations[@]}))
-		map="${operations[$index]}"
+		if (( i % 2 == 0 )); then
+			cmds+=(-c "pwrite -b $blksz $off $blksz")
+		fi
+
+		# batch the write commands into larger xfs_io invocations to
+		# amortize the fork overhead
+		if [ "${#cmds[@]}" -ge 128 ]; then
+			$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
+			cmds=()
+		fi
 
-		case "$map" in
-		    "W")
-			$XFS_IO_PROG -fc "pwrite -b $blksz $off $blksz" $file  >> /dev/null
-			;;
-		    "U")
-			$XFS_IO_PROG -fc "falloc $off $blksz" $file >> /dev/null
-			;;
-		esac
 		off=$((off + blksz))
 	done
 
+	if [ "${#cmds[@]}" -gt 0 ]; then
+		$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
+		cmds=()
+	fi
+
 	sync $file
 }
 
@@ -336,9 +355,9 @@ echo >> $seqres.full
 echo "# Populating expected data buffers" >> $seqres.full
 populate_expected_data
 
-# Loop 20 times to shake out any races due to shutdown
-for ((iter=0; iter<20; iter++))
-do
+# Loop to shake out any races due to shutdown
+iter=0
+while _soak_loop_running $TIME_FACTOR; do
 	echo >> $seqres.full
 	echo "------ Iteration $iter ------" >> $seqres.full
 
@@ -361,6 +380,8 @@ do
 	echo >> $seqres.full
 	echo "# Starting shutdown torn write test for append atomic writes" >> $seqres.full
 	test_append_torn_write
+
+	iter=$((iter + 1))
 done
 
 echo "Silence is golden"



* [PATCH 3/7] generic/778: fix background loop control with sentinel files
  2025-11-10 18:26 [PATCHSET] fstests: more random fixes for v2025.11.04 Darrick J. Wong
  2025-11-10 18:26 ` [PATCH 1/7] common: leave any breadcrumbs when _link_out_file_named can't find the output file Darrick J. Wong
  2025-11-10 18:26 ` [PATCH 2/7] generic/778: fix severe performance problems Darrick J. Wong
@ 2025-11-10 18:26 ` Darrick J. Wong
  2025-11-11  9:28   ` Christoph Hellwig
  2025-11-13 11:06   ` Ojaswin Mujoo
  2025-11-10 18:27 ` [PATCH 4/7] generic/019: skip test when there is no journal Darrick J. Wong
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-10 18:26 UTC (permalink / raw)
  To: djwong, zlang; +Cc: fstests, linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

This test fails on my slowish QA VM with 32k-fsblock xfs:

 --- /run/fstests/bin/tests/generic/778.out      2025-10-20 10:03:43.432910446 -0700
 +++ /var/tmp/fstests/generic/778.out.bad        2025-11-04 12:01:31.137813652 -0800
 @@ -1,2 +1,137 @@
  QA output created by 778
 -Silence is golden
 +umount: /opt: target is busy.
 +mount: /opt: /dev/sda4 already mounted on /opt.
 +       dmesg(1) may have more information after failed mount system call.
 +cycle mount failed
 +(see /var/tmp/fstests/generic/778.full for details)

Injecting a 'ps auxfww' into the _scratch_cycle_mount helper reveals
that this process is still sitting on /opt:

root     1804418  9.0  0.8 144960 134368 pts/0   Dl+  12:01   0:00 /run/fstests/xfsprogs/io/xfs_io -i -c open -fsd /opt/testfile -c pwrite -S 0x61 -DA -V1 -b 134217728 134217728 134217728

Yes, that's the xfs_io process started by atomic_write_loop.
Inexplicably, the awloop killing code terminates the subshell running
the for loop in atomic_write_loop but only waits for the subshell itself
to exit.  It doesn't wait for any of that subshell's children, and
that's why the unmount fails.

A bare "wait" (without the $awloop_pid parameter) also doesn't wait for
the xfs_io because the parent shell sees the subshell exit and treats
that as job completion.  We can't use killall here because the system
could be running check-parallel, nor can we use pkill here because the
pid namespace containment code was removed.

The simplest stupid answer is to use sentinel files to control the loop.
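
The sentinel scheme, stripped to its bones, looks like this (file names
and timings here are illustrative only):

```shell
# The background worker polls for a kill file instead of relying on
# signals, and the killer waits for the run file to disappear -- which
# also covers any children the worker spawned before it noticed.
sentdir=$(mktemp -d)
runfile=$sentdir/running
killfile=$sentdir/kill

worker() {
	touch "$runfile"
	for ((i = 0; i < 100; i++)); do
		test -e "$killfile" && break
		sleep 0.05        # stand-in for one unit of real work
	done
	rm -f "$runfile"
}

worker &
sleep 0.2                         # let the worker get going
touch "$killfile"                 # ask the worker to stop
for ((i = 0; i < 50; i++)); do    # wait for it to acknowledge
	test -e "$runfile" || break
	sleep 0.1
done
wait
test ! -e "$runfile" && echo "worker stopped"
```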

Cc: <fstests@vger.kernel.org> # v2025.10.20
Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 tests/generic/778 |   36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)


diff --git a/tests/generic/778 b/tests/generic/778
index 7cfabc3a47a521..715de458268ebc 100755
--- a/tests/generic/778
+++ b/tests/generic/778
@@ -21,6 +21,9 @@ _scratch_mount >> $seqres.full
 testfile=$SCRATCH_MNT/testfile
 touch $testfile
 
+awloop_runfile=$tmp.awloop_running
+awloop_killfile=$tmp.awloop_kill
+
 awu_max=$(_get_atomic_write_unit_max $testfile)
 blksz=$(_get_block_size $SCRATCH_MNT)
 echo "Awu max: $awu_max" >> $seqres.full
@@ -31,25 +34,48 @@ num_blocks=$((awu_max / blksz))
 filesize=$(( 10 * 1024 * 1024 * 1024 ))
 
 _cleanup() {
-	[ -n "$awloop_pid" ] && kill $awloop_pid &> /dev/null
-	wait
+	kill_awloop
 }
 
 atomic_write_loop() {
 	local off=0
 	local size=$awu_max
+
+	rm -f $awloop_killfile
+	touch $awloop_runfile
+
 	for ((i=0; i<$((filesize / $size )); i++)); do
 		# Due to sudden shutdown this can produce errors so just
 		# redirect them to seqres.full
 		$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
+		if [ ! -w "$testfile" ] || [ -e "$awloop_killfile" ]; then
+			break
+		fi
 		echo "Written to offset: $((off + size))" >> $tmp.aw
 		off=$((off + size))
 	done
+
+	rm -f $awloop_runfile
+}
+
+# Use sentinel files to control the loop execution because we don't know the
+# pid of the xfs_io process and so we can't wait for it directly.  A bare
+# wait command won't wait for a D-state xfs_io process so we can't do that
+# either.  We can't use killall because check-parallel, and we can't pkill
+# because the pid namespacing code was removed without fixing check-parallel.
+kill_awloop() {
+	test -e $awloop_runfile || return
+
+	touch $awloop_killfile
+
+	for ((i=0;i<300;i++)); do
+		test -e $awloop_runfile || break
+		sleep 0.1
+	done
 }
 
 start_atomic_write_and_shutdown() {
 	atomic_write_loop &
-	awloop_pid=$!
 	local max_loops=100
 
 	local i=0
@@ -70,9 +96,7 @@ start_atomic_write_and_shutdown() {
 	echo "# Shutting down filesystem while write is running" >> $seqres.full
 	_scratch_shutdown
 
-	kill $awloop_pid 2>/dev/null  # the process might have finished already
-	wait $awloop_pid
-	unset $awloop_pid
+	kill_awloop
 }
 
 # This test has the following flow:



* [PATCH 4/7] generic/019: skip test when there is no journal
  2025-11-10 18:26 [PATCHSET] fstests: more random fixes for v2025.11.04 Darrick J. Wong
                   ` (2 preceding siblings ...)
  2025-11-10 18:26 ` [PATCH 3/7] generic/778: fix background loop control with sentinel files Darrick J. Wong
@ 2025-11-10 18:27 ` Darrick J. Wong
  2025-11-11  9:28   ` Christoph Hellwig
  2025-11-10 18:27 ` [PATCH 5/7] xfs/837: fix test to work with pre-metadir quota mount options Darrick J. Wong
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-10 18:27 UTC (permalink / raw)
  To: djwong, zlang; +Cc: fstests, linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

This test checks a filesystem's ability to recover from a noncritical
disk failure (e.g. via journal replay) without becoming inconsistent.
That recovery isn't possible on a filesystem without a journal, so skip
the test in those configurations.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 tests/generic/019 |    1 +
 1 file changed, 1 insertion(+)


diff --git a/tests/generic/019 b/tests/generic/019
index 00badf6dc320b2..3ea88e2e94e220 100755
--- a/tests/generic/019
+++ b/tests/generic/019
@@ -126,6 +126,7 @@ _workout()
 
 
 _scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
+_require_metadata_journaling $SCRATCH_DEV
 _scratch_mount
 _allow_fail_make_request
 _workout



* [PATCH 5/7] xfs/837: fix test to work with pre-metadir quota mount options
  2025-11-10 18:26 [PATCHSET] fstests: more random fixes for v2025.11.04 Darrick J. Wong
                   ` (3 preceding siblings ...)
  2025-11-10 18:27 ` [PATCH 4/7] generic/019: skip test when there is no journal Darrick J. Wong
@ 2025-11-10 18:27 ` Darrick J. Wong
  2025-11-11  9:29   ` Christoph Hellwig
  2025-11-10 18:27 ` [PATCH 6/7] generic/774: reduce file size Darrick J. Wong
  2025-11-10 18:27 ` [PATCH 7/7] generic/774: turn off lfsr Darrick J. Wong
  6 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-10 18:27 UTC (permalink / raw)
  To: djwong, zlang; +Cc: fstests, linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

Prior to metadir, xfs users always had to supply quota mount options to
get quota functionality, even if the mount options matched the ondisk
superblock's qflag state.  The kernel, in turn, required a writable
filesystem if any mount options were specified.  As a result, this test
fails on those old filesystems because the _scratch_mount fails.

Metadir filesystems reuse whatever's in qflags if no mount options are
supplied, so we don't need them in MOUNT_OPTS anymore.

Change the _scratch_mount to _try_scratch_mount and add configurable
golden output to handle this case.

Cc: <fstests@vger.kernel.org> # v2025.06.22
Fixes: e225772353e212 ("xfs: add mount test for read only rt devices")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 tests/xfs/837              |   28 ++++++++++++++++++++--------
 tests/xfs/837.cfg          |    1 +
 tests/xfs/837.out.default  |    0 
 tests/xfs/837.out.oldquota |    6 ++++++
 4 files changed, 27 insertions(+), 8 deletions(-)
 create mode 100644 tests/xfs/837.cfg
 rename tests/xfs/{837.out => 837.out.default} (100%)
 create mode 100644 tests/xfs/837.out.oldquota


diff --git a/tests/xfs/837 b/tests/xfs/837
index 61e51d3a7d0e81..2fe195a009f10f 100755
--- a/tests/xfs/837
+++ b/tests/xfs/837
@@ -8,6 +8,7 @@
 # Check out various mount/remount/unmount scenarious on a read-only rtdev
 # Based on generic/050
 #
+seqfull=$0
 . ./common/preamble
 _begin_fstest mount auto quick
 
@@ -36,6 +37,17 @@ _register_cleanup "_cleanup_setrw"
 
 _scratch_mkfs "-d rtinherit" > /dev/null 2>&1
 
+# Select appropriate output file
+features=""
+if ! _xfs_has_feature "$SCRATCH_DEV" metadir && echo "$MOUNT_OPTIONS" | grep -q quota ; then
+	# Mounting with quota mount options on a non-metadir fs requires a
+	# writable fs because the kernel requires write access even if the
+	# mount options match the superblock qflags.  This means we expect to
+# fail the ro blockdev test with EPERM.
+	features="oldquota"
+fi
+_link_out_file "$features"
+
 #
 # Mark the rt device read-only.
 #
@@ -46,20 +58,20 @@ blockdev --setro $SCRATCH_RTDEV
 # Mount it and make sure it can't be written to.
 #
 echo "mounting read-only rt block device:"
-_scratch_mount 2>&1 | _filter_ro_mount | _filter_scratch
+_try_scratch_mount 2>&1 | _filter_ro_mount | _filter_scratch
 if [ "${PIPESTATUS[0]}" -eq 0 ]; then
 	echo "writing to file on read-only filesystem:"
 	dd if=/dev/zero of=$SCRATCH_MNT/foo bs=1M count=1 oflag=direct 2>&1 | _filter_scratch
+
+	echo "remounting read-write:"
+	_scratch_remount rw 2>&1 | _filter_scratch | _filter_ro_mount
+
+	echo "unmounting read-only filesystem"
+	_scratch_unmount 2>&1 | _filter_scratch | _filter_ending_dot
 else
-	_fail "failed to mount"
+	echo "failed to mount"
 fi
 
-echo "remounting read-write:"
-_scratch_remount rw 2>&1 | _filter_scratch | _filter_ro_mount
-
-echo "unmounting read-only filesystem"
-_scratch_unmount 2>&1 | _filter_scratch | _filter_ending_dot
-
 # success, all done
 echo "*** done"
 status=0
diff --git a/tests/xfs/837.cfg b/tests/xfs/837.cfg
new file mode 100644
index 00000000000000..01456b2fa80f04
--- /dev/null
+++ b/tests/xfs/837.cfg
@@ -0,0 +1 @@
+oldquota: oldquota
diff --git a/tests/xfs/837.out b/tests/xfs/837.out.default
similarity index 100%
rename from tests/xfs/837.out
rename to tests/xfs/837.out.default
diff --git a/tests/xfs/837.out.oldquota b/tests/xfs/837.out.oldquota
new file mode 100644
index 00000000000000..1383b4440dd8ee
--- /dev/null
+++ b/tests/xfs/837.out.oldquota
@@ -0,0 +1,6 @@
+QA output created by 837
+setting device read-only
+mounting read-only rt block device:
+mount: SCRATCH_MNT: permission denied
+failed to mount
+*** done



* [PATCH 6/7] generic/774: reduce file size
  2025-11-10 18:26 [PATCHSET] fstests: more random fixes for v2025.11.04 Darrick J. Wong
                   ` (4 preceding siblings ...)
  2025-11-10 18:27 ` [PATCH 5/7] xfs/837: fix test to work with pre-metadir quota mount options Darrick J. Wong
@ 2025-11-10 18:27 ` Darrick J. Wong
  2025-11-11  9:13   ` John Garry
                     ` (2 more replies)
  2025-11-10 18:27 ` [PATCH 7/7] generic/774: turn off lfsr Darrick J. Wong
  6 siblings, 3 replies; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-10 18:27 UTC (permalink / raw)
  To: djwong, zlang; +Cc: fstests, linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

We've gotten complaints about this test taking hours to run and
producing stall warnings on test VMs with a large number of CPU cores.  I
think this is due to the maximum atomic write unit being very large on
XFS where we can fall back to a software-based out of place write
implementation.

On the victim machine, the atomic write max is 4MB and there are 24
CPUs.  As a result, aw_bsize comes out to 1MB, so the file size is
1MB * 24 * 2 * 100 == 4.8GB.  I set up a test machine with fast storage
and 24 CPUs, and the atomic writes poked along at 25MB/s and the total
runtime was 300s.  On spinning rust those stats will be much worse.
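
The sizing arithmetic can be spot-checked directly, assuming the values
quoted above (24 CPUs, LOAD_FACTOR=1, a 4MB atomic write unit max):

```shell
# Spot-check of the file size the test computes with the old 100x
# multiplier, using the commit message's example machine.
awu_max_write=$((4 * 1024 * 1024))
aw_bsize=$((awu_max_write / 4))         # 1MB
cpus=24
threads=$((cpus * 2 * 1))               # 48, under the cap of 100
filesize=$((aw_bsize * threads * 100))  # the old 100x multiplier
echo "$((filesize / 1024 / 1024))MB"    # 4800MB, the ~4.8GB above
```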

Let's try backing the file size off by 10x and see if that eases the
complaints.

Cc: <fstests@vger.kernel.org> # v2025.10.20
Fixes: 9117fb93b41c38 ("generic: Add atomic write test using fio verify on file mixed mappings")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 tests/generic/774 |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/tests/generic/774 b/tests/generic/774
index 7a4d70167f9959..28886ed5b09ff7 100755
--- a/tests/generic/774
+++ b/tests/generic/774
@@ -29,7 +29,7 @@ aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
 fsbsize=$(_get_block_size $SCRATCH_MNT)
 
 threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
-filesize=$((aw_bsize * threads * 100))
+filesize=$((aw_bsize * threads * 10))
 depth=$threads
 aw_io_size=$((filesize / threads))
 aw_io_inc=$aw_io_size



* [PATCH 7/7] generic/774: turn off lfsr
  2025-11-10 18:26 [PATCHSET] fstests: more random fixes for v2025.11.04 Darrick J. Wong
                   ` (5 preceding siblings ...)
  2025-11-10 18:27 ` [PATCH 6/7] generic/774: reduce file size Darrick J. Wong
@ 2025-11-10 18:27 ` Darrick J. Wong
  2025-11-11  9:01   ` John Garry
                     ` (2 more replies)
  6 siblings, 3 replies; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-10 18:27 UTC (permalink / raw)
  To: djwong, zlang; +Cc: fstests, linux-ext4

From: Darrick J. Wong <djwong@kernel.org>

This test fails mostly-predictably across my testing fleet with:

 --- /run/fstests/bin/tests/generic/774.out	2025-10-20 10:03:43.432910446 -0700
 +++ /var/tmp/fstests/generic/774.out.bad	2025-11-10 01:14:58.941775866 -0800
 @@ -1,2 +1,11 @@
 QA output created by 774
 +fio: failed initializing LFSR
 +verify: bad magic header 0, wanted acca at file /opt/test-file offset 0, length 33554432 (requested block: offset=0, length=33554432)
 +verify: bad magic header 0, wanted acca at file /opt/test-file offset 33554432, length 33554432 (requested block: offset=33554432, length=33554432)
 +verify: bad magic header 0, wanted acca at file /opt/test-file offset 67108864, length 33554432 (requested block: offset=67108864, length=33554432)
 +verify: bad magic header 0, wanted acca at file /opt/test-file offset 100663296, length 33554432 (requested block: offset=100663296, length=33554432)
 +verify: bad magic header 0, wanted acca at file /opt/test-file offset 134217728, length 33554432 (requested block: offset=134217728, length=33554432)
 +verify: bad magic header 0, wanted acca at file /opt/test-file offset 167772160, length 33554432 (requested block: offset=167772160, length=33554432)
 +verify: bad magic header 0, wanted acca at file /opt/test-file offset 201326592, length 33554432 (requested block: offset=201326592, length=33554432)
 +verify: bad magic header 0, wanted acca at file /opt/test-file offset 234881024, length 33554432 (requested block: offset=234881024, length=33554432)
 Silence is golden

I'm not sure why the linear feedback shift register algorithm is
specifically needed for this test.

Cc: <fstests@vger.kernel.org> # v2025.10.20
Fixes: 9117fb93b41c38 ("generic: Add atomic write test using fio verify on file mixed mappings")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 tests/generic/774 |    4 ----
 1 file changed, 4 deletions(-)


diff --git a/tests/generic/774 b/tests/generic/774
index 28886ed5b09ff7..86ab01fbd35874 100755
--- a/tests/generic/774
+++ b/tests/generic/774
@@ -56,14 +56,12 @@ group_reporting=1
 ioengine=libaio
 rw=randwrite
 io_size=$((filesize/3))
-random_generator=lfsr
 
 # Create unwritten extents
 [prep_unwritten_blocks]
 ioengine=falloc
 rw=randwrite
 io_size=$((filesize/3))
-random_generator=lfsr
 EOF
 
 cat >$fio_aw_config <<EOF
@@ -73,7 +71,6 @@ ioengine=libaio
 rw=randwrite
 direct=1
 atomic=1
-random_generator=lfsr
 group_reporting=1
 
 filename=$testfile
@@ -93,7 +90,6 @@ cat >$fio_verify_config <<EOF
 [verify_job]
 ioengine=libaio
 rw=read
-random_generator=lfsr
 group_reporting=1
 
 filename=$testfile



* Re: [PATCH 7/7] generic/774: turn off lfsr
  2025-11-10 18:27 ` [PATCH 7/7] generic/774: turn off lfsr Darrick J. Wong
@ 2025-11-11  9:01   ` John Garry
  2025-11-12 18:15     ` Darrick J. Wong
  2025-11-11  9:32   ` Christoph Hellwig
  2025-11-13 10:34   ` Ojaswin Mujoo
  2 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2025-11-11  9:01 UTC (permalink / raw)
  To: Darrick J. Wong, zlang; +Cc: fstests, linux-ext4

On 10/11/2025 18:27, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This test fails mostly-predictably across my testing fleet with:
> 
>   --- /run/fstests/bin/tests/generic/774.out	2025-10-20 10:03:43.432910446 -0700
>   +++ /var/tmp/fstests/generic/774.out.bad	2025-11-10 01:14:58.941775866 -0800
>   @@ -1,2 +1,11 @@
>   QA output created by 774
>   +fio: failed initializing LFSR
>   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 0, length 33554432 (requested block: offset=0, length=33554432)
>   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 33554432, length 33554432 (requested block: offset=33554432, length=33554432)
>   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 67108864, length 33554432 (requested block: offset=67108864, length=33554432)
>   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 100663296, length 33554432 (requested block: offset=100663296, length=33554432)
>   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 134217728, length 33554432 (requested block: offset=134217728, length=33554432)
>   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 167772160, length 33554432 (requested block: offset=167772160, length=33554432)
>   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 201326592, length 33554432 (requested block: offset=201326592, length=33554432)
>   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 234881024, length 33554432 (requested block: offset=234881024, length=33554432)
>   Silence is golden
> 
> I'm not sure why the linear feedback shift register algorithm is
> specifically needed for this test.
> 
> Cc: <fstests@vger.kernel.org> # v2025.10.20
> Fixes: 9117fb93b41c38 ("generic: Add atomic write test using fio verify on file mixed mappings")
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>

Reviewed-by: John Garry <john.g.garry@oracle.com>

BTW, if you would like to make further tidy ups, then I don't think that 
verify_write_sequence=0 is required (for verify config). That is only 
relevant when we have multiple threads writing to the same region, which 
I don't think is the case here.

cheers

> ---
>   tests/generic/774 |    4 ----
>   1 file changed, 4 deletions(-)
> 
> 
> diff --git a/tests/generic/774 b/tests/generic/774
> index 28886ed5b09ff7..86ab01fbd35874 100755
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -56,14 +56,12 @@ group_reporting=1
>   ioengine=libaio
>   rw=randwrite
>   io_size=$((filesize/3))
> -random_generator=lfsr
>   
>   # Create unwritten extents
>   [prep_unwritten_blocks]
>   ioengine=falloc
>   rw=randwrite
>   io_size=$((filesize/3))
> -random_generator=lfsr
>   EOF
>   
>   cat >$fio_aw_config <<EOF
> @@ -73,7 +71,6 @@ ioengine=libaio
>   rw=randwrite
>   direct=1
>   atomic=1
> -random_generator=lfsr
>   group_reporting=1
>   
>   filename=$testfile
> @@ -93,7 +90,6 @@ cat >$fio_verify_config <<EOF
>   [verify_job]
>   ioengine=libaio
>   rw=read
> -random_generator=lfsr
>   group_reporting=1
>   
>   filename=$testfile
> 



* Re: [PATCH 1/7] common: leave any breadcrumbs when _link_out_file_named can't find the output file
  2025-11-10 18:26 ` [PATCH 1/7] common: leave any breadcrumbs when _link_out_file_named can't find the output file Darrick J. Wong
@ 2025-11-11  9:13   ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2025-11-11  9:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 6/7] generic/774: reduce file size
  2025-11-10 18:27 ` [PATCH 6/7] generic/774: reduce file size Darrick J. Wong
@ 2025-11-11  9:13   ` John Garry
  2025-11-11  9:29     ` Christoph Hellwig
  2025-11-11  9:31   ` Christoph Hellwig
  2025-11-13 10:44   ` Ojaswin Mujoo
  2 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2025-11-11  9:13 UTC (permalink / raw)
  To: Darrick J. Wong, zlang; +Cc: fstests, linux-ext4

On 10/11/2025 18:27, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> We've gotten complaints about this test taking hours to run and
> producing stall warnings on test VMs with a large number of CPU cores.  I
> think this is due to the maximum atomic write unit being very large on
> XFS where we can fall back to a software-based out of place write
> implementation.
> 
> On the victim machine, the atomic write max is 4MB and there are 24
> CPUs.  As a result, aw_bsize comes out to 1MB, so the file size is
> 1MB * 24 * 2 * 100 == 4.8GB.  I set up a test machine with fast storage
> and 24 CPUs, and the atomic writes poked along at 25MB/s and the total
> runtime was 300s.  On spinning rust those stats will be much worse.
> 
> Let's try backing the file size off by 10x and see if that eases the
> complaints.
> 

The awu max for xfs is still unbounded (so the file size could still be 
huge). For ext4, it is limited by HW constraints - the largest HW awu 
max I heard about is 256KB. How about also limiting awu max to something 
sane, like 1MB?

> Cc: <fstests@vger.kernel.org> # v2025.10.20
> Fixes: 9117fb93b41c38 ("generic: Add atomic write test using fio verify on file mixed mappings")
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>   tests/generic/774 |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/tests/generic/774 b/tests/generic/774
> index 7a4d70167f9959..28886ed5b09ff7 100755
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -29,7 +29,7 @@ aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
>   fsbsize=$(_get_block_size $SCRATCH_MNT)
>   
>   threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
> -filesize=$((aw_bsize * threads * 100))
> +filesize=$((aw_bsize * threads * 10))
>   depth=$threads
>   aw_io_size=$((filesize / threads))
>   aw_io_inc=$aw_io_size
> 



* Re: [PATCH 2/7] generic/778: fix severe performance problems
  2025-11-10 18:26 ` [PATCH 2/7] generic/778: fix severe performance problems Darrick J. Wong
@ 2025-11-11  9:27   ` Christoph Hellwig
  2025-11-13 11:53   ` Ojaswin Mujoo
  1 sibling, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2025-11-11  9:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

On Mon, Nov 10, 2025 at 10:26:32AM -0800, Darrick J. Wong wrote:
> Finally, replace the 20x loop with a _soak_loop_running 5x loop because
> 5 seems like enough.  Anyone who wants more can set TIME_FACTOR or
> SOAK_DURATION to get more intensive testing.  On my system this cuts the
> runtime to 75 seconds.

Phew!

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/7] generic/778: fix background loop control with sentinel files
  2025-11-10 18:26 ` [PATCH 3/7] generic/778: fix background loop control with sentinel files Darrick J. Wong
@ 2025-11-11  9:28   ` Christoph Hellwig
  2025-11-13 11:06   ` Ojaswin Mujoo
  1 sibling, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2025-11-11  9:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 4/7] generic/019: skip test when there is no journal
  2025-11-10 18:27 ` [PATCH 4/7] generic/019: skip test when there is no journal Darrick J. Wong
@ 2025-11-11  9:28   ` Christoph Hellwig
  2025-11-12 18:20     ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2025-11-11  9:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

On Mon, Nov 10, 2025 at 10:27:04AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This test checks a filesystem's ability to recover from a noncritical
> disk failure (e.g. journal replay) without becoming inconsistent.  This
> isn't true for any filesystem that doesn't have a journal, so we should
> skip the test on those platforms.

This is triggered by your fuse ext4, I guess?

Either way, looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

> 
---end quoted text---


* Re: [PATCH 5/7] xfs/837: fix test to work with pre-metadir quota mount options
  2025-11-10 18:27 ` [PATCH 5/7] xfs/837: fix test to work with pre-metadir quota mount options Darrick J. Wong
@ 2025-11-11  9:29   ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2025-11-11  9:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 6/7] generic/774: reduce file size
  2025-11-11  9:13   ` John Garry
@ 2025-11-11  9:29     ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2025-11-11  9:29 UTC (permalink / raw)
  To: John Garry; +Cc: Darrick J. Wong, zlang, fstests, linux-ext4

On Tue, Nov 11, 2025 at 09:13:27AM +0000, John Garry wrote:
> On 10/11/2025 18:27, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > We've gotten complaints about this test taking hours to run and
> > producing stall warnings on test VMs with a large number of cpu cores.  I
> > think this is due to the maximum atomic write unit being very large on
> > XFS where we can fall back to a software-based out of place write
> > implementation.
> > 
> > On the victim machine, the atomic write max is 4MB and there are 24
> > CPUs.  As a result, aw_bsize comes out to 1MB, so the file size is
> > 1MB * 24 * 2 * 100 == 4.8GB.  I set up a test machine with fast storage
> > and 24 CPUs, and the atomic writes poked along at 25MB/s and the total
> > runtime was 300s.  On spinning rust those stats will be much worse.
> > 
> > Let's try backing the file size off by 10x and see if that eases the
> > complaints.
> > 
> 
> The awu max for xfs is still unbounded (so the file size could still be
> huge). For ext4, it is limited by HW constraints - the largest HW awu max I
> heard about is 256KB. How about also limiting awu max to something sane,
> like 1MB?

Sounds fine to me, as long as we document it as an arbitrary limit.



* Re: [PATCH 6/7] generic/774: reduce file size
  2025-11-10 18:27 ` [PATCH 6/7] generic/774: reduce file size Darrick J. Wong
  2025-11-11  9:13   ` John Garry
@ 2025-11-11  9:31   ` Christoph Hellwig
  2025-11-12 18:05     ` Darrick J. Wong
  2025-11-13 10:44   ` Ojaswin Mujoo
  2 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2025-11-11  9:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

On Mon, Nov 10, 2025 at 10:27:35AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> We've gotten complaints about this test taking hours to run and
> producing stall warnings on test VMs with a large number of cpu cores.  I

Btw, we should still find a way to avoid stalls for anything a user
can easily trigger, independent of fixing the test case.  What were
these stall warnings?

The patch looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 7/7] generic/774: turn off lfsr
  2025-11-10 18:27 ` [PATCH 7/7] generic/774: turn off lfsr Darrick J. Wong
  2025-11-11  9:01   ` John Garry
@ 2025-11-11  9:32   ` Christoph Hellwig
  2025-11-13 10:34   ` Ojaswin Mujoo
  2 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2025-11-11  9:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 6/7] generic/774: reduce file size
  2025-11-11  9:31   ` Christoph Hellwig
@ 2025-11-12 18:05     ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-12 18:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: zlang, fstests, linux-ext4

On Tue, Nov 11, 2025 at 01:31:06AM -0800, Christoph Hellwig wrote:
> On Mon, Nov 10, 2025 at 10:27:35AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > We've gotten complaints about this test taking hours to run and
> > producing stall warnings on test VMs with a large number of cpu cores.  I
> 
> Btw, we should still find a way to avoid stalls for anything a user
> can easily trigger, independent of fixing the test case.  What were
> these stall warnings?

The ones I see look like this:

FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 oci-mtr49 6.18.0-rc5-xfsx #rc5 SMP PREEMPT_DYNAMIC Sun Nov  9 20:11:20 PST 2025
MKFS_OPTIONS  -- -f -m metadir=1,autofsck=1,uquota,gquota,pquota, -b size=32768, /dev/sda4
MOUNT_OPTIONS -- /dev/sda4 /opt

and dmesg does this:

run fstests generic/774 at 2025-11-10 13:37:52
XFS (sda3): EXPERIMENTAL large block size feature enabled.  Use at your own risk!
XFS (sda3): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
XFS (sda3): Mounting V5 Filesystem 0280a424-6e91-401b-8c47-864532b77ceb
XFS (sda3): Ending clean mount
XFS (sda4): EXPERIMENTAL large block size feature enabled.  Use at your own risk!
XFS (sda4): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
XFS (sda4): Mounting V5 Filesystem f09b8bdf-3518-44e5-97cb-e0ab36808488
XFS (sda4): Ending clean mount
XFS (sda4): Quotacheck needed: Please wait.
XFS (sda4): Quotacheck: Done.
XFS (sda4): Unmounting Filesystem f09b8bdf-3518-44e5-97cb-e0ab36808488
XFS (sda4): EXPERIMENTAL large block size feature enabled.  Use at your own risk!
XFS (sda4): EXPERIMENTAL metadata directory tree feature enabled.  Use at your own risk!
XFS (sda4): Mounting V5 Filesystem 11615837-dfa0-47b8-a078-21f9f1f7525b
XFS (sda4): Ending clean mount
XFS (sda4): Quotacheck needed: Please wait.
XFS (sda4): Quotacheck: Done.
workqueue: iomap_dio_complete_work hogged CPU for >10666us 4 times, consider switching to WQ_UNBOUND
workqueue: iomap_dio_complete_work hogged CPU for >10666us 5 times, consider switching to WQ_UNBOUND
workqueue: iomap_dio_complete_work hogged CPU for >10666us 7 times, consider switching to WQ_UNBOUND
workqueue: iomap_dio_complete_work hogged CPU for >10666us 11 times, consider switching to WQ_UNBOUND
workqueue: iomap_dio_complete_work hogged CPU for >10666us 19 times, consider switching to WQ_UNBOUND
INFO: task 3:1:2248454 blocked for more than 61 seconds.
      Tainted: G        W           6.18.0-rc5-xfsx #rc5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:3:1             state:D stack:12744 pid:2248454 tgid:2248454 ppid:2      task_flags:0x4248060 flags:0x00080000
Workqueue: dio/sda4 iomap_dio_complete_work
Call Trace:
 <TASK>
 __schedule+0x4cb/0x1a70
 ? check_preempt_wakeup_fair+0x162/0x2a0
 ? wakeup_preempt+0x66/0x70
 schedule+0x2a/0xe0
 schedule_preempt_disabled+0x18/0x30
 rwsem_down_write_slowpath+0x2c5/0x780
 down_write+0x6e/0x70
 xfs_reflink_end_atomic_cow+0x133/0x200 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
 ? finish_task_switch.isra.0+0x9b/0x2b0
 xfs_dio_write_end_io+0x117/0x320 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
 iomap_dio_complete+0x41/0x240
 ? aio_fsync_work+0x120/0x120
 iomap_dio_complete_work+0x17/0x30
 process_one_work+0x196/0x3e0
 worker_thread+0x264/0x380
 ? _raw_spin_unlock_irqrestore+0x1e/0x40
 ? rescuer_thread+0x4f0/0x4f0
 kthread+0x117/0x270
 ? kthread_complete_and_exit+0x20/0x20
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0xa4/0xe0
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork_asm+0x11/0x20
 </TASK>
INFO: task 3:1:2248454 <writer> blocked on an rw-semaphore likely owned by task 3:13:2308404 <writer>
INFO: task 3:13:2308404 blocked for more than 61 seconds.
      Tainted: G        W           6.18.0-rc5-xfsx #rc5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:3:13            state:D stack:12184 pid:2308404 tgid:2308404 ppid:2      task_flags:0x4248060 flags:0x00080000
Workqueue: dio/sda4 iomap_dio_complete_work
Call Trace:
 <TASK>
 __schedule+0x4cb/0x1a70
 ? do_raw_spin_unlock+0x49/0xb0
 ? _raw_spin_unlock_irqrestore+0x1e/0x40
 schedule+0x2a/0xe0
 xlog_grant_head_wait+0x5c/0x2a0 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
 ? kmem_cache_free+0x540/0x5b0
 xlog_grant_head_check+0x10e/0x180 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
 xfs_log_regrant+0xc2/0x1e0 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
 xfs_trans_roll+0x90/0xc0 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
 xfs_defer_trans_roll+0x73/0x120 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
 xfs_defer_finish_noroll+0x2a3/0x520 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
 xfs_trans_commit+0x3d/0x70 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
 xfs_reflink_end_atomic_cow+0x19c/0x200 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
 ? finish_task_switch.isra.0+0x9b/0x2b0
 xfs_dio_write_end_io+0x117/0x320 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
 iomap_dio_complete+0x41/0x240
 ? aio_fsync_work+0x120/0x120
 iomap_dio_complete_work+0x17/0x30
 process_one_work+0x196/0x3e0
 worker_thread+0x264/0x380
 ? _raw_spin_unlock_irqrestore+0x1e/0x40
 ? rescuer_thread+0x4f0/0x4f0
 kthread+0x117/0x270
 ? kthread_complete_and_exit+0x20/0x20
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0xa4/0xe0
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork_asm+0x11/0x20
 </TASK>

I think this is an ABBA livelock: nearly all the threads hold a log
grant reservation and are trying to obtain the ILOCK, while one thread,
in the middle of an atomic COW write ioend transaction chain, has run
out of permanent reservation during xfs_trans_roll and therefore holds
the ILOCK while trying to reserve log grant space.

I also suspect this could happen with regular ioend remapping of shared
space after a COW.  Perhaps for these ioend types we should chain ioends
together like we do for direct writes to zoned storage.  Remapping isn't
going to be low-latency like pure overwrites anyway.

> The patch looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D


* Re: [PATCH 7/7] generic/774: turn off lfsr
  2025-11-11  9:01   ` John Garry
@ 2025-11-12 18:15     ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-12 18:15 UTC (permalink / raw)
  To: John Garry; +Cc: zlang, fstests, linux-ext4

On Tue, Nov 11, 2025 at 09:01:59AM +0000, John Garry wrote:
> On 10/11/2025 18:27, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > This test fails mostly-predictably across my testing fleet with:
> > 
> >   --- /run/fstests/bin/tests/generic/774.out	2025-10-20 10:03:43.432910446 -0700
> >   +++ /var/tmp/fstests/generic/774.out.bad	2025-11-10 01:14:58.941775866 -0800
> >   @@ -1,2 +1,11 @@
> >   QA output created by 774
> >   +fio: failed initializing LFSR
> >   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 0, length 33554432 (requested block: offset=0, length=33554432)
> >   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 33554432, length 33554432 (requested block: offset=33554432, length=33554432)
> >   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 67108864, length 33554432 (requested block: offset=67108864, length=33554432)
> >   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 100663296, length 33554432 (requested block: offset=100663296, length=33554432)
> >   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 134217728, length 33554432 (requested block: offset=134217728, length=33554432)
> >   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 167772160, length 33554432 (requested block: offset=167772160, length=33554432)
> >   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 201326592, length 33554432 (requested block: offset=201326592, length=33554432)
> >   +verify: bad magic header 0, wanted acca at file /opt/test-file offset 234881024, length 33554432 (requested block: offset=234881024, length=33554432)
> >   Silence is golden
> > 
> > I'm not sure why the linear feedback shift register algorithm is
> > specifically needed for this test.
> > 
> > Cc: <fstests@vger.kernel.org> # v2025.10.20
> > Fixes: 9117fb93b41c38 ("generic: Add atomic write test using fio verify on file mixed mappings")
> > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> 
> Reviewed-by: John Garry <john.g.garry@oracle.com>
> 
> BTW, if you would like to make further tidy ups, then I don't think that
> verify_write_sequence=0 is required (for verify config). That is only
> relevant when we have multiple threads writing to the same region, which I
> don't think is the case here.

No I would not.  Send a separate patch with a separate justification
please.

--D

> cheers
> 
> > ---
> >   tests/generic/774 |    4 ----
> >   1 file changed, 4 deletions(-)
> > 
> > 
> > diff --git a/tests/generic/774 b/tests/generic/774
> > index 28886ed5b09ff7..86ab01fbd35874 100755
> > --- a/tests/generic/774
> > +++ b/tests/generic/774
> > @@ -56,14 +56,12 @@ group_reporting=1
> >   ioengine=libaio
> >   rw=randwrite
> >   io_size=$((filesize/3))
> > -random_generator=lfsr
> >   # Create unwritten extents
> >   [prep_unwritten_blocks]
> >   ioengine=falloc
> >   rw=randwrite
> >   io_size=$((filesize/3))
> > -random_generator=lfsr
> >   EOF
> >   cat >$fio_aw_config <<EOF
> > @@ -73,7 +71,6 @@ ioengine=libaio
> >   rw=randwrite
> >   direct=1
> >   atomic=1
> > -random_generator=lfsr
> >   group_reporting=1
> >   filename=$testfile
> > @@ -93,7 +90,6 @@ cat >$fio_verify_config <<EOF
> >   [verify_job]
> >   ioengine=libaio
> >   rw=read
> > -random_generator=lfsr
> >   group_reporting=1
> >   filename=$testfile
> > 
> 
> 


* Re: [PATCH 4/7] generic/019: skip test when there is no journal
  2025-11-11  9:28   ` Christoph Hellwig
@ 2025-11-12 18:20     ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-12 18:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: zlang, fstests, linux-ext4

On Tue, Nov 11, 2025 at 01:28:48AM -0800, Christoph Hellwig wrote:
> On Mon, Nov 10, 2025 at 10:27:04AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > This test checks a filesystem's ability to recover from a noncritical
> > disk failure (e.g. journal replay) without becoming inconsistent.  This
> > isn't true for any filesystem that doesn't have a journal, so we should
> > skip the test on those platforms.
> 
> This is triggered by your fuse ext4, I guess?

Yep.

--D

> Either way, looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> > 
> ---end quoted text---
> 


* Re: [PATCH 7/7] generic/774: turn off lfsr
  2025-11-10 18:27 ` [PATCH 7/7] generic/774: turn off lfsr Darrick J. Wong
  2025-11-11  9:01   ` John Garry
  2025-11-11  9:32   ` Christoph Hellwig
@ 2025-11-13 10:34   ` Ojaswin Mujoo
  2 siblings, 0 replies; 30+ messages in thread
From: Ojaswin Mujoo @ 2025-11-13 10:34 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

On Mon, Nov 10, 2025 at 10:27:51AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This test fails mostly-predictably across my testing fleet with:
> 
>  --- /run/fstests/bin/tests/generic/774.out	2025-10-20 10:03:43.432910446 -0700
>  +++ /var/tmp/fstests/generic/774.out.bad	2025-11-10 01:14:58.941775866 -0800
>  @@ -1,2 +1,11 @@
>  QA output created by 774
>  +fio: failed initializing LFSR
>  +verify: bad magic header 0, wanted acca at file /opt/test-file offset 0, length 33554432 (requested block: offset=0, length=33554432)
>  +verify: bad magic header 0, wanted acca at file /opt/test-file offset 33554432, length 33554432 (requested block: offset=33554432, length=33554432)
>  +verify: bad magic header 0, wanted acca at file /opt/test-file offset 67108864, length 33554432 (requested block: offset=67108864, length=33554432)
>  +verify: bad magic header 0, wanted acca at file /opt/test-file offset 100663296, length 33554432 (requested block: offset=100663296, length=33554432)
>  +verify: bad magic header 0, wanted acca at file /opt/test-file offset 134217728, length 33554432 (requested block: offset=134217728, length=33554432)
>  +verify: bad magic header 0, wanted acca at file /opt/test-file offset 167772160, length 33554432 (requested block: offset=167772160, length=33554432)
>  +verify: bad magic header 0, wanted acca at file /opt/test-file offset 201326592, length 33554432 (requested block: offset=201326592, length=33554432)
>  +verify: bad magic header 0, wanted acca at file /opt/test-file offset 234881024, length 33554432 (requested block: offset=234881024, length=33554432)
>  Silence is golden
> 
> I'm not sure why the linear feedback shift register algorithm is
> specifically needed for this test.

Hi Darrick, thanks for the fix.  Strange that I never observed this,
but yes, it is not needed.

Feel free to add:
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>

Regards,
ojaswin

> 
> Cc: <fstests@vger.kernel.org> # v2025.10.20
> Fixes: 9117fb93b41c38 ("generic: Add atomic write test using fio verify on file mixed mappings")
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  tests/generic/774 |    4 ----
>  1 file changed, 4 deletions(-)
> 
> 
> diff --git a/tests/generic/774 b/tests/generic/774
> index 28886ed5b09ff7..86ab01fbd35874 100755
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -56,14 +56,12 @@ group_reporting=1
>  ioengine=libaio
>  rw=randwrite
>  io_size=$((filesize/3))
> -random_generator=lfsr
>  
>  # Create unwritten extents
>  [prep_unwritten_blocks]
>  ioengine=falloc
>  rw=randwrite
>  io_size=$((filesize/3))
> -random_generator=lfsr
>  EOF
>  
>  cat >$fio_aw_config <<EOF
> @@ -73,7 +71,6 @@ ioengine=libaio
>  rw=randwrite
>  direct=1
>  atomic=1
> -random_generator=lfsr
>  group_reporting=1
>  
>  filename=$testfile
> @@ -93,7 +90,6 @@ cat >$fio_verify_config <<EOF
>  [verify_job]
>  ioengine=libaio
>  rw=read
> -random_generator=lfsr
>  group_reporting=1
>  
>  filename=$testfile
> 


* Re: [PATCH 6/7] generic/774: reduce file size
  2025-11-10 18:27 ` [PATCH 6/7] generic/774: reduce file size Darrick J. Wong
  2025-11-11  9:13   ` John Garry
  2025-11-11  9:31   ` Christoph Hellwig
@ 2025-11-13 10:44   ` Ojaswin Mujoo
  2 siblings, 0 replies; 30+ messages in thread
From: Ojaswin Mujoo @ 2025-11-13 10:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

On Mon, Nov 10, 2025 at 10:27:35AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> We've gotten complaints about this test taking hours to run and
> producing stall warnings on test VMs with a large number of cpu cores.  I
> think this is due to the maximum atomic write unit being very large on
> XFS where we can fall back to a software-based out of place write
> implementation.
> 
> On the victim machine, the atomic write max is 4MB and there are 24
> CPUs.  As a result, aw_bsize comes out to 1MB, so the file size is
> 1MB * 24 * 2 * 100 == 4.8GB.  I set up a test machine with fast storage
> and 24 CPUs, and the atomic writes poked along at 25MB/s and the total
> runtime was 300s.  On spinning rust those stats will be much worse.
> 
> Let's try backing the file size off by 10x and see if that eases the
> complaints.

Hi Darrick,

I agree with John's comment on limiting the awu_max to 1MB as well.  But
regardless, this change looks good to me.

Feel free to add:
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>

Thanks for fixing this up.  I think I didn't test enough with slow
storage, which made me miss these issues for g/774 and g/778.

Regards,
ojaswin

> 
> Cc: <fstests@vger.kernel.org> # v2025.10.20
> Fixes: 9117fb93b41c38 ("generic: Add atomic write test using fio verify on file mixed mappings")
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  tests/generic/774 |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/tests/generic/774 b/tests/generic/774
> index 7a4d70167f9959..28886ed5b09ff7 100755
> --- a/tests/generic/774
> +++ b/tests/generic/774
> @@ -29,7 +29,7 @@ aw_bsize=$(_max "$awu_min_write" "$((awu_max_write/4))")
>  fsbsize=$(_get_block_size $SCRATCH_MNT)
>  
>  threads=$(_min "$(($(nproc) * 2 * LOAD_FACTOR))" "100")
> -filesize=$((aw_bsize * threads * 100))
> +filesize=$((aw_bsize * threads * 10))
>  depth=$threads
>  aw_io_size=$((filesize / threads))
>  aw_io_inc=$aw_io_size
> 


* Re: [PATCH 3/7] generic/778: fix background loop control with sentinel files
  2025-11-10 18:26 ` [PATCH 3/7] generic/778: fix background loop control with sentinel files Darrick J. Wong
  2025-11-11  9:28   ` Christoph Hellwig
@ 2025-11-13 11:06   ` Ojaswin Mujoo
  1 sibling, 0 replies; 30+ messages in thread
From: Ojaswin Mujoo @ 2025-11-13 11:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

On Mon, Nov 10, 2025 at 10:26:48AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This test fails on my slowish QA VM with 32k-fsblock xfs:
> 
>  --- /run/fstests/bin/tests/generic/778.out      2025-10-20 10:03:43.432910446 -0700
>  +++ /var/tmp/fstests/generic/778.out.bad        2025-11-04 12:01:31.137813652 -0800
>  @@ -1,2 +1,137 @@
>   QA output created by 778
>  -Silence is golden
>  +umount: /opt: target is busy.
>  +mount: /opt: /dev/sda4 already mounted on /opt.
>  +       dmesg(1) may have more information after failed mount system call.
>  +cycle mount failed
>  +(see /var/tmp/fstests/generic/778.full for details)
> 
> Injecting a 'ps auxfww' into the _scratch_cycle_mount helper reveals
> that this process is still sitting on /opt:
> 
> root     1804418  9.0  0.8 144960 134368 pts/0   Dl+  12:01   0:00 /run/fstests/xfsprogs/io/xfs_io -i -c open -fsd /opt/testfile -c pwrite -S 0x61 -DA -V1 -b 134217728 134217728 134217728
> 
> Yes, that's the xfs_io process started by atomic_write_loop.
> Inexplicably, the awloop killing code terminates the subshell running
> the for loop in atomic_write_loop but only waits for the subshell itself
> to exit.  It doesn't wait for any of that subshell's children, and
> that's why the unmount fails.

Ouch, thanks for catching this. This approach looks good to me.

Feel free to add:
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>

Regards,
ojaswin

> 
> A bare "wait" (without the $awloop_pid parameter) also doesn't wait for
> the xfs_io because the parent shell sees the subshell exit and treats
> that as job completion.  We can't use killall here because the system
> could be running check-parallel, nor can we use pkill here because the
> pid namespace containment code was removed.
> 
> The simplest stupid answer is to use sentinel files to control the loop.
> 
> Cc: <fstests@vger.kernel.org> # v2025.10.20
> Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  tests/generic/778 |   36 ++++++++++++++++++++++++++++++------
>  1 file changed, 30 insertions(+), 6 deletions(-)
> 
> 
> diff --git a/tests/generic/778 b/tests/generic/778
> index 7cfabc3a47a521..715de458268ebc 100755
> --- a/tests/generic/778
> +++ b/tests/generic/778
> @@ -21,6 +21,9 @@ _scratch_mount >> $seqres.full
>  testfile=$SCRATCH_MNT/testfile
>  touch $testfile
>  
> +awloop_runfile=$tmp.awloop_running
> +awloop_killfile=$tmp.awloop_kill
> +
>  awu_max=$(_get_atomic_write_unit_max $testfile)
>  blksz=$(_get_block_size $SCRATCH_MNT)
>  echo "Awu max: $awu_max" >> $seqres.full
> @@ -31,25 +34,48 @@ num_blocks=$((awu_max / blksz))
>  filesize=$(( 10 * 1024 * 1024 * 1024 ))
>  
>  _cleanup() {
> -	[ -n "$awloop_pid" ] && kill $awloop_pid &> /dev/null
> -	wait
> +	kill_awloop
>  }
>  
>  atomic_write_loop() {
>  	local off=0
>  	local size=$awu_max
> +
> +	rm -f $awloop_killfile
> +	touch $awloop_runfile
> +
>  	for ((i=0; i<$((filesize / $size )); i++)); do
>  		# Due to sudden shutdown this can produce errors so just
>  		# redirect them to seqres.full
>  		$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
> +		if [ ! -w "$testfile" ] || [ -e "$awloop_killfile" ]; then
> +			break
> +		fi
>  		echo "Written to offset: $((off + size))" >> $tmp.aw
>  		off=$((off + size))
>  	done
> +
> +	rm -f $awloop_runfile
> +}
> +
> +# Use sentinel files to control the loop execution because we don't know the
> +# pid of the xfs_io process and so we can't wait for it directly.  A bare
> +# wait command won't wait for a D-state xfs_io process so we can't do that
> +# either.  We can't use killall because of check-parallel, and we can't pkill
> +# because the pid namespacing code was removed without fixing check-parallel.
> +kill_awloop() {
> +	test -e $awloop_runfile || return
> +
> +	touch $awloop_killfile
> +
> +	for ((i=0;i<300;i++)); do
> +		test -e $awloop_runfile || break
> +		sleep 0.1
> +	done
>  }
>  
>  start_atomic_write_and_shutdown() {
>  	atomic_write_loop &
> -	awloop_pid=$!
>  	local max_loops=100
>  
>  	local i=0
> @@ -70,9 +96,7 @@ start_atomic_write_and_shutdown() {
>  	echo "# Shutting down filesystem while write is running" >> $seqres.full
>  	_scratch_shutdown
>  
> -	kill $awloop_pid 2>/dev/null  # the process might have finished already
> -	wait $awloop_pid
> -	unset $awloop_pid
> +	kill_awloop
>  }
>  
>  # This test has the following flow:
> 


* Re: [PATCH 2/7] generic/778: fix severe performance problems
  2025-11-10 18:26 ` [PATCH 2/7] generic/778: fix severe performance problems Darrick J. Wong
  2025-11-11  9:27   ` Christoph Hellwig
@ 2025-11-13 11:53   ` Ojaswin Mujoo
  2025-11-15  2:32     ` Darrick J. Wong
  1 sibling, 1 reply; 30+ messages in thread
From: Ojaswin Mujoo @ 2025-11-13 11:53 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

On Mon, Nov 10, 2025 at 10:26:32AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This test takes 4800s to run, which is horrible.  AFAICT it starts out
> by timing how much can be written atomically to a new file in 0.2
> seconds, then scales up the file size by 3x.  On not very fast storage,

Hi Darrick,

So 250MB in 0.2s is like 1.2GB/s, which seems pretty fast.  Did you mean
"On fast storage ..." ?

> this can result in file_size being set to ~250MB on a 4k fsblock
> filesystem.  That's about 64,000 blocks.
> 
> The next thing this test does is try to create a file of that size
> (250MB) of alternating written and unwritten blocks.  For some reason,
> it sets up this file by invoking xfs_io 64,000 times to write small
> amounts of data, which takes 3+ minutes on the author's system because
> exec overhead is pretty high when you do that.

> 
> As a result, one loop through the test takes almost 4 minutes.  The test
> loops 20 times, so it runs for 80 minutes(!!) which is a really long
> time.
> 
> So the first thing we do is observe that the giant slow loop is being
> run as a single thread on an empty filesystem.  Most of the time the
> allocator generates a mostly physically contiguous file.  We could
> fallocate the whole file instead of fallocating one block every other
> time through the loop.  This halves the setup time.
> 
> Next, we can also stuff the remaining pwrite commands into a bash array
> and only invoke xfs_io once every 128x through the loop.  This amortizes
> the xfs_io startup time, which reduces the test loop runtime to about 20
> seconds.
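(The batching described above could be sketched roughly as follows. This is a hypothetical illustration, not the actual patch: the echo in flush() stands in for the real $XFS_IO_PROG invocation, and the 300-block region, 4k block size, and 0x58 pattern are made-up values.)

```shell
#!/bin/bash
# Hypothetical sketch of amortizing xfs_io exec overhead: queue pwrite
# commands in a bash array and hand the whole batch to one xfs_io
# invocation every 128 commands, instead of one exec per block.
batch_size=128
cmds=()       # pending "-c <command>" argument pairs
flushes=()    # record how many commands each flush handled

flush() {
	test "${#cmds[@]}" -gt 0 || return 0
	# the real test would run: $XFS_IO_PROG "${cmds[@]}" "$testfile"
	flushes+=("$(( ${#cmds[@]} / 2 ))")
	cmds=()
}

# simulate queueing a write for each 4k block of a 300-block region
for ((i = 0; i < 300; i++)); do
	cmds+=(-c "pwrite -S 0x58 $((i * 4096)) 4096")
	if (( ${#cmds[@]} / 2 >= batch_size )); then
		flush
	fi
done
flush   # drain any remainder
echo "batches: ${flushes[*]}"
```

Each array element pair (-c plus its command string) becomes one xfs_io command, so 300 queued writes collapse into three process invocations instead of 300.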

Oh right, this is very bad.  Weirdly, I never noticed the test taking
such a long time while testing on scsi_debug and on an enterprise SSD.

Thanks for fixing this up though.  I will maybe start using dm-delay
while stressing the tests in the future to avoid such issues.

> 
> Finally, replace the 20x loop with a _soak_loop_running 5x loop because
> 5 seems like enough.  Anyone who wants more can set TIME_FACTOR or
> SOAK_DURATION to get more intensive testing.  On my system this cuts the
> runtime to 75 seconds.

So about the loops, we were running a modified version of this test,
which used non atomic writes, to confirm if we are able to catch torn
writes this way. We noticed that it sometimes took 10+ loops to observe
the torn write. Hence we kept iters=20. Since catching a torn write is
critical for working of atomic writes, I think it might make sense to
leave it at 20. If we feel this is a very high value, we can perhaps
remove -g auto and keep -g stress -g atomicwrites so only people who
explicitly want to stress atomic writes will run it.
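For context, the iteration-count behavior being discussed can be approximated like this; soak_loop_running below is a simplified stand-in, not the actual fstests _soak_loop_running helper, which also honors time-based limits via SOAK_DURATION:

```shell
#!/bin/bash
# Simplified stand-in for an iteration-bounded soak loop.  The real
# fstests helper also supports SOAK_DURATION; this sketch omits that.
_soak_iter=0
soak_loop_running() {
	local max_iters=$1
	(( _soak_iter++ < max_iters ))
}

max=5   # the real test scales this by TIME_FACTOR
count=0
while soak_loop_running $max; do
	count=$((count + 1))
done
echo "$count"   # 5
```

Raising the baseline (e.g. back to 20) or scaling by TIME_FACTOR is then just a matter of what gets passed in.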

> 
> Cc: <fstests@vger.kernel.org> # v2025.10.20
> Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
>  tests/generic/778 |   59 ++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 40 insertions(+), 19 deletions(-)
> 
> 
> diff --git a/tests/generic/778 b/tests/generic/778
> index 8cb1d8d4cad45d..7cfabc3a47a521 100755
> --- a/tests/generic/778
> +++ b/tests/generic/778
> @@ -42,22 +42,28 @@ atomic_write_loop() {
>  		# Due to sudden shutdown this can produce errors so just
>  		# redirect them to seqres.full
>  		$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
> -		echo "Written to offset: $off" >> $tmp.aw
> -		off=$((off + $size))
> +		echo "Written to offset: $((off + size))" >> $tmp.aw
> +		off=$((off + size))
>  	done
>  }
>  
>  start_atomic_write_and_shutdown() {
>  	atomic_write_loop &
>  	awloop_pid=$!
> +	local max_loops=100
>  
>  	local i=0
> -	# Wait for at least first write to be recorded or 10s
> -	while [ ! -f "$tmp.aw" -a $i -le 50 ]; do i=$((i + 1)); sleep 0.2; done
> +	# Wait for at least first write to be recorded or too much time passes
> +	while [ ! -f "$tmp.aw" -a $i -le $max_loops ]; do
> +		i=$((i + 1))
> +		sleep 0.2
> +	done
>  
> -	if [[ $i -gt 50 ]]
> +	cat $tmp.aw >> $seqres.full
> +
> +	if [[ $i -gt $max_loops ]]
>  	then
> -		_fail "atomic write process took too long to start"
> +		_notrun "atomic write process took too long to start"
>  	fi
>  
>  	echo >> $seqres.full
> @@ -113,21 +119,34 @@ create_mixed_mappings() {
>  	local off=0
>  	local operations=("W" "U")
>  
> +	test $size_bytes -eq 0 && return
> +
> +	# fallocate the whole file once because preallocating single blocks
> +	# with individual xfs_io invocations is really slow and the allocator
> +	# usually gives out consecutive blocks anyway
> +	$XFS_IO_PROG -f -c "falloc 0 $size_bytes" $file
> +
> +	local cmds=()
>  	for ((i=0; i<$((size_bytes / blksz )); i++)); do
> -		index=$(($i % ${#operations[@]}))
> -		map="${operations[$index]}"
> +		if (( i % 2 == 0 )); then
> +			cmds+=(-c "pwrite -b $blksz $off $blksz")
> +		fi
> +
> +		# batch the write commands into larger xfs_io invocations to
> +		# amortize the fork overhead
> +		if [ "${#cmds[@]}" -ge 128 ]; then
> +			$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> +			cmds=()
> +		fi
>  
> -		case "$map" in
> -		    "W")
> -			$XFS_IO_PROG -fc "pwrite -b $blksz $off $blksz" $file  >> /dev/null
> -			;;
> -		    "U")
> -			$XFS_IO_PROG -fc "falloc $off $blksz" $file >> /dev/null
> -			;;
> -		esac
>  		off=$((off + blksz))
>  	done
>  
> +	if [ "${#cmds[@]}" -gt 0 ]; then
> +		$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> +		cmds=()
> +	fi
> +
>  	sync $file
>  }
>  
> @@ -336,9 +355,9 @@ echo >> $seqres.full
>  echo "# Populating expected data buffers" >> $seqres.full
>  populate_expected_data
>  
> -# Loop 20 times to shake out any races due to shutdown
> -for ((iter=0; iter<20; iter++))
> -do
> +# Loop to shake out any races due to shutdown
> +iter=0
> +while _soak_loop_running $TIME_FACTOR; do
>  	echo >> $seqres.full
>  	echo "------ Iteration $iter ------" >> $seqres.full
>  
> @@ -361,6 +380,8 @@ do
>  	echo >> $seqres.full
>  	echo "# Starting shutdown torn write test for append atomic writes" >> $seqres.full
>  	test_append_torn_write
> +
> +	iter=$((iter + 1))
>  done
>  
>  echo "Silence is golden"
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/7] generic/778: fix severe performance problems
  2025-11-13 11:53   ` Ojaswin Mujoo
@ 2025-11-15  2:32     ` Darrick J. Wong
  2025-11-29  8:52       ` Ojaswin Mujoo
  0 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2025-11-15  2:32 UTC (permalink / raw)
  To: Ojaswin Mujoo; +Cc: zlang, fstests, linux-ext4

On Thu, Nov 13, 2025 at 05:23:45PM +0530, Ojaswin Mujoo wrote:
> On Mon, Nov 10, 2025 at 10:26:32AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > This test takes 4800s to run, which is horrible.  AFAICT it starts out
> > by timing how much can be written atomically to a new file in 0.2
> > seconds, then scales up the file size by 3x.  On not very fast storage,
> 
> Hi Darrick,
> 
> So 250MB in 0.2s is like 1.2GBps which seems pretty fast. Did you mean
> "On fast storage ..." ?

No, I have even faster storage. ;)

> > this can result in file_size being set to ~250MB on a 4k fsblock
> > filesystem.  That's about 64,000 blocks.
> > 
> > The next thing this test does is try to create a file of that size
> > (250MB) of alternating written and unwritten blocks.  For some reason,
> > it sets up this file by invoking xfs_io 64,000 times to write small
> > amounts of data, which takes 3+ minutes on the author's system because
> > exec overhead is pretty high when you do that.
> 
> > 
> > As a result, one loop through the test takes almost 4 minutes.  The test
> > loops 20 times, so it runs for 80 minutes(!!) which is a really long
> > time.
> > 
> > So the first thing we do is observe that the giant slow loop is being
> > run as a single thread on an empty filesystem.  Most of the time the
> > allocator generates a mostly physically contiguous file.  We could
> > fallocate the whole file instead of fallocating one block every other
> > time through the loop.  This halves the setup time.
> > 
> > Next, we can also stuff the remaining pwrite commands into a bash array
> > and only invoke xfs_io once every 128x through the loop.  This amortizes
> > the xfs_io startup time, which reduces the test loop runtime to about 20
> > seconds.
> 
> Oh right, this is very bad. Weirdly I never noticed the test taking such
> a huge time while testing on scsi_debug and also on an enterprise SSD.

It doesn't help that xfs supports much larger awu_max than (say) ext4.

> Thanks for fixing this up though, I will start using maybe dm-delay
> while stressing the tests in the future to avoid such issues.

fork() is a bit expensive.

> > 
> > Finally, replace the 20x loop with a _soak_loop_running 5x loop because
> > 5 seems like enough.  Anyone who wants more can set TIME_FACTOR or
> > SOAK_DURATION to get more intensive testing.  On my system this cuts the
> > runtime to 75 seconds.
> 
> So about the loops, we were running a modified version of this test,
> which used non atomic writes, to confirm if we are able to catch torn
> writes this way. We noticed that it sometimes took 10+ loops to observe
> the torn write. Hence we kept iters=20. Since catching a torn write is
> critical for working of atomic writes, I think it might make sense to
> leave it at 20. If we feel this is a very high value, we can perhaps
> remove -g auto and keep -g stress -g atomicwrites so only people who
> explicitly want to stress atomic writes will run it.

In that case we ought to limit the awu_max that we feed to the test
because otherwise it starts running a lot of IO.

--D

> > 
> > Cc: <fstests@vger.kernel.org> # v2025.10.20
> > Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
> > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > ---
> >  tests/generic/778 |   59 ++++++++++++++++++++++++++++++++++++-----------------
> >  1 file changed, 40 insertions(+), 19 deletions(-)
> > 
> > 
> > diff --git a/tests/generic/778 b/tests/generic/778
> > index 8cb1d8d4cad45d..7cfabc3a47a521 100755
> > --- a/tests/generic/778
> > +++ b/tests/generic/778
> > @@ -42,22 +42,28 @@ atomic_write_loop() {
> >  		# Due to sudden shutdown this can produce errors so just
> >  		# redirect them to seqres.full
> >  		$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
> > -		echo "Written to offset: $off" >> $tmp.aw
> > -		off=$((off + $size))
> > +		echo "Written to offset: $((off + size))" >> $tmp.aw
> > +		off=$((off + size))
> >  	done
> >  }
> >  
> >  start_atomic_write_and_shutdown() {
> >  	atomic_write_loop &
> >  	awloop_pid=$!
> > +	local max_loops=100
> >  
> >  	local i=0
> > -	# Wait for at least first write to be recorded or 10s
> > -	while [ ! -f "$tmp.aw" -a $i -le 50 ]; do i=$((i + 1)); sleep 0.2; done
> > +	# Wait for at least first write to be recorded or too much time passes
> > +	while [ ! -f "$tmp.aw" -a $i -le $max_loops ]; do
> > +		i=$((i + 1))
> > +		sleep 0.2
> > +	done
> >  
> > -	if [[ $i -gt 50 ]]
> > +	cat $tmp.aw >> $seqres.full
> > +
> > +	if [[ $i -gt $max_loops ]]
> >  	then
> > -		_fail "atomic write process took too long to start"
> > +		_notrun "atomic write process took too long to start"
> >  	fi
> >  
> >  	echo >> $seqres.full
> > @@ -113,21 +119,34 @@ create_mixed_mappings() {
> >  	local off=0
> >  	local operations=("W" "U")
> >  
> > +	test $size_bytes -eq 0 && return
> > +
> > +	# fallocate the whole file once because preallocating single blocks
> > +	# with individual xfs_io invocations is really slow and the allocator
> > +	# usually gives out consecutive blocks anyway
> > +	$XFS_IO_PROG -f -c "falloc 0 $size_bytes" $file
> > +
> > +	local cmds=()
> >  	for ((i=0; i<$((size_bytes / blksz )); i++)); do
> > -		index=$(($i % ${#operations[@]}))
> > -		map="${operations[$index]}"
> > +		if (( i % 2 == 0 )); then
> > +			cmds+=(-c "pwrite -b $blksz $off $blksz")
> > +		fi
> > +
> > +		# batch the write commands into larger xfs_io invocations to
> > +		# amortize the fork overhead
> > +		if [ "${#cmds[@]}" -ge 128 ]; then
> > +			$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > +			cmds=()
> > +		fi
> >  
> > -		case "$map" in
> > -		    "W")
> > -			$XFS_IO_PROG -fc "pwrite -b $blksz $off $blksz" $file  >> /dev/null
> > -			;;
> > -		    "U")
> > -			$XFS_IO_PROG -fc "falloc $off $blksz" $file >> /dev/null
> > -			;;
> > -		esac
> >  		off=$((off + blksz))
> >  	done
> >  
> > +	if [ "${#cmds[@]}" -gt 0 ]; then
> > +		$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > +		cmds=()
> > +	fi
> > +
> >  	sync $file
> >  }
> >  
> > @@ -336,9 +355,9 @@ echo >> $seqres.full
> >  echo "# Populating expected data buffers" >> $seqres.full
> >  populate_expected_data
> >  
> > -# Loop 20 times to shake out any races due to shutdown
> > -for ((iter=0; iter<20; iter++))
> > -do
> > +# Loop to shake out any races due to shutdown
> > +iter=0
> > +while _soak_loop_running $TIME_FACTOR; do
> >  	echo >> $seqres.full
> >  	echo "------ Iteration $iter ------" >> $seqres.full
> >  
> > @@ -361,6 +380,8 @@ do
> >  	echo >> $seqres.full
> >  	echo "# Starting shutdown torn write test for append atomic writes" >> $seqres.full
> >  	test_append_torn_write
> > +
> > +	iter=$((iter + 1))
> >  done
> >  
> >  echo "Silence is golden"
> > 
> 


* Re: [PATCH 2/7] generic/778: fix severe performance problems
  2025-11-15  2:32     ` Darrick J. Wong
@ 2025-11-29  8:52       ` Ojaswin Mujoo
  2025-12-01 23:19         ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Ojaswin Mujoo @ 2025-11-29  8:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

On Fri, Nov 14, 2025 at 06:32:51PM -0800, Darrick J. Wong wrote:
> On Thu, Nov 13, 2025 at 05:23:45PM +0530, Ojaswin Mujoo wrote:
> > On Mon, Nov 10, 2025 at 10:26:32AM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > This test takes 4800s to run, which is horrible.  AFAICT it starts out
> > > by timing how much can be written atomically to a new file in 0.2
> > > seconds, then scales up the file size by 3x.  On not very fast storage,
> > 
> > Hi Darrick,

(Sorry I missed this email somehow)

> > 
> > So 250MB in 0.2s is like 1.2GBps which seems pretty fast. Did you mean
> > "On fast storage ..." ?
> 
> No, I have even faster storage. ;)

:O

So that means on even faster storage this problem would be even more
visible, because our file size would be >250MB.

> 
> > > this can result in file_size being set to ~250MB on a 4k fsblock
> > > filesystem.  That's about 64,000 blocks.
> > > 
> > > The next thing this test does is try to create a file of that size
> > > (250MB) of alternating written and unwritten blocks.  For some reason,
> > > it sets up this file by invoking xfs_io 64,000 times to write small
> > > amounts of data, which takes 3+ minutes on the author's system because
> > > exec overhead is pretty high when you do that.
> > 
> > > 
> > > As a result, one loop through the test takes almost 4 minutes.  The test
> > > loops 20 times, so it runs for 80 minutes(!!) which is a really long
> > > time.
> > > 
> > > So the first thing we do is observe that the giant slow loop is being
> > > run as a single thread on an empty filesystem.  Most of the time the
> > > allocator generates a mostly physically contiguous file.  We could
> > > fallocate the whole file instead of fallocating one block every other
> > > time through the loop.  This halves the setup time.
> > > 
> > > Next, we can also stuff the remaining pwrite commands into a bash array
> > > and only invoke xfs_io once every 128x through the loop.  This amortizes
> > > the xfs_io startup time, which reduces the test loop runtime to about 20
> > > seconds.
> > 
> > Oh right, this is very bad. Weirdly I never noticed the test taking such
> > a huge time while testing on scsi_debug and also on an enterprise SSD.
> 
> It doesn't help that xfs supports much larger awu_max than (say) ext4.

I did test on xfs as well. But yea maybe my SSD is just not fast enough.

> 
> > Thanks for fixing this up though, I will start using maybe dm-delay
> > while stressing the tests in the future to avoid such issues.
> 
> fork() is a bit expensive.
> 
> > > 
> > > Finally, replace the 20x loop with a _soak_loop_running 5x loop because
> > > 5 seems like enough.  Anyone who wants more can set TIME_FACTOR or
> > > SOAK_DURATION to get more intensive testing.  On my system this cuts the
> > > runtime to 75 seconds.
> > 
> > So about the loops, we were running a modified version of this test,
> > which used non atomic writes, to confirm if we are able to catch torn
> > writes this way. We noticed that it sometimes took 10+ loops to observe
> > the torn write. Hence we kept iters=20. Since catching a torn write is
> > critical for working of atomic writes, I think it might make sense to
> > leave it at 20. If we feel this is a very high value, we can perhaps
> > remove -g auto and keep -g stress -g atomicwrites so only people who
> > explicitly want to stress atomic writes will run it.
> 
> In that case we ought to limit the awu_max that we feed to the test
> because otherwise it starts running a lot of IO.

Yes I think that makes sense. Right now we get an awu_max of 4M on xfs,
which means we always end up only testing software atomic writes.  Maybe we
can instead cap awu_max at 64K or something. This way, we can test both
hw atomic writes (when device supports it) and sw atomic writes (when it
doesn't)

Regards,
ojaswin

> 
> --D
> 
> > > 
> > > Cc: <fstests@vger.kernel.org> # v2025.10.20
> > > Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
> > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > > ---
> > >  tests/generic/778 |   59 ++++++++++++++++++++++++++++++++++++-----------------
> > >  1 file changed, 40 insertions(+), 19 deletions(-)
> > > 
> > > 
> > > diff --git a/tests/generic/778 b/tests/generic/778
> > > index 8cb1d8d4cad45d..7cfabc3a47a521 100755
> > > --- a/tests/generic/778
> > > +++ b/tests/generic/778
> > > @@ -42,22 +42,28 @@ atomic_write_loop() {
> > >  		# Due to sudden shutdown this can produce errors so just
> > >  		# redirect them to seqres.full
> > >  		$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
> > > -		echo "Written to offset: $off" >> $tmp.aw
> > > -		off=$((off + $size))
> > > +		echo "Written to offset: $((off + size))" >> $tmp.aw
> > > +		off=$((off + size))
> > >  	done
> > >  }
> > >  
> > >  start_atomic_write_and_shutdown() {
> > >  	atomic_write_loop &
> > >  	awloop_pid=$!
> > > +	local max_loops=100
> > >  
> > >  	local i=0
> > > -	# Wait for at least first write to be recorded or 10s
> > > -	while [ ! -f "$tmp.aw" -a $i -le 50 ]; do i=$((i + 1)); sleep 0.2; done
> > > +	# Wait for at least first write to be recorded or too much time passes
> > > +	while [ ! -f "$tmp.aw" -a $i -le $max_loops ]; do
> > > +		i=$((i + 1))
> > > +		sleep 0.2
> > > +	done
> > >  
> > > -	if [[ $i -gt 50 ]]
> > > +	cat $tmp.aw >> $seqres.full
> > > +
> > > +	if [[ $i -gt $max_loops ]]
> > >  	then
> > > -		_fail "atomic write process took too long to start"
> > > +		_notrun "atomic write process took too long to start"
> > >  	fi
> > >  
> > >  	echo >> $seqres.full
> > > @@ -113,21 +119,34 @@ create_mixed_mappings() {
> > >  	local off=0
> > >  	local operations=("W" "U")
> > >  
> > > +	test $size_bytes -eq 0 && return
> > > +
> > > +	# fallocate the whole file once because preallocating single blocks
> > > +	# with individual xfs_io invocations is really slow and the allocator
> > > +	# usually gives out consecutive blocks anyway
> > > +	$XFS_IO_PROG -f -c "falloc 0 $size_bytes" $file
> > > +
> > > +	local cmds=()
> > >  	for ((i=0; i<$((size_bytes / blksz )); i++)); do
> > > -		index=$(($i % ${#operations[@]}))
> > > -		map="${operations[$index]}"
> > > +		if (( i % 2 == 0 )); then
> > > +			cmds+=(-c "pwrite -b $blksz $off $blksz")
> > > +		fi
> > > +
> > > +		# batch the write commands into larger xfs_io invocations to
> > > +		# amortize the fork overhead
> > > +		if [ "${#cmds[@]}" -ge 128 ]; then
> > > +			$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > > +			cmds=()
> > > +		fi
> > >  
> > > -		case "$map" in
> > > -		    "W")
> > > -			$XFS_IO_PROG -fc "pwrite -b $blksz $off $blksz" $file  >> /dev/null
> > > -			;;
> > > -		    "U")
> > > -			$XFS_IO_PROG -fc "falloc $off $blksz" $file >> /dev/null
> > > -			;;
> > > -		esac
> > >  		off=$((off + blksz))
> > >  	done
> > >  
> > > +	if [ "${#cmds[@]}" -gt 0 ]; then
> > > +		$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > > +		cmds=()
> > > +	fi
> > > +
> > >  	sync $file
> > >  }
> > >  
> > > @@ -336,9 +355,9 @@ echo >> $seqres.full
> > >  echo "# Populating expected data buffers" >> $seqres.full
> > >  populate_expected_data
> > >  
> > > -# Loop 20 times to shake out any races due to shutdown
> > > -for ((iter=0; iter<20; iter++))
> > > -do
> > > +# Loop to shake out any races due to shutdown
> > > +iter=0
> > > +while _soak_loop_running $TIME_FACTOR; do
> > >  	echo >> $seqres.full
> > >  	echo "------ Iteration $iter ------" >> $seqres.full
> > >  
> > > @@ -361,6 +380,8 @@ do
> > >  	echo >> $seqres.full
> > >  	echo "# Starting shutdown torn write test for append atomic writes" >> $seqres.full
> > >  	test_append_torn_write
> > > +
> > > +	iter=$((iter + 1))
> > >  done
> > >  
> > >  echo "Silence is golden"
> > > 
> > 


* Re: [PATCH 2/7] generic/778: fix severe performance problems
  2025-11-29  8:52       ` Ojaswin Mujoo
@ 2025-12-01 23:19         ` Darrick J. Wong
  2025-12-03  6:19           ` Ojaswin Mujoo
  0 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2025-12-01 23:19 UTC (permalink / raw)
  To: Ojaswin Mujoo; +Cc: zlang, fstests, linux-ext4

On Sat, Nov 29, 2025 at 02:22:11PM +0530, Ojaswin Mujoo wrote:
> On Fri, Nov 14, 2025 at 06:32:51PM -0800, Darrick J. Wong wrote:
> > On Thu, Nov 13, 2025 at 05:23:45PM +0530, Ojaswin Mujoo wrote:
> > > On Mon, Nov 10, 2025 at 10:26:32AM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > This test takes 4800s to run, which is horrible.  AFAICT it starts out
> > > > by timing how much can be written atomically to a new file in 0.2
> > > > seconds, then scales up the file size by 3x.  On not very fast storage,
> > > 
> > > Hi Darrick,
> 
> (Sorry I missed this email somehow)
> 
> > > 
> > > So 250MB in 0.2s is like 1.2GBps which seems pretty fast. Did you mean
> > > "On fast storage ..." ?
> > 
> > No, I have even faster storage. ;)
> 
> :O
> 
> So that means on an even faster storage this problem would be even more
> visible because our file size would be >250MB
> 
> > 
> > > > this can result in file_size being set to ~250MB on a 4k fsblock
> > > > filesystem.  That's about 64,000 blocks.
> > > > 
> > > > The next thing this test does is try to create a file of that size
> > > > (250MB) of alternating written and unwritten blocks.  For some reason,
> > > > it sets up this file by invoking xfs_io 64,000 times to write small
> > > > amounts of data, which takes 3+ minutes on the author's system because
> > > > exec overhead is pretty high when you do that.
> > > 
> > > > 
> > > > As a result, one loop through the test takes almost 4 minutes.  The test
> > > > loops 20 times, so it runs for 80 minutes(!!) which is a really long
> > > > time.
> > > > 
> > > > So the first thing we do is observe that the giant slow loop is being
> > > > run as a single thread on an empty filesystem.  Most of the time the
> > > > allocator generates a mostly physically contiguous file.  We could
> > > > fallocate the whole file instead of fallocating one block every other
> > > > time through the loop.  This halves the setup time.
> > > > 
> > > > Next, we can also stuff the remaining pwrite commands into a bash array
> > > > and only invoke xfs_io once every 128x through the loop.  This amortizes
> > > > the xfs_io startup time, which reduces the test loop runtime to about 20
> > > > seconds.
> > > 
> > > Oh right, this is very bad. Weirdly I never noticed the test taking such
> > > a huge time while testing on scsi_debug and also on an enterprise SSD.
> > 
> > It doesn't help that xfs supports much larger awu_max than (say) ext4.
> 
> I did test on xfs as well. But yea maybe my SSD is just not fast enough.
> 
> > 
> > > Thanks for fixing this up though, I will start using maybe dm-delay
> > > while stressing the tests in the future to avoid such issues.
> > 
> > fork() is a bit expensive.
> > 
> > > > 
> > > > Finally, replace the 20x loop with a _soak_loop_running 5x loop because
> > > > 5 seems like enough.  Anyone who wants more can set TIME_FACTOR or
> > > > SOAK_DURATION to get more intensive testing.  On my system this cuts the
> > > > runtime to 75 seconds.
> > > 
> > > So about the loops, we were running a modified version of this test,
> > > which used non atomic writes, to confirm if we are able to catch torn
> > > writes this way. We noticed that it sometimes took 10+ loops to observe
> > > the torn write. Hence we kept iters=20. Since catching a torn write is
> > > critical for working of atomic writes, I think it might make sense to
> > > leave it at 20. If we feel this is a very high value, we can perhaps
> > > remove -g auto and keep -g stress -g atomicwrites so only people who
> > > explicitly want to stress atomic writes will run it.
> > 
> > In that case we ought to limit the awu_max that we feed to the test
> > because otherwise it starts running a lot of IO.
> 
> Yes I think that makes sense. Right now we get awu_max of 4M on xfs that
> means we always end up only testing software atomic writes.  Maybe we
> can instead cap awu_max at 64K or something. This way, we can test both
> hw atomic writes (when device supports it) and sw atomic writes (when it
> doesn't)

Yeah, capping the testing block size sounds like a good idea.  What do
you think about using min(awu_max_opt * 2, awu_max) ?

--D

> Regards,
> ojaswin
> 
> > 
> > --D
> > 
> > > > 
> > > > Cc: <fstests@vger.kernel.org> # v2025.10.20
> > > > Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
> > > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > > > ---
> > > >  tests/generic/778 |   59 ++++++++++++++++++++++++++++++++++++-----------------
> > > >  1 file changed, 40 insertions(+), 19 deletions(-)
> > > > 
> > > > 
> > > > diff --git a/tests/generic/778 b/tests/generic/778
> > > > index 8cb1d8d4cad45d..7cfabc3a47a521 100755
> > > > --- a/tests/generic/778
> > > > +++ b/tests/generic/778
> > > > @@ -42,22 +42,28 @@ atomic_write_loop() {
> > > >  		# Due to sudden shutdown this can produce errors so just
> > > >  		# redirect them to seqres.full
> > > >  		$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
> > > > -		echo "Written to offset: $off" >> $tmp.aw
> > > > -		off=$((off + $size))
> > > > +		echo "Written to offset: $((off + size))" >> $tmp.aw
> > > > +		off=$((off + size))
> > > >  	done
> > > >  }
> > > >  
> > > >  start_atomic_write_and_shutdown() {
> > > >  	atomic_write_loop &
> > > >  	awloop_pid=$!
> > > > +	local max_loops=100
> > > >  
> > > >  	local i=0
> > > > -	# Wait for at least first write to be recorded or 10s
> > > > -	while [ ! -f "$tmp.aw" -a $i -le 50 ]; do i=$((i + 1)); sleep 0.2; done
> > > > +	# Wait for at least first write to be recorded or too much time passes
> > > > +	while [ ! -f "$tmp.aw" -a $i -le $max_loops ]; do
> > > > +		i=$((i + 1))
> > > > +		sleep 0.2
> > > > +	done
> > > >  
> > > > -	if [[ $i -gt 50 ]]
> > > > +	cat $tmp.aw >> $seqres.full
> > > > +
> > > > +	if [[ $i -gt $max_loops ]]
> > > >  	then
> > > > -		_fail "atomic write process took too long to start"
> > > > +		_notrun "atomic write process took too long to start"
> > > >  	fi
> > > >  
> > > >  	echo >> $seqres.full
> > > > @@ -113,21 +119,34 @@ create_mixed_mappings() {
> > > >  	local off=0
> > > >  	local operations=("W" "U")
> > > >  
> > > > +	test $size_bytes -eq 0 && return
> > > > +
> > > > +	# fallocate the whole file once because preallocating single blocks
> > > > +	# with individual xfs_io invocations is really slow and the allocator
> > > > +	# usually gives out consecutive blocks anyway
> > > > +	$XFS_IO_PROG -f -c "falloc 0 $size_bytes" $file
> > > > +
> > > > +	local cmds=()
> > > >  	for ((i=0; i<$((size_bytes / blksz )); i++)); do
> > > > -		index=$(($i % ${#operations[@]}))
> > > > -		map="${operations[$index]}"
> > > > +		if (( i % 2 == 0 )); then
> > > > +			cmds+=(-c "pwrite -b $blksz $off $blksz")
> > > > +		fi
> > > > +
> > > > +		# batch the write commands into larger xfs_io invocations to
> > > > +		# amortize the fork overhead
> > > > +		if [ "${#cmds[@]}" -ge 128 ]; then
> > > > +			$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > > > +			cmds=()
> > > > +		fi
> > > >  
> > > > -		case "$map" in
> > > > -		    "W")
> > > > -			$XFS_IO_PROG -fc "pwrite -b $blksz $off $blksz" $file  >> /dev/null
> > > > -			;;
> > > > -		    "U")
> > > > -			$XFS_IO_PROG -fc "falloc $off $blksz" $file >> /dev/null
> > > > -			;;
> > > > -		esac
> > > >  		off=$((off + blksz))
> > > >  	done
> > > >  
> > > > +	if [ "${#cmds[@]}" -gt 0 ]; then
> > > > +		$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > > > +		cmds=()
> > > > +	fi
> > > > +
> > > >  	sync $file
> > > >  }
> > > >  
> > > > @@ -336,9 +355,9 @@ echo >> $seqres.full
> > > >  echo "# Populating expected data buffers" >> $seqres.full
> > > >  populate_expected_data
> > > >  
> > > > -# Loop 20 times to shake out any races due to shutdown
> > > > -for ((iter=0; iter<20; iter++))
> > > > -do
> > > > +# Loop to shake out any races due to shutdown
> > > > +iter=0
> > > > +while _soak_loop_running $TIME_FACTOR; do
> > > >  	echo >> $seqres.full
> > > >  	echo "------ Iteration $iter ------" >> $seqres.full
> > > >  
> > > > @@ -361,6 +380,8 @@ do
> > > >  	echo >> $seqres.full
> > > >  	echo "# Starting shutdown torn write test for append atomic writes" >> $seqres.full
> > > >  	test_append_torn_write
> > > > +
> > > > +	iter=$((iter + 1))
> > > >  done
> > > >  
> > > >  echo "Silence is golden"
> > > > 
> > > 


* Re: [PATCH 2/7] generic/778: fix severe performance problems
  2025-12-01 23:19         ` Darrick J. Wong
@ 2025-12-03  6:19           ` Ojaswin Mujoo
  2025-12-04 17:18             ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Ojaswin Mujoo @ 2025-12-03  6:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: zlang, fstests, linux-ext4

On Mon, Dec 01, 2025 at 03:19:08PM -0800, Darrick J. Wong wrote:
> On Sat, Nov 29, 2025 at 02:22:11PM +0530, Ojaswin Mujoo wrote:
> > On Fri, Nov 14, 2025 at 06:32:51PM -0800, Darrick J. Wong wrote:
> > > On Thu, Nov 13, 2025 at 05:23:45PM +0530, Ojaswin Mujoo wrote:
> > > > On Mon, Nov 10, 2025 at 10:26:32AM -0800, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > 
> > > > > This test takes 4800s to run, which is horrible.  AFAICT it starts out
> > > > > by timing how much can be written atomically to a new file in 0.2
> > > > > seconds, then scales up the file size by 3x.  On not very fast storage,
> > > > 
> > > > Hi Darrick,
> > 
> > (Sorry I missed this email somehow)
> > 
> > > > 
> > > > So 250MB in 0.2s is like 1.2GBps which seems pretty fast. Did you mean
> > > > "On fast storage ..." ?
> > > 
> > > No, I have even faster storage. ;)
> > 
> > :O
> > 
> > So that means on an even faster storage this problem would be even more
> > visible because our file size would be >250MB
> > 
> > > 
> > > > > this can result in file_size being set to ~250MB on a 4k fsblock
> > > > > filesystem.  That's about 64,000 blocks.
> > > > > 
> > > > > The next thing this test does is try to create a file of that size
> > > > > (250MB) of alternating written and unwritten blocks.  For some reason,
> > > > > it sets up this file by invoking xfs_io 64,000 times to write small
> > > > > amounts of data, which takes 3+ minutes on the author's system because
> > > > > exec overhead is pretty high when you do that.
> > > > 
> > > > > 
> > > > > As a result, one loop through the test takes almost 4 minutes.  The test
> > > > > loops 20 times, so it runs for 80 minutes(!!) which is a really long
> > > > > time.
> > > > > 
> > > > > So the first thing we do is observe that the giant slow loop is being
> > > > > run as a single thread on an empty filesystem.  Most of the time the
> > > > > allocator generates a mostly physically contiguous file.  We could
> > > > > fallocate the whole file instead of fallocating one block every other
> > > > > time through the loop.  This halves the setup time.
> > > > > 
> > > > > Next, we can also stuff the remaining pwrite commands into a bash array
> > > > > and only invoke xfs_io once every 128x through the loop.  This amortizes
> > > > > the xfs_io startup time, which reduces the test loop runtime to about 20
> > > > > seconds.
> > > > 
> > > > Oh right, this is very bad. Weirdly I never noticed the test taking such
> > > > a huge time while testing on scsi_debug and also on an enterprise SSD.
> > > 
> > > It doesn't help that xfs supports much larger awu_max than (say) ext4.
> > 
> > I did test on xfs as well. But yea maybe my SSD is just not fast enough.
> > 
> > > 
> > > > Thanks for fixing this up though, I will start using maybe dm-delay
> > > > while stressing the tests in the future to avoid such issues.
> > > 
> > > fork() is a bit expensive.
> > > 
> > > > > 
> > > > > Finally, replace the 20x loop with a _soak_loop_running 5x loop because
> > > > > 5 seems like enough.  Anyone who wants more can set TIME_FACTOR or
> > > > > SOAK_DURATION to get more intensive testing.  On my system this cuts the
> > > > > runtime to 75 seconds.
> > > > 
> > > > So about the loops: we were running a modified version of this test,
> > > > which used non-atomic writes, to confirm that we are able to catch torn
> > > > writes this way. We noticed that it sometimes took 10+ loops to observe
> > > > the torn write. Hence we kept iters=20. Since catching a torn write is
> > > > critical for the working of atomic writes, I think it might make sense to
> > > > leave it at 20. If we feel this is a very high value, we can perhaps
> > > > remove -g auto and keep -g stress -g atomicwrites so only people who
> > > > explicitly want to stress atomic writes will run it.
> > > 
> > > In that case we ought to limit the awu_max that we feed to the test
> > > because otherwise it starts running a lot of IO.
> > 
> > Yes, I think that makes sense. Right now we get an awu_max of 4M on xfs,
> > which means we always end up testing only software atomic writes.  Maybe
> > we can instead cap awu_max at 64K or something. This way, we can test
> > both hw atomic writes (when the device supports it) and sw atomic writes
> > (when it doesn't).
> 
> Yeah, capping the testing block size sounds like a good idea.  What do
> you think about using min(awu_max_opt * 2, awu_max) ?

I'm thinking that now that we are modifying this, maybe we can improve
coverage by also testing hardware atomic write paths. Right now the
test will mostly be testing SW fallback on XFS because we use awu_max
(usually 4M).

Maybe something like min(awu_max_opt, awu_max) gets us coverage of both
paths?

Also, looking at xfs_get_atomic_write_max_opt(), there is the caveat that
awu_max_opt is returned as 0 if awu_max <= blocksize (should be rare
with software atomic writes but yeah), and ext4 always returns it as 0,
so we will need to handle that.

How about (pseudocode):

if (awu_max_opt == 0)
    /* software only, limit to 128k */
    awu_max = min(statx->awu_max, 128K)
else
    awu_max = min(statx->awu_max_opt, statx->awu_max)
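
In fstests' own shell, that pseudocode might look like the following
sketch. This is illustrative only: the function name cap_awu_max and the
128k software-only limit are assumptions made here, not existing fstests
helpers, and the two inputs would come from the statx fields.

```shell
#!/bin/bash
# Hypothetical helper: pick the atomic write size to test so that both
# the hardware path and the software fallback get coverage.
# $1 = stx_atomic_write_unit_max, $2 = stx_atomic_write_unit_max_opt
cap_awu_max() {
	local awu_max="$1" awu_max_opt="$2" sw_limit=$((128 * 1024))

	if [ "$awu_max_opt" -eq 0 ]; then
		# no optimal size reported (software only, or ext4):
		# cap at an assumed 128k limit
		echo $(( awu_max < sw_limit ? awu_max : sw_limit ))
	else
		# otherwise test at the hardware-optimal size
		echo $(( awu_max < awu_max_opt ? awu_max : awu_max_opt ))
	fi
}
```

With the 4M awu_max mentioned above and an awu_max_opt of 0, this picks
131072 (128k).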


Regards,
ojaswin
> 
> --D
> 
> > Regards,
> > ojaswin
> > 
> > > 
> > > --D
> > > 
> > > > > 
> > > > > Cc: <fstests@vger.kernel.org> # v2025.10.20
> > > > > Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
> > > > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > > > > ---
> > > > >  tests/generic/778 |   59 ++++++++++++++++++++++++++++++++++++-----------------
> > > > >  1 file changed, 40 insertions(+), 19 deletions(-)
> > > > > 
> > > > > 
> > > > > diff --git a/tests/generic/778 b/tests/generic/778
> > > > > index 8cb1d8d4cad45d..7cfabc3a47a521 100755
> > > > > --- a/tests/generic/778
> > > > > +++ b/tests/generic/778
> > > > > @@ -42,22 +42,28 @@ atomic_write_loop() {
> > > > >  		# Due to sudden shutdown this can produce errors so just
> > > > >  		# redirect them to seqres.full
> > > > >  		$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
> > > > > -		echo "Written to offset: $off" >> $tmp.aw
> > > > > -		off=$((off + $size))
> > > > > +		echo "Written to offset: $((off + size))" >> $tmp.aw
> > > > > +		off=$((off + size))
> > > > >  	done
> > > > >  }
> > > > >  
> > > > >  start_atomic_write_and_shutdown() {
> > > > >  	atomic_write_loop &
> > > > >  	awloop_pid=$!
> > > > > +	local max_loops=100
> > > > >  
> > > > >  	local i=0
> > > > > -	# Wait for at least first write to be recorded or 10s
> > > > > -	while [ ! -f "$tmp.aw" -a $i -le 50 ]; do i=$((i + 1)); sleep 0.2; done
> > > > > +	# Wait for at least first write to be recorded or too much time passes
> > > > > +	while [ ! -f "$tmp.aw" -a $i -le $max_loops ]; do
> > > > > +		i=$((i + 1))
> > > > > +		sleep 0.2
> > > > > +	done
> > > > >  
> > > > > -	if [[ $i -gt 50 ]]
> > > > > +	cat $tmp.aw >> $seqres.full
> > > > > +
> > > > > +	if [[ $i -gt $max_loops ]]
> > > > >  	then
> > > > > -		_fail "atomic write process took too long to start"
> > > > > +		_notrun "atomic write process took too long to start"
> > > > >  	fi
> > > > >  
> > > > >  	echo >> $seqres.full
> > > > > @@ -113,21 +119,34 @@ create_mixed_mappings() {
> > > > >  	local off=0
> > > > >  	local operations=("W" "U")
> > > > >  
> > > > > +	test $size_bytes -eq 0 && return
> > > > > +
> > > > > +	# fallocate the whole file once because preallocating single blocks
> > > > > +	# with individual xfs_io invocations is really slow and the allocator
> > > > > +	# usually gives out consecutive blocks anyway
> > > > > +	$XFS_IO_PROG -f -c "falloc 0 $size_bytes" $file
> > > > > +
> > > > > +	local cmds=()
> > > > >  	for ((i=0; i<$((size_bytes / blksz )); i++)); do
> > > > > -		index=$(($i % ${#operations[@]}))
> > > > > -		map="${operations[$index]}"
> > > > > +		if (( i % 2 == 0 )); then
> > > > > +			cmds+=(-c "pwrite -b $blksz $off $blksz")
> > > > > +		fi
> > > > > +
> > > > > +		# batch the write commands into larger xfs_io invocations to
> > > > > +		# amortize the fork overhead
> > > > > +		if [ "${#cmds[@]}" -ge 128 ]; then
> > > > > +			$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > > > > +			cmds=()
> > > > > +		fi
> > > > >  
> > > > > -		case "$map" in
> > > > > -		    "W")
> > > > > -			$XFS_IO_PROG -fc "pwrite -b $blksz $off $blksz" $file  >> /dev/null
> > > > > -			;;
> > > > > -		    "U")
> > > > > -			$XFS_IO_PROG -fc "falloc $off $blksz" $file >> /dev/null
> > > > > -			;;
> > > > > -		esac
> > > > >  		off=$((off + blksz))
> > > > >  	done
> > > > >  
> > > > > +	if [ "${#cmds[@]}" -gt 0 ]; then
> > > > > +		$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > > > > +		cmds=()
> > > > > +	fi
> > > > > +
> > > > >  	sync $file
> > > > >  }
> > > > >  
> > > > > @@ -336,9 +355,9 @@ echo >> $seqres.full
> > > > >  echo "# Populating expected data buffers" >> $seqres.full
> > > > >  populate_expected_data
> > > > >  
> > > > > -# Loop 20 times to shake out any races due to shutdown
> > > > > -for ((iter=0; iter<20; iter++))
> > > > > -do
> > > > > +# Loop to shake out any races due to shutdown
> > > > > +iter=0
> > > > > +while _soak_loop_running $TIME_FACTOR; do
> > > > >  	echo >> $seqres.full
> > > > >  	echo "------ Iteration $iter ------" >> $seqres.full
> > > > >  
> > > > > @@ -361,6 +380,8 @@ do
> > > > >  	echo >> $seqres.full
> > > > >  	echo "# Starting shutdown torn write test for append atomic writes" >> $seqres.full
> > > > >  	test_append_torn_write
> > > > > +
> > > > > +	iter=$((iter + 1))
> > > > >  done
> > > > >  
> > > > >  echo "Silence is golden"
> > > > > 
> > > > 
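
As an aside, the fork-amortization trick described in the commit message
above can be sketched standalone. This is a pure-bash illustration, not
the test itself: `echo` stands in for the $XFS_IO_PROG invocation, and
the batch size and loop counts are illustrative values.

```shell
#!/bin/bash
# Sketch of the batching pattern from the patch above: accumulate
# xfs_io "-c" arguments in a bash array and launch one process per
# batch instead of one process per pwrite.
blksz=4096
off=0
cmds=()
invocations=0

for ((i = 0; i < 1024; i++)); do
	# write every other block; the in-between blocks stay fallocated
	if (( i % 2 == 0 )); then
		cmds+=(-c "pwrite -b $blksz $off $blksz")
	fi

	# each queued command contributes two array elements ("-c" plus
	# the command string), so 128 elements is 64 queued pwrites
	if [ "${#cmds[@]}" -ge 128 ]; then
		echo "${cmds[@]}" > /dev/null	# one xfs_io per batch
		invocations=$((invocations + 1))
		cmds=()
	fi

	off=$((off + blksz))
done

# flush whatever is left over
if [ "${#cmds[@]}" -gt 0 ]; then
	echo "${cmds[@]}" > /dev/null
	invocations=$((invocations + 1))
	cmds=()
fi

echo "$invocations invocations instead of 512"
```

Here 512 pwrites collapse into 8 process launches, which is the whole
point of the patch's amortization.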


* Re: [PATCH 2/7] generic/778: fix severe performance problems
  2025-12-03  6:19           ` Ojaswin Mujoo
@ 2025-12-04 17:18             ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2025-12-04 17:18 UTC (permalink / raw)
  To: Ojaswin Mujoo; +Cc: zlang, fstests, linux-ext4

On Wed, Dec 03, 2025 at 11:49:48AM +0530, Ojaswin Mujoo wrote:
> On Mon, Dec 01, 2025 at 03:19:08PM -0800, Darrick J. Wong wrote:
> > On Sat, Nov 29, 2025 at 02:22:11PM +0530, Ojaswin Mujoo wrote:
> > > On Fri, Nov 14, 2025 at 06:32:51PM -0800, Darrick J. Wong wrote:
> > > > On Thu, Nov 13, 2025 at 05:23:45PM +0530, Ojaswin Mujoo wrote:
> > > > > On Mon, Nov 10, 2025 at 10:26:32AM -0800, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > > > 
> > > > > > This test takes 4800s to run, which is horrible.  AFAICT it starts out
> > > > > > by timing how much can be written atomically to a new file in 0.2
> > > > > > seconds, then scales up the file size by 3x.  On not very fast storage,
> > > > > 
> > > > > Hi Darrick,
> > > 
> > > (Sorry I missed this email somehow)
> > > 
> > > > > 
> > > > > So 250MB in 0.2s is like 1.2GBps which seems pretty fast. Did you mean
> > > > > "On fast storage ..." ?
> > > > 
> > > > No, I have even faster storage. ;)
> > > 
> > > :O
> > > 
> > > So that means on even faster storage this problem would be even more
> > > visible, because our file size would be >250MB.
> > > 
> > > > 
> > > > > > this can result in file_size being set to ~250MB on a 4k fsblock
> > > > > > filesystem.  That's about 64,000 blocks.
> > > > > > 
> > > > > > The next thing this test does is try to create a file of that size
> > > > > > (250MB) of alternating written and unwritten blocks.  For some reason,
> > > > > > it sets up this file by invoking xfs_io 64,000 times to write small
> > > > > > amounts of data, which takes 3+ minutes on the author's system because
> > > > > > exec overhead is pretty high when you do that.
> > > > > 
> > > > > > 
> > > > > > As a result, one loop through the test takes almost 4 minutes.  The test
> > > > > > loops 20 times, so it runs for 80 minutes(!!) which is a really long
> > > > > > time.
> > > > > > 
> > > > > > So the first thing we do is observe that the giant slow loop is being
> > > > > > run as a single thread on an empty filesystem.  Most of the time the
> > > > > > allocator generates a mostly physically contiguous file.  We could
> > > > > > fallocate the whole file instead of fallocating one block every other
> > > > > > time through the loop.  This halves the setup time.
> > > > > > 
> > > > > > Next, we can also stuff the remaining pwrite commands into a bash array
> > > > > > and only invoke xfs_io once every 128x through the loop.  This amortizes
> > > > > > the xfs_io startup time, which reduces the test loop runtime to about 20
> > > > > > seconds.
> > > > > 
> > > > > Oh right, this is very bad. Weirdly, I never noticed the test taking
> > > > > such a long time while testing on scsi_debug and also on an enterprise SSD.
> > > > 
> > > > It doesn't help that xfs supports much larger awu_max than (say) ext4.
> > > 
> > > I did test on xfs as well. But yea maybe my SSD is just not fast enough.
> > > 
> > > > 
> > > > > Thanks for fixing this up though; maybe I will start using dm-delay
> > > > > while stressing the tests in the future to avoid such issues.
> > > > 
> > > > fork() is a bit expensive.
> > > > 
> > > > > > 
> > > > > > Finally, replace the 20x loop with a _soak_loop_running 5x loop because
> > > > > > 5 seems like enough.  Anyone who wants more can set TIME_FACTOR or
> > > > > > SOAK_DURATION to get more intensive testing.  On my system this cuts the
> > > > > > runtime to 75 seconds.
> > > > > 
> > > > > So about the loops: we were running a modified version of this test,
> > > > > which used non-atomic writes, to confirm that we are able to catch torn
> > > > > writes this way. We noticed that it sometimes took 10+ loops to observe
> > > > > the torn write. Hence we kept iters=20. Since catching a torn write is
> > > > > critical for the working of atomic writes, I think it might make sense to
> > > > > leave it at 20. If we feel this is a very high value, we can perhaps
> > > > > remove -g auto and keep -g stress -g atomicwrites so only people who
> > > > > explicitly want to stress atomic writes will run it.
> > > > 
> > > > In that case we ought to limit the awu_max that we feed to the test
> > > > because otherwise it starts running a lot of IO.
> > > 
> > > Yes, I think that makes sense. Right now we get an awu_max of 4M on xfs,
> > > which means we always end up testing only software atomic writes.  Maybe
> > > we can instead cap awu_max at 64K or something. This way, we can test
> > > both hw atomic writes (when the device supports it) and sw atomic writes
> > > (when it doesn't).
> > 
> > Yeah, capping the testing block size sounds like a good idea.  What do
> > you think about using min(awu_max_opt * 2, awu_max) ?
> 
> I'm thinking that now that we are modifying this, maybe we can improve
> coverage by also testing hardware atomic write paths. Right now the
> test will mostly be testing SW fallback on XFS because we use awu_max
> (usually 4M).
> 
> Maybe something like min(awu_max_opt, awu_max) gets us coverage of both
> paths?
> 
> Also, looking at xfs_get_atomic_write_max_opt(), there is the caveat that
> awu_max_opt is returned as 0 if awu_max <= blocksize (should be rare
> with software atomic writes but yeah), and ext4 always returns it as 0,
> so we will need to handle that.
> 
> How about (pseudocode):
> 
> if (awu_max_opt == 0)
>     /* software only, limit to 128k */
>     awu_max = min(statx->awu_max, 128K)
> else
>     awu_max = min(statx->awu_max_opt, statx->awu_max)

I think that's worth trying!

--D

> 
> Regards,
> ojaswin
> > 
> > --D
> > 
> > > Regards,
> > > ojaswin
> > > 
> > > > 
> > > > --D
> > > > 
> > > > > > 
> > > > > > Cc: <fstests@vger.kernel.org> # v2025.10.20
> > > > > > Fixes: ca954527ff9d97 ("generic: Add sudden shutdown tests for multi block atomic writes")
> > > > > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > > > > > ---
> > > > > >  tests/generic/778 |   59 ++++++++++++++++++++++++++++++++++++-----------------
> > > > > >  1 file changed, 40 insertions(+), 19 deletions(-)
> > > > > > 
> > > > > > 
> > > > > > diff --git a/tests/generic/778 b/tests/generic/778
> > > > > > index 8cb1d8d4cad45d..7cfabc3a47a521 100755
> > > > > > --- a/tests/generic/778
> > > > > > +++ b/tests/generic/778
> > > > > > @@ -42,22 +42,28 @@ atomic_write_loop() {
> > > > > >  		# Due to sudden shutdown this can produce errors so just
> > > > > >  		# redirect them to seqres.full
> > > > > >  		$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $size $off $size" >> /dev/null 2>>$seqres.full
> > > > > > -		echo "Written to offset: $off" >> $tmp.aw
> > > > > > -		off=$((off + $size))
> > > > > > +		echo "Written to offset: $((off + size))" >> $tmp.aw
> > > > > > +		off=$((off + size))
> > > > > >  	done
> > > > > >  }
> > > > > >  
> > > > > >  start_atomic_write_and_shutdown() {
> > > > > >  	atomic_write_loop &
> > > > > >  	awloop_pid=$!
> > > > > > +	local max_loops=100
> > > > > >  
> > > > > >  	local i=0
> > > > > > -	# Wait for at least first write to be recorded or 10s
> > > > > > -	while [ ! -f "$tmp.aw" -a $i -le 50 ]; do i=$((i + 1)); sleep 0.2; done
> > > > > > +	# Wait for at least first write to be recorded or too much time passes
> > > > > > +	while [ ! -f "$tmp.aw" -a $i -le $max_loops ]; do
> > > > > > +		i=$((i + 1))
> > > > > > +		sleep 0.2
> > > > > > +	done
> > > > > >  
> > > > > > -	if [[ $i -gt 50 ]]
> > > > > > +	cat $tmp.aw >> $seqres.full
> > > > > > +
> > > > > > +	if [[ $i -gt $max_loops ]]
> > > > > >  	then
> > > > > > -		_fail "atomic write process took too long to start"
> > > > > > +		_notrun "atomic write process took too long to start"
> > > > > >  	fi
> > > > > >  
> > > > > >  	echo >> $seqres.full
> > > > > > @@ -113,21 +119,34 @@ create_mixed_mappings() {
> > > > > >  	local off=0
> > > > > >  	local operations=("W" "U")
> > > > > >  
> > > > > > +	test $size_bytes -eq 0 && return
> > > > > > +
> > > > > > +	# fallocate the whole file once because preallocating single blocks
> > > > > > +	# with individual xfs_io invocations is really slow and the allocator
> > > > > > +	# usually gives out consecutive blocks anyway
> > > > > > +	$XFS_IO_PROG -f -c "falloc 0 $size_bytes" $file
> > > > > > +
> > > > > > +	local cmds=()
> > > > > >  	for ((i=0; i<$((size_bytes / blksz )); i++)); do
> > > > > > -		index=$(($i % ${#operations[@]}))
> > > > > > -		map="${operations[$index]}"
> > > > > > +		if (( i % 2 == 0 )); then
> > > > > > +			cmds+=(-c "pwrite -b $blksz $off $blksz")
> > > > > > +		fi
> > > > > > +
> > > > > > +		# batch the write commands into larger xfs_io invocations to
> > > > > > +		# amortize the fork overhead
> > > > > > +		if [ "${#cmds[@]}" -ge 128 ]; then
> > > > > > +			$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > > > > > +			cmds=()
> > > > > > +		fi
> > > > > >  
> > > > > > -		case "$map" in
> > > > > > -		    "W")
> > > > > > -			$XFS_IO_PROG -fc "pwrite -b $blksz $off $blksz" $file  >> /dev/null
> > > > > > -			;;
> > > > > > -		    "U")
> > > > > > -			$XFS_IO_PROG -fc "falloc $off $blksz" $file >> /dev/null
> > > > > > -			;;
> > > > > > -		esac
> > > > > >  		off=$((off + blksz))
> > > > > >  	done
> > > > > >  
> > > > > > +	if [ "${#cmds[@]}" -gt 0 ]; then
> > > > > > +		$XFS_IO_PROG "${cmds[@]}" "$file" >> /dev/null
> > > > > > +		cmds=()
> > > > > > +	fi
> > > > > > +
> > > > > >  	sync $file
> > > > > >  }
> > > > > >  
> > > > > > @@ -336,9 +355,9 @@ echo >> $seqres.full
> > > > > >  echo "# Populating expected data buffers" >> $seqres.full
> > > > > >  populate_expected_data
> > > > > >  
> > > > > > -# Loop 20 times to shake out any races due to shutdown
> > > > > > -for ((iter=0; iter<20; iter++))
> > > > > > -do
> > > > > > +# Loop to shake out any races due to shutdown
> > > > > > +iter=0
> > > > > > +while _soak_loop_running $TIME_FACTOR; do
> > > > > >  	echo >> $seqres.full
> > > > > >  	echo "------ Iteration $iter ------" >> $seqres.full
> > > > > >  
> > > > > > @@ -361,6 +380,8 @@ do
> > > > > >  	echo >> $seqres.full
> > > > > >  	echo "# Starting shutdown torn write test for append atomic writes" >> $seqres.full
> > > > > >  	test_append_torn_write
> > > > > > +
> > > > > > +	iter=$((iter + 1))
> > > > > >  done
> > > > > >  
> > > > > >  echo "Silence is golden"
> > > > > > 
> > > > > 


end of thread, other threads:[~2025-12-04 17:18 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-10 18:26 [PATCHSET] fstests: more random fixes for v2025.11.04 Darrick J. Wong
2025-11-10 18:26 ` [PATCH 1/7] common: leave any breadcrumbs when _link_out_file_named can't find the output file Darrick J. Wong
2025-11-11  9:13   ` Christoph Hellwig
2025-11-10 18:26 ` [PATCH 2/7] generic/778: fix severe performance problems Darrick J. Wong
2025-11-11  9:27   ` Christoph Hellwig
2025-11-13 11:53   ` Ojaswin Mujoo
2025-11-15  2:32     ` Darrick J. Wong
2025-11-29  8:52       ` Ojaswin Mujoo
2025-12-01 23:19         ` Darrick J. Wong
2025-12-03  6:19           ` Ojaswin Mujoo
2025-12-04 17:18             ` Darrick J. Wong
2025-11-10 18:26 ` [PATCH 3/7] generic/778: fix background loop control with sentinel files Darrick J. Wong
2025-11-11  9:28   ` Christoph Hellwig
2025-11-13 11:06   ` Ojaswin Mujoo
2025-11-10 18:27 ` [PATCH 4/7] generic/019: skip test when there is no journal Darrick J. Wong
2025-11-11  9:28   ` Christoph Hellwig
2025-11-12 18:20     ` Darrick J. Wong
2025-11-10 18:27 ` [PATCH 5/7] xfs/837: fix test to work with pre-metadir quota mount options Darrick J. Wong
2025-11-11  9:29   ` Christoph Hellwig
2025-11-10 18:27 ` [PATCH 6/7] generic/774: reduce file size Darrick J. Wong
2025-11-11  9:13   ` John Garry
2025-11-11  9:29     ` Christoph Hellwig
2025-11-11  9:31   ` Christoph Hellwig
2025-11-12 18:05     ` Darrick J. Wong
2025-11-13 10:44   ` Ojaswin Mujoo
2025-11-10 18:27 ` [PATCH 7/7] generic/774: turn off lfsr Darrick J. Wong
2025-11-11  9:01   ` John Garry
2025-11-12 18:15     ` Darrick J. Wong
2025-11-11  9:32   ` Christoph Hellwig
2025-11-13 10:34   ` Ojaswin Mujoo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox