* [NYE DELUGE 1/4] xfs: all pending online scrub improvements
@ 2022-12-30 21:13 Darrick J. Wong
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (3 more replies)
0 siblings, 4 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 21:13 UTC (permalink / raw)
To: Dave Chinner, Allison Henderson, Chandan Babu R, Catherine Hoang,
djwong
Cc: xfs, greg.marsden, shirley.ma, konrad.wilk, fstests, Zorro Lang,
Carlos Maiolino
Hi everyone,
As I've mentioned several times throughout 2022, I would like to merge
the online fsck feature in time for the 2023 LTS kernel. The first big
step in this process is to merge all the pending bug fixes, validation
improvements, and general reorganization of the existing metadata
scrubbing functionality.
This first deluge starts with the design document for the entirety of
the online fsck feature. The design doc should be familiar to most of
you, as it's been on the list for review for months already. It
outlines in brief the problems we're trying to solve, the use cases and
testing plan, and the fundamental data structures and algorithms
underlying the entire feature.
After that come all the code changes to wrap up the metadata checking
part of the feature. The biggest piece here is the scrub drains that
allow scrub to quiesce deferred ops targeting AGs so that it can
cross-reference recordsets. Most of the rest is tweaking the btree code
so that we can do keyspace scans to look for conflicting records.
For this review, I would like people to focus on the following:
- Are the major subsystems sufficiently documented that you could figure
out what the code does?
- Do you see any problems that are severe enough to cause long term
support hassles? (e.g. bad API design, writing weird metadata to disk)
- Can you spot mis-interactions between the subsystems?
- What were my blind spots in devising this feature?
- Are there missing pieces that you'd like to help build?
- Can I just merge all of this?
The one thing that is /not/ in scope for this review is requests for
more refactoring of existing subsystems. While there are usually valid
arguments for performing such cleanups, those are separate tasks to be
prioritized separately. I will get to them after merging online fsck.
I've been running daily online scrubs of every computer I own for the
last five years, which has helped me iron out real problems in (limited
scope) production. All issues observed in that time have been corrected
in this submission.
As a warning, the patches will likely take several days to trickle in.
All four patch deluges are based off kernel 6.2-rc1, xfsprogs 6.1, and
fstests 2022-12-25.
Thank you all for your participation in the XFS community. Have a safe
New Year's, and I'll see you all next year!
--D
* [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests
2022-12-30 21:13 [NYE DELUGE 1/4] xfs: all pending online scrub improvements Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 04/16] fuzzy: clean up scrub stress programs quietly Darrick J. Wong
` (15 more replies)
2022-12-30 22:12 ` [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests Darrick J. Wong
` (2 subsequent siblings)
3 siblings, 16 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
Hi all,
This series prepares us to begin creating stress tests for the XFS
online fsck feature. We start by hoisting the loop control code out of
the one existing test (xfs/422) into common/fuzzy, and then we commence
rearranging the code to make it easy to generate more and more tests.
Eventually we will race fsstress against online scrub and online repair
to make sure that xfs_scrub running on a correct filesystem cannot take
it down by accident.
If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.
This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.
--D
fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress
---
common/fuzzy | 272 +++++++++++++++++++++++++++++++++++++++++++++++++++
doc/group-names.txt | 1
tests/xfs/422 | 109 ++------------------
tests/xfs/422.out | 4 -
4 files changed, 285 insertions(+), 101 deletions(-)
* [PATCH 01/16] xfs/422: create a new test group for fsstress/repair racers
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCH 04/16] fuzzy: clean up scrub stress programs quietly Darrick J. Wong
2022-12-30 22:12 ` [PATCH 07/16] fuzzy: give each test local control over what scrub stress tests get run Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 06/16] fuzzy: explicitly check for common/inject in _require_xfs_stress_online_repair Darrick J. Wong
` (12 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Create a new group for tests that race fsstress with online filesystem
repair, and add this to the dangerous_online_repair group too.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
doc/group-names.txt | 1 +
tests/xfs/422 | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/doc/group-names.txt b/doc/group-names.txt
index 6cc9af7844..ac219e05b3 100644
--- a/doc/group-names.txt
+++ b/doc/group-names.txt
@@ -34,6 +34,7 @@ dangerous_bothrepair fuzzers to evaluate xfs_scrub + xfs_repair repair
dangerous_fuzzers fuzzers that can crash your computer
dangerous_norepair fuzzers to evaluate kernel metadata verifiers
dangerous_online_repair fuzzers to evaluate xfs_scrub online repair
+dangerous_fsstress_repair race fsstress and xfs_scrub online repair
dangerous_repair fuzzers to evaluate xfs_repair offline repair
dangerous_scrub fuzzers to evaluate xfs_scrub checking
data data loss checkers
diff --git a/tests/xfs/422 b/tests/xfs/422
index f3c63e8d6a..9ed944ed63 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -9,7 +9,7 @@
# activity, so we can't have userspace wandering in and thawing it.
#
. ./common/preamble
-_begin_fstest dangerous_scrub dangerous_online_repair freeze
+_begin_fstest online_repair dangerous_fsstress_repair freeze
_register_cleanup "_cleanup" BUS
* [PATCH 02/16] xfs/422: move the fsstress/freeze/scrub racing logic to common/fuzzy
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (4 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 05/16] fuzzy: rework scrub stress output filtering Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 03/16] xfs/422: rework feature detection so we only test-format scratch once Darrick J. Wong
` (9 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Hoist all this code to common/fuzzy in preparation for making this code
more generic so that we can implement a variety of tests that check the
concurrency correctness of online fsck. Do just enough renaming so that
we don't pollute the test program's namespace; we'll fix the other warts
in subsequent patches.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
common/fuzzy | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++
tests/xfs/422 | 104 ++++-------------------------------------------------
tests/xfs/422.out | 4 +-
3 files changed, 109 insertions(+), 99 deletions(-)
diff --git a/common/fuzzy b/common/fuzzy
index 70213af5db..979fa55515 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -316,3 +316,103 @@ _scratch_xfs_fuzz_metadata() {
done
done
}
+
+# Functions to race fsstress, fs freeze, and xfs metadata scrubbing against
+# each other to shake out bugs in xfs online repair.
+
+# Filter freeze and thaw loop output so that we don't tarnish the golden output
+# if the kernel temporarily won't let us freeze.
+__stress_freeze_filter_output() {
+ grep -E -v '(Device or resource busy|Invalid argument)'
+}
+
+# Filter scrub output so that we don't tarnish the golden output if the fs is
+# too busy to scrub. Note: Tests should _notrun if the scrub type is not
+# supported.
+__stress_scrub_filter_output() {
+ grep -E -v '(Device or resource busy|Invalid argument)'
+}
+
+# Run fs freeze and thaw in a tight loop.
+__stress_scrub_freeze_loop() {
+ local end="$1"
+
+ while [ "$(date +%s)" -lt $end ]; do
+ $XFS_IO_PROG -x -c 'freeze' -c 'thaw' $SCRATCH_MNT 2>&1 | \
+ __stress_freeze_filter_output
+ done
+}
+
+# Run xfs online fsck commands in a tight loop.
+__stress_scrub_loop() {
+ local end="$1"
+
+ while [ "$(date +%s)" -lt $end ]; do
+ $XFS_IO_PROG -x -c 'repair rmapbt 0' -c 'repair rmapbt 1' $SCRATCH_MNT 2>&1 | \
+ __stress_scrub_filter_output
+ done
+}
+
+# Run fsstress while we're testing online fsck.
+__stress_scrub_fsstress_loop() {
+ local end="$1"
+
+ local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
+
+ while [ "$(date +%s)" -lt $end ]; do
+ $FSSTRESS_PROG $args >> $seqres.full
+ done
+}
+
+# Make sure we have everything we need to run stress and scrub
+_require_xfs_stress_scrub() {
+ _require_xfs_io_command "scrub"
+ _require_command "$KILLALL_PROG" killall
+ _require_freeze
+}
+
+# Make sure we have everything we need to run stress and online repair
+_require_xfs_stress_online_repair() {
+ _require_xfs_stress_scrub
+ _require_xfs_io_command "repair"
+ _require_xfs_io_error_injection "force_repair"
+ _require_freeze
+}
+
+# Clean up after the loops in case they didn't do it themselves.
+_scratch_xfs_stress_scrub_cleanup() {
+ $KILLALL_PROG -TERM xfs_io fsstress >> $seqres.full 2>&1
+ $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
+}
+
+# Start scrub, freeze, and fsstress in background looping processes, and wait
+# for 30*TIME_FACTOR seconds to see if the filesystem goes down. Callers
+# must call _scratch_xfs_stress_scrub_cleanup from their cleanup functions.
+_scratch_xfs_stress_scrub() {
+ local start="$(date +%s)"
+ local end="$((start + (30 * TIME_FACTOR) ))"
+
+ echo "Loop started at $(date --date="@${start}")," \
+ "ending at $(date --date="@${end}")" >> $seqres.full
+
+ __stress_scrub_fsstress_loop $end &
+ __stress_scrub_freeze_loop $end &
+ __stress_scrub_loop $end &
+
+ # Wait until 2 seconds after the loops should have finished, then
+ # clean up after ourselves.
+ while [ "$(date +%s)" -lt $((end + 2)) ]; do
+ sleep 1
+ done
+ _scratch_xfs_stress_scrub_cleanup
+
+ echo "Loop finished at $(date)" >> $seqres.full
+}
+
+# Start online repair, freeze, and fsstress in background looping processes,
+# and wait for 30*TIME_FACTOR seconds to see if the filesystem goes down.
+# Same requirements and arguments as _scratch_xfs_stress_scrub.
+_scratch_xfs_stress_online_repair() {
+ $XFS_IO_PROG -x -c 'inject force_repair' $SCRATCH_MNT
+ _scratch_xfs_stress_scrub "$@"
+}
diff --git a/tests/xfs/422 b/tests/xfs/422
index 9ed944ed63..0bf08572f3 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -4,40 +4,19 @@
#
# FS QA Test No. 422
#
-# Race freeze and rmapbt repair for a while to see if we crash or livelock.
+# Race fsstress and rmapbt repair for a while to see if we crash or livelock.
# rmapbt repair requires us to freeze the filesystem to stop all filesystem
# activity, so we can't have userspace wandering in and thawing it.
#
. ./common/preamble
_begin_fstest online_repair dangerous_fsstress_repair freeze
-_register_cleanup "_cleanup" BUS
-
-# First kill and wait the freeze loop so it won't try to freeze fs again
-# Then make sure fs is not frozen
-# Then kill and wait for the rest of the workers
-# Because if fs is frozen a killed writer will never exit
-kill_loops() {
- local sig=$1
-
- [ -n "$freeze_pid" ] && kill $sig $freeze_pid
- wait $freeze_pid
- unset freeze_pid
- $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT
- [ -n "$stress_pid" ] && kill $sig $stress_pid
- [ -n "$repair_pid" ] && kill $sig $repair_pid
- wait
- unset stress_pid
- unset repair_pid
-}
-
-# Override the default cleanup function.
-_cleanup()
-{
- kill_loops -9 > /dev/null 2>&1
+_cleanup() {
+ _scratch_xfs_stress_scrub_cleanup &> /dev/null
cd /
- rm -rf $tmp.*
+ rm -r -f $tmp.*
}
+_register_cleanup "_cleanup" BUS
# Import common functions.
. ./common/filter
@@ -47,80 +26,13 @@ _cleanup()
# real QA test starts here
_supported_fs xfs
_require_xfs_scratch_rmapbt
-_require_xfs_io_command "scrub"
-_require_xfs_io_error_injection "force_repair"
-_require_command "$KILLALL_PROG" killall
-_require_freeze
+_require_xfs_stress_online_repair
-echo "Format and populate"
_scratch_mkfs > "$seqres.full" 2>&1
_scratch_mount
-
-STRESS_DIR="$SCRATCH_MNT/testdir"
-mkdir -p $STRESS_DIR
-
-for i in $(seq 0 9); do
- mkdir -p $STRESS_DIR/$i
- for j in $(seq 0 9); do
- mkdir -p $STRESS_DIR/$i/$j
- for k in $(seq 0 9); do
- echo x > $STRESS_DIR/$i/$j/$k
- done
- done
-done
-
-cpus=$(( $($here/src/feature -o) * 4 * LOAD_FACTOR))
-
-echo "Concurrent repair"
-filter_output() {
- grep -E -v '(Device or resource busy|Invalid argument)'
-}
-freeze_loop() {
- end="$1"
-
- while [ "$(date +%s)" -lt $end ]; do
- $XFS_IO_PROG -x -c 'freeze' -c 'thaw' $SCRATCH_MNT 2>&1 | filter_output
- done
-}
-repair_loop() {
- end="$1"
-
- while [ "$(date +%s)" -lt $end ]; do
- $XFS_IO_PROG -x -c 'repair rmapbt 0' -c 'repair rmapbt 1' $SCRATCH_MNT 2>&1 | filter_output
- done
-}
-stress_loop() {
- end="$1"
-
- FSSTRESS_ARGS=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
- while [ "$(date +%s)" -lt $end ]; do
- $FSSTRESS_PROG $FSSTRESS_ARGS >> $seqres.full
- done
-}
-$XFS_IO_PROG -x -c 'inject force_repair' $SCRATCH_MNT
-
-start=$(date +%s)
-end=$((start + (30 * TIME_FACTOR) ))
-
-echo "Loop started at $(date --date="@${start}"), ending at $(date --date="@${end}")" >> $seqres.full
-stress_loop $end &
-stress_pid=$!
-freeze_loop $end &
-freeze_pid=$!
-repair_loop $end &
-repair_pid=$!
-
-# Wait until 2 seconds after the loops should have finished...
-while [ "$(date +%s)" -lt $((end + 2)) ]; do
- sleep 1
-done
-
-# ...and clean up after the loops in case they didn't do it themselves.
-kill_loops >> $seqres.full 2>&1
-
-echo "Loop finished at $(date)" >> $seqres.full
-echo "Test done"
+_scratch_xfs_stress_online_repair
# success, all done
+echo Silence is golden
status=0
exit
diff --git a/tests/xfs/422.out b/tests/xfs/422.out
index 3818c48fa8..f70693fde6 100644
--- a/tests/xfs/422.out
+++ b/tests/xfs/422.out
@@ -1,4 +1,2 @@
QA output created by 422
-Format and populate
-Concurrent repair
-Test done
+Silence is golden
* [PATCH 03/16] xfs/422: rework feature detection so we only test-format scratch once
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (5 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 02/16] xfs/422: move the fsstress/freeze/scrub racing logic to common/fuzzy Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 13/16] fuzzy: clean up frozen fses after scrub stress testing Darrick J. Wong
` (8 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Rework the feature detection in the one online fsck stress test so that
we only format the scratch device twice per test run.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
tests/xfs/422 | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tests/xfs/422 b/tests/xfs/422
index 0bf08572f3..b3353d2202 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -25,11 +25,12 @@ _register_cleanup "_cleanup" BUS
# real QA test starts here
_supported_fs xfs
-_require_xfs_scratch_rmapbt
+_require_scratch
_require_xfs_stress_online_repair
_scratch_mkfs > "$seqres.full" 2>&1
_scratch_mount
+_require_xfs_has_feature "$SCRATCH_MNT" rmapbt
_scratch_xfs_stress_online_repair
# success, all done
* [PATCH 04/16] fuzzy: clean up scrub stress programs quietly
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 07/16] fuzzy: give each test local control over what scrub stress tests get run Darrick J. Wong
` (14 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
In the cleanup function for online fsck stress test common code, send
SIGINT instead of SIGTERM to the fsstress and xfs_io processes to kill
them. bash prints 'Terminated' to the golden output when children die
with SIGTERM, which can make a test fail, and we don't want a regular
cleanup function being the thing that prevents the test from passing.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
common/fuzzy | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/common/fuzzy b/common/fuzzy
index 979fa55515..e52831560d 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -381,7 +381,9 @@ _require_xfs_stress_online_repair() {
# Clean up after the loops in case they didn't do it themselves.
_scratch_xfs_stress_scrub_cleanup() {
- $KILLALL_PROG -TERM xfs_io fsstress >> $seqres.full 2>&1
+ # Send SIGINT so that bash won't print a 'Terminated' message that
+ # distorts the golden output.
+ $KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1
$XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
}
* [PATCH 05/16] fuzzy: rework scrub stress output filtering
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (3 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 06/16] fuzzy: explicitly check for common/inject in _require_xfs_stress_online_repair Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 02/16] xfs/422: move the fsstress/freeze/scrub racing logic to common/fuzzy Darrick J. Wong
` (10 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Rework the output filtering functions for scrub stress tests: first, we
should use _filter_scratch to avoid leaking the scratch fs details to
the output. Second, for scrub and repair, change the filter elements to
reflect outputs that don't indicate failure (such as busy resources,
preening requests, and insufficient space to do anything). Finally,
change the _require function to check that filter functions have been
sourced.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
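Reviewer note: the filter shape described above can be tried in isolation
with the minimal sketch below. It is not the code in the patch.
fake_filter_scratch is a hypothetical stand-in for _filter_scratch (the
real helper lives in common/filter), and sketch_mnt and the sample messages
are made up for illustration.

```shell
#!/bin/bash
# Sketch only. sketch_mnt and fake_filter_scratch are hypothetical
# stand-ins for $SCRATCH_MNT and _filter_scratch from fstests.
sketch_mnt="/mnt/scratch"

# Replace the scratch mount path with a fixed token so that golden output
# does not depend on the local test configuration.
fake_filter_scratch() {
	sed -e "s|$sketch_mnt|SCRATCH_MNT|g"
}

# Delete lines that report transient conditions rather than failures.
stress_scrub_filter_sketch() {
	fake_filter_scratch | \
		sed -e '/Device or resource busy/d' \
		    -e '/Optimization possible/d' \
		    -e '/No space left on device/d'
}

# Only the last sample line should survive the filter.
printf '%s\n' \
	"$sketch_mnt: Device or resource busy" \
	"$sketch_mnt: Optimization possible" \
	"$sketch_mnt: something unexpected" | stress_scrub_filter_sketch
```

Using one sed expression per message keeps each pattern on its own line,
so later additions show up as one-line diffs.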
common/fuzzy | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/common/fuzzy b/common/fuzzy
index e52831560d..94a6ce85a3 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -323,14 +323,19 @@ _scratch_xfs_fuzz_metadata() {
# Filter freeze and thaw loop output so that we don't tarnish the golden output
# if the kernel temporarily won't let us freeze.
__stress_freeze_filter_output() {
- grep -E -v '(Device or resource busy|Invalid argument)'
+ _filter_scratch | \
+ sed -e '/Device or resource busy/d' \
+ -e '/Invalid argument/d'
}
# Filter scrub output so that we don't tarnish the golden output if the fs is
# too busy to scrub. Note: Tests should _notrun if the scrub type is not
# supported.
__stress_scrub_filter_output() {
- grep -E -v '(Device or resource busy|Invalid argument)'
+ _filter_scratch | \
+ sed -e '/Device or resource busy/d' \
+ -e '/Optimization possible/d' \
+ -e '/No space left on device/d'
}
# Run fs freeze and thaw in a tight loop.
@@ -369,6 +374,8 @@ _require_xfs_stress_scrub() {
_require_xfs_io_command "scrub"
_require_command "$KILLALL_PROG" killall
_require_freeze
+ command -v _filter_scratch &>/dev/null || \
+ _notrun 'xfs scrub stress test requires common/filter'
}
# Make sure we have everything we need to run stress and online repair
* [PATCH 06/16] fuzzy: explicitly check for common/inject in _require_xfs_stress_online_repair
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (2 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 01/16] xfs/422: create a new test group for fsstress/repair racers Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 05/16] fuzzy: rework scrub stress output filtering Darrick J. Wong
` (11 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
In _require_xfs_stress_online_repair, make sure that the test has
sourced common/inject before we try to call its functions.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
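Reviewer note: a minimal sketch of the "has this helper been sourced?"
check follows. have_function and the two helper names are hypothetical and
exist only for this illustration; the patch itself calls command -v
directly and then _notrun.

```shell
#!/bin/bash
# Sketch only. have_function is a hypothetical wrapper, not an fstests API.
have_function() {
	command -v "$1" &>/dev/null
}

# Pretend this function came from a sourced helper file.
helper_from_sourced_file() { :; }

have_function helper_from_sourced_file && \
	echo "helper is available"
have_function helper_never_sourced || \
	echo "helper is missing; a test would _notrun here"
```

Note that command -v also succeeds for builtins and for programs found in
$PATH, so the check relies on the helper name not colliding with one.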
common/fuzzy | 2 ++
1 file changed, 2 insertions(+)
diff --git a/common/fuzzy b/common/fuzzy
index 94a6ce85a3..de9e398984 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -382,6 +382,8 @@ _require_xfs_stress_scrub() {
_require_xfs_stress_online_repair() {
_require_xfs_stress_scrub
_require_xfs_io_command "repair"
+ command -v _require_xfs_io_error_injection &>/dev/null || \
+ _notrun 'xfs repair stress test requires common/inject'
_require_xfs_io_error_injection "force_repair"
_require_freeze
}
* [PATCH 07/16] fuzzy: give each test local control over what scrub stress tests get run
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCH 04/16] fuzzy: clean up scrub stress programs quietly Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 01/16] xfs/422: create a new test group for fsstress/repair racers Darrick J. Wong
` (13 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Now that we've hoisted the scrub stress code to common/fuzzy, introduce
argument parsing so that each test can specify what it wants to test.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
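Reviewer note: the option handling can be exercised without a filesystem
using the sketch below. build_scrub_cmdline is a hypothetical helper that
prints the xfs_io argument vector, one argument per line, instead of
running it; the default target path is an assumption for illustration.

```shell
#!/bin/bash
# Sketch only. build_scrub_cmdline is hypothetical; it mirrors the -s/-t
# parsing in the patch but prints the arguments instead of running xfs_io.
build_scrub_cmdline() {
	local cmds=()
	local tgt="/mnt/scratch"	# assumed default target
	local io_args=()
	local c arg

	# Reset getopts state so the function can be called repeatedly.
	local OPTIND=1
	while getopts "s:t:" c; do
		case "$c" in
		s) cmds+=("$OPTARG");;
		t) tgt="$OPTARG";;
		*) return 1;;
		esac
	done

	# With no -s options there is nothing to run.
	[ "${#cmds[@]}" -gt 0 ] || return 0

	# Each -s value becomes its own '-c <command>' pair.
	for arg in "${cmds[@]}"; do
		io_args+=('-c' "$arg")
	done

	printf '%s\n' xfs_io -x "${io_args[@]}" "$tgt"
}

build_scrub_cmdline -s "repair rmapbt 0" -s "repair rmapbt 1"
```

Collecting the commands in an array keeps multi-word arguments such as
"repair rmapbt 0" intact when they are expanded later.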
common/fuzzy | 39 +++++++++++++++++++++++++++++++++++----
tests/xfs/422 | 2 +-
2 files changed, 36 insertions(+), 5 deletions(-)
diff --git a/common/fuzzy b/common/fuzzy
index de9e398984..88ba5fef69 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -348,12 +348,19 @@ __stress_scrub_freeze_loop() {
done
}
-# Run xfs online fsck commands in a tight loop.
-__stress_scrub_loop() {
+# Run individual XFS online fsck commands in a tight loop with xfs_io.
+__stress_one_scrub_loop() {
local end="$1"
+ local scrub_tgt="$2"
+ shift; shift
+
+ local xfs_io_args=()
+ for arg in "$@"; do
+ xfs_io_args+=('-c' "$arg")
+ done
while [ "$(date +%s)" -lt $end ]; do
- $XFS_IO_PROG -x -c 'repair rmapbt 0' -c 'repair rmapbt 1' $SCRATCH_MNT 2>&1 | \
+ $XFS_IO_PROG -x "${xfs_io_args[@]}" "$scrub_tgt" 2>&1 | \
__stress_scrub_filter_output
done
}
@@ -390,6 +397,8 @@ _require_xfs_stress_online_repair() {
# Clean up after the loops in case they didn't do it themselves.
_scratch_xfs_stress_scrub_cleanup() {
+ echo "Cleaning up scrub stress run at $(date)" >> $seqres.full
+
# Send SIGINT so that bash won't print a 'Terminated' message that
# distorts the golden output.
$KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1
@@ -399,7 +408,25 @@ _scratch_xfs_stress_scrub_cleanup() {
# Start scrub, freeze, and fsstress in background looping processes, and wait
# for 30*TIME_FACTOR seconds to see if the filesystem goes down. Callers
# must call _scratch_xfs_stress_scrub_cleanup from their cleanup functions.
+#
+# Various options include:
+#
+# -s Pass this command to xfs_io to test scrub. If zero -s options are
+# specified, xfs_io will not be run.
+# -t Run online scrub against this file; $SCRATCH_MNT is the default.
_scratch_xfs_stress_scrub() {
+ local one_scrub_args=()
+ local scrub_tgt="$SCRATCH_MNT"
+
+ OPTIND=1
+ while getopts "s:t:" c; do
+ case "$c" in
+ s) one_scrub_args+=("$OPTARG");;
+ t) scrub_tgt="$OPTARG";;
+ *) return 1; ;;
+ esac
+ done
+
local start="$(date +%s)"
local end="$((start + (30 * TIME_FACTOR) ))"
@@ -408,7 +435,11 @@ _scratch_xfs_stress_scrub() {
__stress_scrub_fsstress_loop $end &
__stress_scrub_freeze_loop $end &
- __stress_scrub_loop $end &
+
+ if [ "${#one_scrub_args[@]}" -gt 0 ]; then
+ __stress_one_scrub_loop "$end" "$scrub_tgt" \
+ "${one_scrub_args[@]}" &
+ fi
# Wait until 2 seconds after the loops should have finished, then
# clean up after ourselves.
diff --git a/tests/xfs/422 b/tests/xfs/422
index b3353d2202..faea5d6792 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -31,7 +31,7 @@ _require_xfs_stress_online_repair
_scratch_mkfs > "$seqres.full" 2>&1
_scratch_mount
_require_xfs_has_feature "$SCRATCH_MNT" rmapbt
-_scratch_xfs_stress_online_repair
+_scratch_xfs_stress_online_repair -s "repair rmapbt 0" -s "repair rmapbt 1"
# success, all done
echo Silence is golden
* [PATCH 08/16] fuzzy: test the scrub stress subcommands before looping
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (11 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 10/16] fuzzy: abort scrub stress testing if the scratch fs went down Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 09/16] fuzzy: make scrub stress loop control more robust Darrick J. Wong
` (2 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Before we commit to running fsstress and scrub commands in a loop for
some time, we should check that the provided commands actually work on
the scratch filesystem. The _require_xfs_io_command predicate only
detects the presence of the scrub ioctl, not any particular subcommand.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
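Reviewer note: the probe logic boils down to mapping error text to a skip
reason. The sketch below shows that mapping with a case statement rather
than the grep chain used in the patch. classify_probe_output is a
hypothetical name, and the sample input string is made up; only the four
matched messages come from the patch.

```shell
#!/bin/bash
# Sketch only. classify_probe_output is hypothetical. It prints a skip
# reason and returns 0 if the probe output matches a known "unsupported"
# message, and returns 1 otherwise.
classify_probe_output() {
	case "$1" in
	*"Unknown type"*)
		echo "xfs_io scrub subcommand support is missing";;
	*"Inappropriate ioctl"*)
		echo "kernel scrub ioctl is missing";;
	*"No such file or directory"*)
		echo "kernel does not know about this command";;
	*"Operation not supported"*)
		echo "kernel does not support this command";;
	*)
		return 1;;
	esac
}

# A test would call _notrun with the reason; here we only print it.
if reason="$(classify_probe_output "scrub: Operation not supported")"; then
	echo "would _notrun: $reason"
fi
```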
common/fuzzy | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/common/fuzzy b/common/fuzzy
index 88ba5fef69..8d3e30e32b 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -405,6 +405,25 @@ _scratch_xfs_stress_scrub_cleanup() {
$XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
}
+# Make sure the provided scrub/repair commands actually work on the scratch
+# filesystem before we start running them in a loop.
+__stress_scrub_check_commands() {
+ local scrub_tgt="$1"
+ shift
+
+ for arg in "$@"; do
+ testio=`$XFS_IO_PROG -x -c "$arg" $scrub_tgt 2>&1`
+ echo $testio | grep -q "Unknown type" && \
+ _notrun "xfs_io scrub subcommand support is missing"
+ echo $testio | grep -q "Inappropriate ioctl" && \
+ _notrun "kernel scrub ioctl is missing"
+ echo $testio | grep -q "No such file or directory" && \
+ _notrun "kernel does not know about: $arg"
+ echo $testio | grep -q "Operation not supported" && \
+ _notrun "kernel does not support: $arg"
+ done
+}
+
# Start scrub, freeze, and fsstress in background looping processes, and wait
# for 30*TIME_FACTOR seconds to see if the filesystem goes down. Callers
# must call _scratch_xfs_stress_scrub_cleanup from their cleanup functions.
@@ -427,6 +446,8 @@ _scratch_xfs_stress_scrub() {
esac
done
+ __stress_scrub_check_commands "$scrub_tgt" "${one_scrub_args[@]}"
+
local start="$(date +%s)"
local end="$((start + (30 * TIME_FACTOR) ))"
* [PATCH 09/16] fuzzy: make scrub stress loop control more robust
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (12 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 08/16] fuzzy: test the scrub stress subcommands before looping Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 16/16] fuzzy: delay the start of the scrub loop when stress-testing scrub Darrick J. Wong
2022-12-30 22:12 ` [PATCH 15/16] fuzzy: allow substitution of AG numbers when configuring scrub stress test Darrick J. Wong
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Currently, each of the scrub stress testing background threads
open-codes logic to decide if it should exit the loop. This decision is
based entirely on TIME_FACTOR*30 seconds having gone by, which means
that we ignore external factors, such as the user pressing ^C, which (in
theory) will invoke cleanup functions to tear everything down.
This is not a great user experience, so refactor the loop exit test into
a helper function and establish a sentinel file that must be present to
continue looping. If the user presses ^C, the cleanup function will
remove the sentinel file and kill the background thread children, which
should be enough to stop everything more or less immediately.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
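Reviewer note: the sentinel-file idea can be tried on its own with the
sketch below. still_running is a hypothetical name for illustration (the
patch's helper is __stress_scrub_running), and the 30 second deadline is
arbitrary.

```shell
#!/bin/bash
# Sketch only. Keep going while the sentinel file exists and the deadline
# (seconds since the epoch) has not passed.
still_running() {
	local end="$1"
	local sentinel="$2"

	test -e "$sentinel" && test "$(date +%s)" -lt "$end"
}

sentinel="$(mktemp)"
end=$(( $(date +%s) + 30 ))

still_running "$end" "$sentinel" && \
	echo "running: sentinel present, deadline ahead"

# Removing the sentinel stops the loops without waiting for the deadline.
rm -f "$sentinel"
still_running "$end" "$sentinel" || \
	echo "stopped: sentinel removed"
```

Because every loop re-checks the file on each iteration, a single rm -f
from the cleanup path is enough to ask all of them to stop.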
common/fuzzy | 39 ++++++++++++++++++++++++++++-----------
1 file changed, 28 insertions(+), 11 deletions(-)
diff --git a/common/fuzzy b/common/fuzzy
index 8d3e30e32b..6519d5c1e2 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -338,11 +338,18 @@ __stress_scrub_filter_output() {
-e '/No space left on device/d'
}
+# Decide if we want to keep running stress tests. The first argument is the
+# stop time, and second argument is the path to the sentinel file.
+__stress_scrub_running() {
+ test -e "$2" && test "$(date +%s)" -lt "$1"
+}
+
# Run fs freeze and thaw in a tight loop.
__stress_scrub_freeze_loop() {
local end="$1"
+ local runningfile="$2"
- while [ "$(date +%s)" -lt $end ]; do
+ while __stress_scrub_running "$end" "$runningfile"; do
$XFS_IO_PROG -x -c 'freeze' -c 'thaw' $SCRATCH_MNT 2>&1 | \
__stress_freeze_filter_output
done
@@ -351,15 +358,16 @@ __stress_scrub_freeze_loop() {
# Run individual XFS online fsck commands in a tight loop with xfs_io.
__stress_one_scrub_loop() {
local end="$1"
- local scrub_tgt="$2"
- shift; shift
+ local runningfile="$2"
+ local scrub_tgt="$3"
+ shift; shift; shift
local xfs_io_args=()
for arg in "$@"; do
xfs_io_args+=('-c' "$arg")
done
- while [ "$(date +%s)" -lt $end ]; do
+ while __stress_scrub_running "$end" "$runningfile"; do
$XFS_IO_PROG -x "${xfs_io_args[@]}" "$scrub_tgt" 2>&1 | \
__stress_scrub_filter_output
done
@@ -368,12 +376,16 @@ __stress_one_scrub_loop() {
# Run fsstress while we're testing online fsck.
__stress_scrub_fsstress_loop() {
local end="$1"
+ local runningfile="$2"
local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
+ echo "Running $FSSTRESS_PROG $args" >> $seqres.full
- while [ "$(date +%s)" -lt $end ]; do
+ while __stress_scrub_running "$end" "$runningfile"; do
$FSSTRESS_PROG $args >> $seqres.full
+ echo "fsstress exits with $? at $(date)" >> $seqres.full
done
+ rm -f "$runningfile"
}
# Make sure we have everything we need to run stress and scrub
@@ -397,6 +409,7 @@ _require_xfs_stress_online_repair() {
# Clean up after the loops in case they didn't do it themselves.
_scratch_xfs_stress_scrub_cleanup() {
+ rm -f "$runningfile"
echo "Cleaning up scrub stress run at $(date)" >> $seqres.full
# Send SIGINT so that bash won't print a 'Terminated' message that
@@ -436,6 +449,10 @@ __stress_scrub_check_commands() {
_scratch_xfs_stress_scrub() {
local one_scrub_args=()
local scrub_tgt="$SCRATCH_MNT"
+ local runningfile="$tmp.fsstress"
+
+ rm -f "$runningfile"
+ touch "$runningfile"
OPTIND=1
while getopts "s:t:" c; do
@@ -454,17 +471,17 @@ _scratch_xfs_stress_scrub() {
echo "Loop started at $(date --date="@${start}")," \
"ending at $(date --date="@${end}")" >> $seqres.full
- __stress_scrub_fsstress_loop $end &
- __stress_scrub_freeze_loop $end &
+ __stress_scrub_fsstress_loop "$end" "$runningfile" &
+ __stress_scrub_freeze_loop "$end" "$runningfile" &
if [ "${#one_scrub_args[@]}" -gt 0 ]; then
- __stress_one_scrub_loop "$end" "$scrub_tgt" \
+ __stress_one_scrub_loop "$end" "$runningfile" "$scrub_tgt" \
"${one_scrub_args[@]}" &
fi
- # Wait until 2 seconds after the loops should have finished, then
- # clean up after ourselves.
- while [ "$(date +%s)" -lt $((end + 2)) ]; do
+ # Wait until the designated end time or fsstress dies, then kill all of
+ # our background processes.
+ while __stress_scrub_running "$end" "$runningfile"; do
sleep 1
done
_scratch_xfs_stress_scrub_cleanup
* [PATCH 10/16] fuzzy: abort scrub stress testing if the scratch fs went down
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (10 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 14/16] fuzzy: make freezing optional for scrub stress tests Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 08/16] fuzzy: test the scrub stress subcommands before looping Darrick J. Wong
` (3 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
There's no point in continuing a stress test of online fsck if the
filesystem goes down. We can't query that kind of state directly, so as
a proxy we try to stat the mountpoint and interpret any error return as
a sign that the fs is down.
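
For illustration only, the stat-as-proxy check can be sketched as a
standalone function. The name fs_looks_alive and its path argument are
hypothetical; the patch itself always probes $SCRATCH_MNT.

```shell
# Hypothetical standalone version of the liveness probe. Returns 0 if
# the path can be stat'ed and nonzero otherwise. A failed stat is only
# a hint that the fs went down, not proof.
fs_looks_alive() {
	stat "$1" >/dev/null 2>&1
}

if fs_looks_alive /; then
	echo "root fs answers stat"
fi
```

Because any stat error counts as "down", a missing or mistyped path is
indistinguishable from a shutdown in this sketch.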
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
common/fuzzy | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/common/fuzzy b/common/fuzzy
index 6519d5c1e2..f1bc2dc756 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -338,10 +338,17 @@ __stress_scrub_filter_output() {
-e '/No space left on device/d'
}
+# Decide if the scratch filesystem is still alive.
+__stress_scrub_scratch_alive() {
+ # If we can't stat the scratch filesystem, there's a reasonably good
+ # chance that the fs shut down, which is not good.
+ stat "$SCRATCH_MNT" &>/dev/null
+}
+
# Decide if we want to keep running stress tests. The first argument is the
# stop time, and second argument is the path to the sentinel file.
__stress_scrub_running() {
- test -e "$2" && test "$(date +%s)" -lt "$1"
+ test -e "$2" && test "$(date +%s)" -lt "$1" && __stress_scrub_scratch_alive
}
# Run fs freeze and thaw in a tight loop.
@@ -486,6 +493,10 @@ _scratch_xfs_stress_scrub() {
done
_scratch_xfs_stress_scrub_cleanup
+ # Warn the user if we think the scratch filesystem went down.
+ __stress_scrub_scratch_alive || \
+ echo "Did the scratch filesystem die?"
+
echo "Loop finished at $(date)" >> $seqres.full
}
* [PATCH 11/16] fuzzy: clear out the scratch filesystem if it's too full
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (8 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 12/16] fuzzy: increase operation count for each fsstress invocation Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 14/16] fuzzy: make freezing optional for scrub stress tests Darrick J. Wong
` (5 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
If the online fsck stress tests run for long enough, they'll fill up the
scratch filesystem completely. While it is interesting to test repair
functionality on a *nearly* full filesystem undergoing a heavy workload,
a totally full filesystem is really only exercising the ENOSPC handlers
in the kernel. That's not what we came here to test, so change the
fsstress loop to detect a nearly full filesystem and erase everything
before starting fsstress again.
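
A rough sketch of the "too full" decision, outside of fstests. Both
helper names below are made up for this example, and the df parsing is
an assumption; the patch gets the percentage from the fstests helper
_used instead.

```shell
# Print the used-space percentage (digits only) of the filesystem that
# holds the given path, parsed from POSIX-format df output. Assumes the
# device and mount point names contain no whitespace.
used_pct_of() {
	df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if the percentage is at or above the threshold (default 98),
# meaning the caller should clear the fs; return nonzero otherwise.
too_full() {
	test "$1" -ge "${2:-98}"
}

pct="$(used_pct_of /)"
if too_full "$pct"; then
	echo "would clear the filesystem (${pct}% used)"
else
	echo "enough room left (${pct}% used)"
fi
```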
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
common/fuzzy | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/common/fuzzy b/common/fuzzy
index f1bc2dc756..01cf7f00d8 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -380,6 +380,20 @@ __stress_one_scrub_loop() {
done
}
+# Clean the scratch filesystem between rounds of fsstress if there is 2%
+# available space or less because that isn't an interesting stress test.
+#
+# Returns 0 if we cleared anything, and 1 if we did nothing.
+__stress_scrub_clean_scratch() {
+ local used_pct="$(_used $SCRATCH_DEV)"
+
+ test "$used_pct" -lt 98 && return 1
+
+ echo "Clearing scratch fs at $(date)" >> $seqres.full
+ rm -r -f $SCRATCH_MNT/p*
+ return 0
+}
+
# Run fsstress while we're testing online fsck.
__stress_scrub_fsstress_loop() {
local end="$1"
@@ -389,6 +403,8 @@ __stress_scrub_fsstress_loop() {
echo "Running $FSSTRESS_PROG $args" >> $seqres.full
while __stress_scrub_running "$end" "$runningfile"; do
+ # Need to recheck running conditions if we cleared anything
+ __stress_scrub_clean_scratch && continue
$FSSTRESS_PROG $args >> $seqres.full
echo "fsstress exits with $? at $(date)" >> $seqres.full
done
* [PATCH 12/16] fuzzy: increase operation count for each fsstress invocation
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (7 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 13/16] fuzzy: clean up frozen fses after scrub stress testing Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2023-01-13 19:55 ` Zorro Lang
2022-12-30 22:12 ` [PATCH 11/16] fuzzy: clear out the scratch filesystem if it's too full Darrick J. Wong
` (6 subsequent siblings)
15 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
For online fsck stress testing, increase the number of filesystem
operations per fsstress run to 2 million, now that we have the ability
to kill fsstress if the user should push ^C to abort the test early.
This should guarantee a couple of hours of continuous stress testing in
between clearing the scratch filesystem.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
common/fuzzy | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/common/fuzzy b/common/fuzzy
index 01cf7f00d8..3e23edc9e4 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -399,7 +399,9 @@ __stress_scrub_fsstress_loop() {
local end="$1"
local runningfile="$2"
- local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
+ # As of March 2022, 2 million fsstress ops should be enough to keep
+ # any filesystem busy for a couple of hours.
+ local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000000 $FSSTRESS_AVOID)
echo "Running $FSSTRESS_PROG $args" >> $seqres.full
while __stress_scrub_running "$end" "$runningfile"; do
* [PATCH 13/16] fuzzy: clean up frozen fses after scrub stress testing
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (6 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 03/16] xfs/422: rework feature detection so we only test-format scratch once Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 12/16] fuzzy: increase operation count for each fsstress invocation Darrick J. Wong
` (7 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Some of our scrub stress tests involve racing scrub, fsstress, and a
program that repeatedly freezes and thaws the scratch filesystem. The
current cleanup code suffers from the deficiency that it doesn't
actually wait for the child processes to exit. First, change it to do
that.
However, that exposes a second problem: there's a race condition with a
freezer process that leads to the stress test exiting with a frozen fs.
If the freezer process is blocked trying to acquire the unmount or
sb_write locks, the receipt of a signal (even a fatal one) doesn't cause
it to abort the freeze. This causes further problems with fstests,
since ./check doesn't expect to regain control with the scratch fs
frozen.
Fix both problems by making the cleanup function smarter.
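
The ordering matters more than the details, so here is a minimal sketch
of it with stand-in processes. Everything in it is hypothetical: the
spin loops replace the real stressors, a sentinel file replaces the
killall, and the thaw is only recorded rather than performed.

```shell
# Stand-in loops: both spin until the sentinel file disappears.
stopfile="$(mktemp)"
spin() {
	while [ -e "$stopfile" ]; do
		sleep 1
	done
}

spin &			# pretend this is the freeze/thaw loop
freezer_pid="$!"
spin &			# pretend this is fsstress or a scrub loop

order=""
cleanup() {
	rm -f "$stopfile"		# 1. tell everyone to stop
	if [ -n "$freezer_pid" ]; then
		wait "$freezer_pid"	# 2. the freezer must be gone first
		order="$order freezer"
		order="$order thaw"	# 3. real code would thaw the fs here
		freezer_pid=""
	fi
	wait				# 4. now the other children can exit
	order="$order children"
}

cleanup
echo "cleanup order:$order"
```

Waiting on the freezer pid before the bare wait is the point: if the
thaw came after the bare wait, a child blocked on a frozen fs would
never let that wait return.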
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
common/fuzzy | 35 ++++++++++++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
diff --git a/common/fuzzy b/common/fuzzy
index 3e23edc9e4..0f6fc91b80 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -439,8 +439,39 @@ _scratch_xfs_stress_scrub_cleanup() {
# Send SIGINT so that bash won't print a 'Terminated' message that
# distorts the golden output.
+ echo "Killing stressor processes at $(date)" >> $seqres.full
$KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1
- $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
+
+ # Tests are not allowed to exit with the scratch fs frozen. If we
+ # started a fs freeze/thaw background loop, wait for that loop to exit
+ # and then thaw the filesystem. Cleanup for the freeze loop must be
+ # performed prior to waiting for the other children to avoid triggering
+ # a race condition that can hang fstests.
+ #
+ # If the xfs_io -c freeze process is asleep waiting for a write lock on
+ # s_umount or sb_write when the killall signal is delivered, it will
+ # not check for pending signals until after it has frozen the fs. If
+ # even one thread of the stress test processes (xfs_io, fsstress, etc.)
+ # is waiting for read locks on sb_write when the killall signals are
+ # delivered, they will block in the kernel until someone thaws the fs,
+ # and the `wait' below will wait forever.
+ #
+ # Hence we issue the killall, wait for the freezer loop to exit, thaw
+ # the filesystem, and wait for the rest of the children.
+ if [ -n "$__SCRUB_STRESS_FREEZE_PID" ]; then
+ echo "Waiting for fs freezer $__SCRUB_STRESS_FREEZE_PID to exit at $(date)" >> $seqres.full
+ wait "$__SCRUB_STRESS_FREEZE_PID"
+
+ echo "Thawing filesystem at $(date)" >> $seqres.full
+ $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
+ __SCRUB_STRESS_FREEZE_PID=""
+ fi
+
+ # Wait for the remaining children to exit.
+ echo "Waiting for children to exit at $(date)" >> $seqres.full
+ wait
+
+ echo "Cleanup finished at $(date)" >> $seqres.full
}
# Make sure the provided scrub/repair commands actually work on the scratch
@@ -476,6 +507,7 @@ _scratch_xfs_stress_scrub() {
local scrub_tgt="$SCRATCH_MNT"
local runningfile="$tmp.fsstress"
+ __SCRUB_STRESS_FREEZE_PID=""
rm -f "$runningfile"
touch "$runningfile"
@@ -498,6 +530,7 @@ _scratch_xfs_stress_scrub() {
__stress_scrub_fsstress_loop "$end" "$runningfile" &
__stress_scrub_freeze_loop "$end" "$runningfile" &
+ __SCRUB_STRESS_FREEZE_PID="$!"
if [ "${#one_scrub_args[@]}" -gt 0 ]; then
__stress_one_scrub_loop "$end" "$runningfile" "$scrub_tgt" \
* [PATCH 14/16] fuzzy: make freezing optional for scrub stress tests
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (9 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 11/16] fuzzy: clear out the scratch filesystem if it's too full Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 10/16] fuzzy: abort scrub stress testing if the scratch fs went down Darrick J. Wong
` (4 subsequent siblings)
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Make the freeze/thaw loop optional, since that's a significant change in
behavior if it's enabled.
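
As a sketch of the opt-in pattern: the environment variable supplies
the default and -f forces the loop on. The function name
parse_freeze_opt is hypothetical, and the -s/-t options are accepted
but ignored here.

```shell
# Hypothetical parser mirroring the patch's pattern. Leaves the result
# in the global variable 'freeze' (empty means "do not freeze").
parse_freeze_opt() {
	freeze="${XFS_SCRUB_STRESS_FREEZE:-}"
	OPTIND=1
	while getopts "fs:t:" c; do
		case "$c" in
		f) freeze=yes;;
		s|t) ;;		# accepted but ignored in this sketch
		*) return 1;;
		esac
	done
	return 0
}

parse_freeze_opt -s "repair rmapbt 0"
echo "without -f: freeze='${freeze}'"
parse_freeze_opt -f -s "repair rmapbt 0"
echo "with -f: freeze='${freeze}'"
```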
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
common/fuzzy | 13 ++++++++++---
tests/xfs/422 | 2 +-
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/common/fuzzy b/common/fuzzy
index 0f6fc91b80..219dd3bb0a 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -499,6 +499,8 @@ __stress_scrub_check_commands() {
#
# Various options include:
#
+# -f Run a freeze/thaw loop while we're doing other things. Defaults to
+# disabled, unless XFS_SCRUB_STRESS_FREEZE is set.
# -s Pass this command to xfs_io to test scrub. If zero -s options are
# specified, xfs_io will not be run.
# -t Run online scrub against this file; $SCRATCH_MNT is the default.
@@ -506,14 +508,16 @@ _scratch_xfs_stress_scrub() {
local one_scrub_args=()
local scrub_tgt="$SCRATCH_MNT"
local runningfile="$tmp.fsstress"
+ local freeze="${XFS_SCRUB_STRESS_FREEZE}"
__SCRUB_STRESS_FREEZE_PID=""
rm -f "$runningfile"
touch "$runningfile"
OPTIND=1
- while getopts "s:t:" c; do
+ while getopts "fs:t:" c; do
case "$c" in
+ f) freeze=yes;;
s) one_scrub_args+=("$OPTARG");;
t) scrub_tgt="$OPTARG";;
*) return 1; ;;
@@ -529,8 +533,11 @@ _scratch_xfs_stress_scrub() {
"ending at $(date --date="@${end}")" >> $seqres.full
__stress_scrub_fsstress_loop "$end" "$runningfile" &
- __stress_scrub_freeze_loop "$end" "$runningfile" &
- __SCRUB_STRESS_FREEZE_PID="$!"
+
+ if [ -n "$freeze" ]; then
+ __stress_scrub_freeze_loop "$end" "$runningfile" &
+ __SCRUB_STRESS_FREEZE_PID="$!"
+ fi
if [ "${#one_scrub_args[@]}" -gt 0 ]; then
__stress_one_scrub_loop "$end" "$runningfile" "$scrub_tgt" \
diff --git a/tests/xfs/422 b/tests/xfs/422
index faea5d6792..ac88713257 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -31,7 +31,7 @@ _require_xfs_stress_online_repair
_scratch_mkfs > "$seqres.full" 2>&1
_scratch_mount
_require_xfs_has_feature "$SCRATCH_MNT" rmapbt
-_scratch_xfs_stress_online_repair -s "repair rmapbt 0" -s "repair rmapbt 1"
+_scratch_xfs_stress_online_repair -f -s "repair rmapbt 0" -s "repair rmapbt 1"
# success, all done
echo Silence is golden
* [PATCH 15/16] fuzzy: allow substitution of AG numbers when configuring scrub stress test
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (14 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 16/16] fuzzy: delay the start of the scrub loop when stress-testing scrub Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Allow the test program to use the metavariable '%agno%' when passing
scrub commands to the scrub stress loop. This makes it easier for tests
to scrub or repair every AG in the filesystem without a lot of work.
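
To show what the substitution produces, here is a small sketch. The
helper name expand_agno is invented for this example, and it detects
the metavariable with a case pattern rather than the grep the patch
uses; the AG count is passed in instead of being read from the mount.

```shell
# Hypothetical helper: print one command per AG, replacing every
# '%agno%' in the template with the AG number. Templates without the
# metavariable are printed once, unchanged.
expand_agno() {
	template="$1"
	agcount="$2"
	case "$template" in
	*%agno%*)
		agno=0
		while [ "$agno" -lt "$agcount" ]; do
			echo "$template" | sed -e "s|%agno%|$agno|g"
			agno=$((agno + 1))
		done
		;;
	*)
		echo "$template"
		;;
	esac
}

expand_agno "repair rmapbt %agno%" 4
```

Each printed line corresponds to one '-c' argument handed to xfs_io in
the real loop.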
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
common/fuzzy | 14 ++++++++++++--
tests/xfs/422 | 2 +-
2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/common/fuzzy b/common/fuzzy
index 219dd3bb0a..e42e2ccec1 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -368,10 +368,19 @@ __stress_one_scrub_loop() {
local runningfile="$2"
local scrub_tgt="$3"
shift; shift; shift
+ local agcount="$(_xfs_mount_agcount $SCRATCH_MNT)"
local xfs_io_args=()
for arg in "$@"; do
- xfs_io_args+=('-c' "$arg")
+ if echo "$arg" | grep -q -w '%agno%'; then
+ # Substitute the AG number
+ for ((agno = 0; agno < agcount; agno++)); do
+ local ag_arg="$(echo "$arg" | sed -e "s|%agno%|$agno|g")"
+ xfs_io_args+=('-c' "$ag_arg")
+ done
+ else
+ xfs_io_args+=('-c' "$arg")
+ fi
done
while __stress_scrub_running "$end" "$runningfile"; do
@@ -481,7 +490,8 @@ __stress_scrub_check_commands() {
shift
for arg in "$@"; do
- testio=`$XFS_IO_PROG -x -c "$arg" $scrub_tgt 2>&1`
+ local cooked_arg="$(echo "$arg" | sed -e "s/%agno%/0/g")"
+ testio=`$XFS_IO_PROG -x -c "$cooked_arg" $scrub_tgt 2>&1`
echo $testio | grep -q "Unknown type" && \
_notrun "xfs_io scrub subcommand support is missing"
echo $testio | grep -q "Inappropriate ioctl" && \
diff --git a/tests/xfs/422 b/tests/xfs/422
index ac88713257..995f612166 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -31,7 +31,7 @@ _require_xfs_stress_online_repair
_scratch_mkfs > "$seqres.full" 2>&1
_scratch_mount
_require_xfs_has_feature "$SCRATCH_MNT" rmapbt
-_scratch_xfs_stress_online_repair -f -s "repair rmapbt 0" -s "repair rmapbt 1"
+_scratch_xfs_stress_online_repair -f -s "repair rmapbt %agno%"
# success, all done
echo Silence is golden
* [PATCH 16/16] fuzzy: delay the start of the scrub loop when stress-testing scrub
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
` (13 preceding siblings ...)
2022-12-30 22:12 ` [PATCH 09/16] fuzzy: make scrub stress loop control more robust Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 15/16] fuzzy: allow substitution of AG numbers when configuring scrub stress test Darrick J. Wong
15 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
By default, online fsck stress testing kicks off the loops for fsstress
and online fsck at the same time. However, in certain debugging
scenarios it can help if we let fsstress get a head-start in filling up
the filesystem. Plumb in a means to delay the start of the scrub loop.
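
The start-time arithmetic, including the clamp to ten seconds before
the end, can be sketched on its own. The function name scrub_startat
and its argument order are assumptions made for this example.

```shell
# Hypothetical helper: compute when the scrub loop may start, given the
# start time, end time, and requested delay (all in seconds). The
# result is clamped to ten seconds before the end, as in the patch.
scrub_startat() {
	startat=$(($1 + $3))
	latest=$(($2 - 10))
	if [ "$startat" -gt "$latest" ]; then
		startat="$latest"
	fi
	echo "$startat"
}

start="$(date +%s)"
end=$((start + 30))
echo "scrub starts $(( $(scrub_startat "$start" "$end" 5) - start ))s in"
```

A delay of -1, which is the default when nothing is configured, yields
a start time in the past, so the scrub loop begins immediately.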
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
common/fuzzy | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/common/fuzzy b/common/fuzzy
index e42e2ccec1..1df51a6dd8 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -367,7 +367,8 @@ __stress_one_scrub_loop() {
local end="$1"
local runningfile="$2"
local scrub_tgt="$3"
- shift; shift; shift
+ local scrub_startat="$4"
+ shift; shift; shift; shift
local agcount="$(_xfs_mount_agcount $SCRATCH_MNT)"
local xfs_io_args=()
@@ -383,6 +384,10 @@ __stress_one_scrub_loop() {
fi
done
+ while __stress_scrub_running "$scrub_startat" "$runningfile"; do
+ sleep 1
+ done
+
while __stress_scrub_running "$end" "$runningfile"; do
$XFS_IO_PROG -x "${xfs_io_args[@]}" "$scrub_tgt" 2>&1 | \
__stress_scrub_filter_output
@@ -514,22 +519,27 @@ __stress_scrub_check_commands() {
# -s Pass this command to xfs_io to test scrub. If zero -s options are
# specified, xfs_io will not be run.
# -t Run online scrub against this file; $SCRATCH_MNT is the default.
+# -w Delay the start of the scrub/repair loop by this number of seconds.
+# Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value
+# will be clamped to ten seconds before the end time.
_scratch_xfs_stress_scrub() {
local one_scrub_args=()
local scrub_tgt="$SCRATCH_MNT"
local runningfile="$tmp.fsstress"
local freeze="${XFS_SCRUB_STRESS_FREEZE}"
+ local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}"
__SCRUB_STRESS_FREEZE_PID=""
rm -f "$runningfile"
touch "$runningfile"
OPTIND=1
- while getopts "fs:t:" c; do
+ while getopts "fs:t:w:" c; do
case "$c" in
f) freeze=yes;;
s) one_scrub_args+=("$OPTARG");;
t) scrub_tgt="$OPTARG";;
+ w) scrub_delay="$OPTARG";;
*) return 1; ;;
esac
done
@@ -538,6 +548,9 @@ _scratch_xfs_stress_scrub() {
local start="$(date +%s)"
local end="$((start + (30 * TIME_FACTOR) ))"
+ local scrub_startat="$((start + scrub_delay))"
+ test "$scrub_startat" -gt "$((end - 10))" &&
+ scrub_startat="$((end - 10))"
echo "Loop started at $(date --date="@${start}")," \
"ending at $(date --date="@${end}")" >> $seqres.full
@@ -551,7 +564,7 @@ _scratch_xfs_stress_scrub() {
if [ "${#one_scrub_args[@]}" -gt 0 ]; then
__stress_one_scrub_loop "$end" "$runningfile" "$scrub_tgt" \
- "${one_scrub_args[@]}" &
+ "$scrub_startat" "${one_scrub_args[@]}" &
fi
# Wait until the designated end time or fsstress dies, then kill all of
* [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests
2022-12-30 21:13 [NYE DELUGE 1/4] xfs: all pending online scrub improvements Darrick J. Wong
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx Darrick J. Wong
` (2 more replies)
2022-12-30 22:13 ` [PATCHSET v24.0 0/2] fstests: race online scrub with mount state changes Darrick J. Wong
2023-01-13 20:10 ` [NYE DELUGE 1/4] xfs: all pending online scrub improvements Zorro Lang
3 siblings, 3 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
Hi all,
Refactor the fsmap racing tests to use the general scrub stress loop
infrastructure that we've now created, and then add a bit more
functionality so that we can test racing remounting the filesystem
readonly and readwrite.
If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.
This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.
--D
fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-fsmap-stress
---
common/fuzzy | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
ltp/fsstress.c | 18 ++++++
tests/xfs/517 | 91 +-----------------------------
tests/xfs/517.out | 4 -
tests/xfs/732 | 38 +++++++++++++
tests/xfs/732.out | 2 +
tests/xfs/847 | 38 +++++++++++++
tests/xfs/847.out | 2 +
tests/xfs/848 | 38 +++++++++++++
tests/xfs/848.out | 2 +
10 files changed, 300 insertions(+), 94 deletions(-)
create mode 100755 tests/xfs/732
create mode 100644 tests/xfs/732.out
create mode 100755 tests/xfs/847
create mode 100644 tests/xfs/847.out
create mode 100755 tests/xfs/848
create mode 100644 tests/xfs/848.out
* [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx
2022-12-30 22:12 ` [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2023-01-05 5:49 ` Zorro Lang
2023-01-05 18:28 ` [PATCH v24.1 " Darrick J. Wong
2022-12-30 22:12 ` [PATCH 2/3] fuzzy: refactor fsmap stress test to use our helper functions Darrick J. Wong
2022-12-30 22:12 ` [PATCH 3/3] xfs: race fsmap with readonly remounts to detect crash or livelock Darrick J. Wong
2 siblings, 2 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Add a couple of new online fsck stress tests that race fsx against
online fsck.
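
The patch picks the exerciser by looking up a loop function by name. A
stripped-down sketch of that lookup follows; the stress_*_loop names
and run_exerciser are placeholders, not the real helpers.

```shell
# Hypothetical exerciser loops; the real ones run fsstress or fsx.
stress_fsstress_loop() { echo "would run fsstress"; }
stress_fsx_loop() { echo "would run fsx"; }

# Look up the loop function by name and run it, or complain and fail
# if no such function exists.
run_exerciser() {
	fn="stress_${1}_loop"
	if ! command -v "$fn" >/dev/null 2>&1; then
		echo "${1}: Unknown fs exercise program."
		return 1
	fi
	"$fn"
}

run_exerciser fsx
run_exerciser bogus || echo "rejected"
```

Adding another exerciser then only requires defining one more function
that follows the naming scheme.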
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
common/fuzzy | 39 ++++++++++++++++++++++++++++++++++++---
tests/xfs/847 | 38 ++++++++++++++++++++++++++++++++++++++
tests/xfs/847.out | 2 ++
tests/xfs/848 | 38 ++++++++++++++++++++++++++++++++++++++
tests/xfs/848.out | 2 ++
5 files changed, 116 insertions(+), 3 deletions(-)
create mode 100755 tests/xfs/847
create mode 100644 tests/xfs/847.out
create mode 100755 tests/xfs/848
create mode 100644 tests/xfs/848.out
diff --git a/common/fuzzy b/common/fuzzy
index 1df51a6dd8..3512e95e02 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -408,6 +408,30 @@ __stress_scrub_clean_scratch() {
return 0
}
+# Run fsx while we're testing online fsck.
+__stress_scrub_fsx_loop() {
+ local end="$1"
+ local runningfile="$2"
+ local focus=(-q -X) # quiet, validate file contents
+
+ # As of November 2022, 2 million fsx ops should be enough to keep
+ # any filesystem busy for a couple of hours.
+ focus+=(-N 2000000)
+ focus+=(-o $((128000 * LOAD_FACTOR)) )
+ focus+=(-l $((600000 * LOAD_FACTOR)) )
+
+ local args="$FSX_AVOID ${focus[@]} ${SCRATCH_MNT}/fsx.$seq"
+ echo "Running $here/ltp/fsx $args" >> $seqres.full
+
+ while __stress_scrub_running "$end" "$runningfile"; do
+ # Need to recheck running conditions if we cleared anything
+ __stress_scrub_clean_scratch && continue
+ $here/ltp/fsx $args >> $seqres.full
+ echo "fsx exits with $? at $(date)" >> $seqres.full
+ done
+ rm -f "$runningfile"
+}
+
# Run fsstress while we're testing online fsck.
__stress_scrub_fsstress_loop() {
local end="$1"
@@ -454,7 +478,7 @@ _scratch_xfs_stress_scrub_cleanup() {
# Send SIGINT so that bash won't print a 'Terminated' message that
# distorts the golden output.
echo "Killing stressor processes at $(date)" >> $seqres.full
- $KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1
+ $KILLALL_PROG -INT xfs_io fsstress fsx >> $seqres.full 2>&1
# Tests are not allowed to exit with the scratch fs frozen. If we
# started a fs freeze/thaw background loop, wait for that loop to exit
@@ -522,30 +546,39 @@ __stress_scrub_check_commands() {
# -w Delay the start of the scrub/repair loop by this number of seconds.
# Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value
# will be clamped to ten seconds before the end time.
+# -X Run this program to exercise the filesystem. Currently supported
+# options are 'fsx' and 'fsstress'. The default is 'fsstress'.
_scratch_xfs_stress_scrub() {
local one_scrub_args=()
local scrub_tgt="$SCRATCH_MNT"
local runningfile="$tmp.fsstress"
local freeze="${XFS_SCRUB_STRESS_FREEZE}"
local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}"
+ local exerciser="fsstress"
__SCRUB_STRESS_FREEZE_PID=""
rm -f "$runningfile"
touch "$runningfile"
OPTIND=1
- while getopts "fs:t:w:" c; do
+ while getopts "fs:t:w:X:" c; do
case "$c" in
f) freeze=yes;;
s) one_scrub_args+=("$OPTARG");;
t) scrub_tgt="$OPTARG";;
w) scrub_delay="$OPTARG";;
+ X) exerciser="$OPTARG";;
*) return 1; ;;
esac
done
__stress_scrub_check_commands "$scrub_tgt" "${one_scrub_args[@]}"
+ if ! command -v "__stress_scrub_${exerciser}_loop" &>/dev/null; then
+ echo "${exerciser}: Unknown fs exercise program."
+ return 1
+ fi
+
local start="$(date +%s)"
local end="$((start + (30 * TIME_FACTOR) ))"
local scrub_startat="$((start + scrub_delay))"
@@ -555,7 +588,7 @@ _scratch_xfs_stress_scrub() {
echo "Loop started at $(date --date="@${start}")," \
"ending at $(date --date="@${end}")" >> $seqres.full
- __stress_scrub_fsstress_loop "$end" "$runningfile" &
+ "__stress_scrub_${exerciser}_loop" "$end" "$runningfile" &
if [ -n "$freeze" ]; then
__stress_scrub_freeze_loop "$end" "$runningfile" &
diff --git a/tests/xfs/847 b/tests/xfs/847
new file mode 100755
index 0000000000..856e9a6c26
--- /dev/null
+++ b/tests/xfs/847
@@ -0,0 +1,38 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
+#
+# FS QA Test No. 847
+#
+# Race fsx and xfs_scrub in read-only mode for a while to see if we crash
+# or livelock.
+#
+. ./common/preamble
+_begin_fstest scrub dangerous_fsstress_scrub
+
+_cleanup() {
+ cd /
+ _scratch_xfs_stress_scrub_cleanup &> /dev/null
+ rm -r -f $tmp.*
+}
+_register_cleanup "_cleanup" BUS
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/inject
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_scrub
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_scrub -S '-n' -X 'fsx'
+
+# success, all done
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/847.out b/tests/xfs/847.out
new file mode 100644
index 0000000000..b7041db159
--- /dev/null
+++ b/tests/xfs/847.out
@@ -0,0 +1,2 @@
+QA output created by 847
+Silence is golden
diff --git a/tests/xfs/848 b/tests/xfs/848
new file mode 100755
index 0000000000..ab32020624
--- /dev/null
+++ b/tests/xfs/848
@@ -0,0 +1,38 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
+#
+# FS QA Test No. 848
+#
+# Race fsx and xfs_scrub in force-repair mode for a while to see if we
+# crash or livelock.
+#
+. ./common/preamble
+_begin_fstest online_repair dangerous_fsstress_repair
+
+_cleanup() {
+ cd /
+ _scratch_xfs_stress_scrub_cleanup &> /dev/null
+ rm -r -f $tmp.*
+}
+_register_cleanup "_cleanup" BUS
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/inject
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_online_repair
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_online_repair -S '-k' -X 'fsx'
+
+# success, all done
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/848.out b/tests/xfs/848.out
new file mode 100644
index 0000000000..23f674045c
--- /dev/null
+++ b/tests/xfs/848.out
@@ -0,0 +1,2 @@
+QA output created by 848
+Silence is golden
* [PATCH 2/3] fuzzy: refactor fsmap stress test to use our helper functions
2022-12-30 22:12 ` [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 3/3] xfs: race fsmap with readonly remounts to detect crash or livelock Darrick J. Wong
2 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Refactor xfs/517 (which races fsstress with fsmap) to use our new
control loop functions instead of open-coding everything.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
common/fuzzy | 30 +++++++++++++++++
tests/xfs/517 | 91 ++---------------------------------------------------
tests/xfs/517.out | 4 +-
3 files changed, 34 insertions(+), 91 deletions(-)
diff --git a/common/fuzzy b/common/fuzzy
index 3512e95e02..58e299d34b 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -362,6 +362,23 @@ __stress_scrub_freeze_loop() {
done
}
+# Run individual xfs_io commands in a tight loop.
+__stress_xfs_io_loop() {
+ local end="$1"
+ local runningfile="$2"
+ shift; shift
+
+ local xfs_io_args=()
+ for arg in "$@"; do
+ xfs_io_args+=('-c' "$arg")
+ done
+
+ while __stress_scrub_running "$end" "$runningfile"; do
+ $XFS_IO_PROG -x "${xfs_io_args[@]}" "$SCRATCH_MNT" \
+ > /dev/null 2>> $seqres.full
+ done
+}
+
# Run individual XFS online fsck commands in a tight loop with xfs_io.
__stress_one_scrub_loop() {
local end="$1"
@@ -540,6 +557,10 @@ __stress_scrub_check_commands() {
#
# -f Run a freeze/thaw loop while we're doing other things. Defaults to
# disabled, unless XFS_SCRUB_STRESS_FREEZE is set.
+# -i Pass this command to xfs_io to exercise something that is not scrub
+# in a separate loop. If zero -i options are specified, do not run.
+# Callers must check each of these commands (via _require_xfs_io_command)
+# before calling here.
# -s Pass this command to xfs_io to test scrub. If zero -s options are
# specified, xfs_io will not be run.
# -t Run online scrub against this file; $SCRATCH_MNT is the default.
@@ -555,15 +576,17 @@ _scratch_xfs_stress_scrub() {
local freeze="${XFS_SCRUB_STRESS_FREEZE}"
local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}"
local exerciser="fsstress"
+ local io_args=()
__SCRUB_STRESS_FREEZE_PID=""
rm -f "$runningfile"
touch "$runningfile"
OPTIND=1
- while getopts "fs:t:w:X:" c; do
+ while getopts "fi:s:t:w:X:" c; do
case "$c" in
f) freeze=yes;;
+ i) io_args+=("$OPTARG");;
s) one_scrub_args+=("$OPTARG");;
t) scrub_tgt="$OPTARG";;
w) scrub_delay="$OPTARG";;
@@ -595,6 +618,11 @@ _scratch_xfs_stress_scrub() {
__SCRUB_STRESS_FREEZE_PID="$!"
fi
+ if [ "${#io_args[@]}" -gt 0 ]; then
+ __stress_xfs_io_loop "$end" "$runningfile" \
+ "${io_args[@]}" &
+ fi
+
if [ "${#one_scrub_args[@]}" -gt 0 ]; then
__stress_one_scrub_loop "$end" "$runningfile" "$scrub_tgt" \
"$scrub_startat" "${one_scrub_args[@]}" &
diff --git a/tests/xfs/517 b/tests/xfs/517
index 99fc89b05f..4481ba41da 100755
--- a/tests/xfs/517
+++ b/tests/xfs/517
@@ -11,29 +11,11 @@ _begin_fstest auto quick fsmap freeze
_register_cleanup "_cleanup" BUS
-# First kill and wait the freeze loop so it won't try to freeze fs again
-# Then make sure fs is not frozen
-# Then kill and wait for the rest of the workers
-# Because if fs is frozen a killed writer will never exit
-kill_loops() {
- local sig=$1
-
- [ -n "$freeze_pid" ] && kill $sig $freeze_pid
- wait $freeze_pid
- unset freeze_pid
- $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT
- [ -n "$stress_pid" ] && kill $sig $stress_pid
- [ -n "$fsmap_pid" ] && kill $sig $fsmap_pid
- wait
- unset stress_pid
- unset fsmap_pid
-}
-
# Override the default cleanup function.
_cleanup()
{
- kill_loops -9 > /dev/null 2>&1
cd /
+ _scratch_xfs_stress_scrub_cleanup
rm -rf $tmp.*
}
@@ -46,78 +28,13 @@ _cleanup()
_supported_fs xfs
_require_xfs_scratch_rmapbt
_require_xfs_io_command "fsmap"
-_require_command "$KILLALL_PROG" killall
-_require_freeze
+_require_xfs_stress_scrub
-echo "Format and populate"
_scratch_mkfs > "$seqres.full" 2>&1
_scratch_mount
-
-STRESS_DIR="$SCRATCH_MNT/testdir"
-mkdir -p $STRESS_DIR
-
-for i in $(seq 0 9); do
- mkdir -p $STRESS_DIR/$i
- for j in $(seq 0 9); do
- mkdir -p $STRESS_DIR/$i/$j
- for k in $(seq 0 9); do
- echo x > $STRESS_DIR/$i/$j/$k
- done
- done
-done
-
-cpus=$(( $(src/feature -o) * 4 * LOAD_FACTOR))
-
-echo "Concurrent fsmap and freeze"
-filter_output() {
- grep -E -v '(Device or resource busy|Invalid argument)'
-}
-freeze_loop() {
- end="$1"
-
- while [ "$(date +%s)" -lt $end ]; do
- $XFS_IO_PROG -x -c 'freeze' $SCRATCH_MNT 2>&1 | filter_output
- $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT 2>&1 | filter_output
- done
-}
-fsmap_loop() {
- end="$1"
-
- while [ "$(date +%s)" -lt $end ]; do
- $XFS_IO_PROG -c 'fsmap -v' $SCRATCH_MNT > /dev/null
- done
-}
-stress_loop() {
- end="$1"
-
- FSSTRESS_ARGS=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
- while [ "$(date +%s)" -lt $end ]; do
- $FSSTRESS_PROG $FSSTRESS_ARGS >> $seqres.full
- done
-}
-
-start=$(date +%s)
-end=$((start + (30 * TIME_FACTOR) ))
-
-echo "Loop started at $(date --date="@${start}"), ending at $(date --date="@${end}")" >> $seqres.full
-stress_loop $end &
-stress_pid=$!
-freeze_loop $end &
-freeze_pid=$!
-fsmap_loop $end &
-fsmap_pid=$!
-
-# Wait until 2 seconds after the loops should have finished...
-while [ "$(date +%s)" -lt $((end + 2)) ]; do
- sleep 1
-done
-
-# ...and clean up after the loops in case they didn't do it themselves.
-kill_loops >> $seqres.full 2>&1
-
-echo "Loop finished at $(date)" >> $seqres.full
-echo "Test done"
+_scratch_xfs_stress_scrub -i 'fsmap -v'
# success, all done
+echo "Silence is golden"
status=0
exit
diff --git a/tests/xfs/517.out b/tests/xfs/517.out
index da6366e52b..49c53bcaa9 100644
--- a/tests/xfs/517.out
+++ b/tests/xfs/517.out
@@ -1,4 +1,2 @@
QA output created by 517
-Format and populate
-Concurrent fsmap and freeze
-Test done
+Silence is golden
* [PATCH 3/3] xfs: race fsmap with readonly remounts to detect crash or livelock
2022-12-30 22:12 ` [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx Darrick J. Wong
2022-12-30 22:12 ` [PATCH 2/3] fuzzy: refactor fsmap stress test to use our helper functions Darrick J. Wong
@ 2022-12-30 22:12 ` Darrick J. Wong
2 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Add a new test that races the GETFSMAP ioctl with ro/rw remounting to
make sure we don't livelock on the empty transaction that fsmap uses to
avoid deadlocking on rmap btree cycles.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
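Reading aid for the remount branch added to the exerciser loops in this patch:
a minimal sketch of the decision logic alone, with the exerciser, timeout(1),
and the remount call left out. The function names are hypothetical; the only
facts taken from the patch are that exit status 124 from timeout(1) is treated
as "ran out of time" rather than as a failure, and that the mode alternates
between rw and ro.

```shell
#!/bin/bash
# Sketch only: the decision logic from the remount branch of the exerciser
# loops, with the exerciser itself left out. Function names are hypothetical.

# Return 0 if the loop should keep going after the exerciser exits with $1.
# Status 124 is what timeout(1) reports when the time limit expired.
exerciser_status_ok() {
	local res="$1"

	[ "$res" -eq 0 ] || [ "$res" -eq 124 ]
}

# Print the mount mode to switch to after running in mode $1.
next_mount_mode() {
	if [ "$1" = "rw" ]; then
		echo "ro"
	else
		echo "rw"
	fi
}

# Feed in a few made-up exit statuses to show the control flow.
mode="rw"
for res in 124 124 0 1; do
	if ! exerciser_status_ok "$res"; then
		echo "stop: exerciser failed with $res in $mode mode"
		break
	fi
	mode="$(next_mount_mode "$mode")"
	echo "remount $mode"
done
```

In the real loops the remount itself is retried until it succeeds or the test
window closes; the sketch leaves that retry out.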
common/fuzzy | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
ltp/fsstress.c | 18 +++++++++-
tests/xfs/732 | 38 +++++++++++++++++++++
tests/xfs/732.out | 2 +
4 files changed, 153 insertions(+), 3 deletions(-)
create mode 100755 tests/xfs/732
create mode 100644 tests/xfs/732.out
diff --git a/common/fuzzy b/common/fuzzy
index 58e299d34b..ee97aa4298 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -429,6 +429,7 @@ __stress_scrub_clean_scratch() {
__stress_scrub_fsx_loop() {
local end="$1"
local runningfile="$2"
+ local remount_period="$3"
local focus=(-q -X) # quiet, validate file contents
# As of November 2022, 2 million fsx ops should be enough to keep
@@ -440,6 +441,43 @@ __stress_scrub_fsx_loop() {
local args="$FSX_AVOID ${focus[@]} ${SCRATCH_MNT}/fsx.$seq"
echo "Running $here/ltp/fsx $args" >> $seqres.full
+ if [ -n "$remount_period" ]; then
+ local mode="rw"
+ local rw_arg=""
+ while __stress_scrub_running "$end" "$runningfile"; do
+ # Need to recheck running conditions if we cleared
+ # anything.
+ test "$mode" = "rw" && __stress_scrub_clean_scratch && continue
+
+ timeout -s TERM "$remount_period" $here/ltp/fsx \
+ $args $rw_arg >> $seqres.full
+ res=$?
+ echo "$mode fsx exits with $res at $(date)" >> $seqres.full
+ if [ "$res" -ne 0 ] && [ "$res" -ne 124 ]; then
+ # Stop if fsx returns error. Mask off
+ # the magic code 124 because that is how the
+ # timeout(1) program communicates that we ran
+ # out of time.
+ break;
+ fi
+ if [ "$mode" = "rw" ]; then
+ mode="ro"
+ rw_arg="-t 0 -w 0 -FHzCIJBE0"
+ else
+ mode="rw"
+ rw_arg=""
+ fi
+
+ # Try remounting until we get the result we wanted
+ while ! _scratch_remount "$mode" &>/dev/null && \
+ __stress_scrub_running "$end" "$runningfile"; do
+ sleep 0.2
+ done
+ done
+ rm -f "$runningfile"
+ return 0
+ fi
+
while __stress_scrub_running "$end" "$runningfile"; do
# Need to recheck running conditions if we cleared anything
__stress_scrub_clean_scratch && continue
@@ -453,12 +491,50 @@ __stress_scrub_fsx_loop() {
__stress_scrub_fsstress_loop() {
local end="$1"
local runningfile="$2"
+ local remount_period="$3"
# As of March 2022, 2 million fsstress ops should be enough to keep
# any filesystem busy for a couple of hours.
local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000000 $FSSTRESS_AVOID)
echo "Running $FSSTRESS_PROG $args" >> $seqres.full
+ if [ -n "$remount_period" ]; then
+ local mode="rw"
+ local rw_arg=""
+ while __stress_scrub_running "$end" "$runningfile"; do
+ # Need to recheck running conditions if we cleared
+ # anything.
+ test "$mode" = "rw" && __stress_scrub_clean_scratch && continue
+
+ timeout -s TERM "$remount_period" $FSSTRESS_PROG \
+ $args $rw_arg >> $seqres.full
+ res=$?
+ echo "$mode fsstress exits with $res at $(date)" >> $seqres.full
+ if [ "$res" -ne 0 ] && [ "$res" -ne 124 ]; then
+ # Stop if fsstress returns error. Mask off
+ # the magic code 124 because that is how the
+ # timeout(1) program communicates that we ran
+ # out of time.
+ break;
+ fi
+ if [ "$mode" = "rw" ]; then
+ mode="ro"
+ rw_arg="-R"
+ else
+ mode="rw"
+ rw_arg=""
+ fi
+
+ # Try remounting until we get the result we wanted
+ while ! _scratch_remount "$mode" &>/dev/null && \
+ __stress_scrub_running "$end" "$runningfile"; do
+ sleep 0.2
+ done
+ done
+ rm -f "$runningfile"
+ return 0
+ fi
+
while __stress_scrub_running "$end" "$runningfile"; do
# Need to recheck running conditions if we cleared anything
__stress_scrub_clean_scratch && continue
@@ -526,6 +602,13 @@ _scratch_xfs_stress_scrub_cleanup() {
echo "Waiting for children to exit at $(date)" >> $seqres.full
wait
+ # Ensure the scratch fs is also writable before we exit.
+ if [ -n "$__SCRUB_STRESS_REMOUNT_LOOP" ]; then
+ echo "Remounting rw at $(date)" >> $seqres.full
+ _scratch_remount rw >> $seqres.full 2>&1
+ __SCRUB_STRESS_REMOUNT_LOOP=""
+ fi
+
echo "Cleanup finished at $(date)" >> $seqres.full
}
@@ -561,6 +644,9 @@ __stress_scrub_check_commands() {
# in a separate loop. If zero -i options are specified, do not run.
# Callers must check each of these commands (via _require_xfs_io_command)
# before calling here.
+# -r Run fsstress for this amount of time, then remount the fs ro or rw.
+# The default is to run fsstress continuously with no remount, unless
+# XFS_SCRUB_STRESS_REMOUNT_PERIOD is set.
# -s Pass this command to xfs_io to test scrub. If zero -s options are
# specified, xfs_io will not be run.
# -t Run online scrub against this file; $SCRATCH_MNT is the default.
@@ -577,16 +663,19 @@ _scratch_xfs_stress_scrub() {
local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}"
local exerciser="fsstress"
local io_args=()
+ local remount_period="${XFS_SCRUB_STRESS_REMOUNT_PERIOD}"
__SCRUB_STRESS_FREEZE_PID=""
+ __SCRUB_STRESS_REMOUNT_LOOP=""
rm -f "$runningfile"
touch "$runningfile"
OPTIND=1
- while getopts "fi:s:t:w:X:" c; do
+ while getopts "fi:r:s:t:w:X:" c; do
case "$c" in
f) freeze=yes;;
i) io_args+=("$OPTARG");;
+ r) remount_period="$OPTARG";;
s) one_scrub_args+=("$OPTARG");;
t) scrub_tgt="$OPTARG";;
w) scrub_delay="$OPTARG";;
@@ -611,7 +700,12 @@ _scratch_xfs_stress_scrub() {
echo "Loop started at $(date --date="@${start}")," \
"ending at $(date --date="@${end}")" >> $seqres.full
- "__stress_scrub_${exerciser}_loop" "$end" "$runningfile" &
+ if [ -n "$remount_period" ]; then
+ __SCRUB_STRESS_REMOUNT_LOOP="1"
+ fi
+
+ "__stress_scrub_${exerciser}_loop" "$end" "$runningfile" \
+ "$remount_period" &
if [ -n "$freeze" ]; then
__stress_scrub_freeze_loop "$end" "$runningfile" &
diff --git a/ltp/fsstress.c b/ltp/fsstress.c
index b395bc4da2..10608fb554 100644
--- a/ltp/fsstress.c
+++ b/ltp/fsstress.c
@@ -426,6 +426,7 @@ int symlink_path(const char *, pathname_t *);
int truncate64_path(pathname_t *, off64_t);
int unlink_path(pathname_t *);
void usage(void);
+void read_freq(void);
void write_freq(void);
void zero_freq(void);
void non_btrfs_freq(const char *);
@@ -472,7 +473,7 @@ int main(int argc, char **argv)
xfs_error_injection_t err_inj;
struct sigaction action;
int loops = 1;
- const char *allopts = "cd:e:f:i:l:m:M:n:o:p:rs:S:vVwx:X:zH";
+ const char *allopts = "cd:e:f:i:l:m:M:n:o:p:rRs:S:vVwx:X:zH";
errrange = errtag = 0;
umask(0);
@@ -538,6 +539,9 @@ int main(int argc, char **argv)
case 'r':
namerand = 1;
break;
+ case 'R':
+ read_freq();
+ break;
case 's':
seed = strtoul(optarg, NULL, 0);
break;
@@ -1917,6 +1921,7 @@ usage(void)
printf(" -o logfile specifies logfile name\n");
printf(" -p nproc specifies the no. of processes (default 1)\n");
printf(" -r specifies random name padding\n");
+ printf(" -R zeros frequencies of write operations\n");
printf(" -s seed specifies the seed for the random generator (default random)\n");
printf(" -v specifies verbose mode\n");
printf(" -w zeros frequencies of non-write operations\n");
@@ -1928,6 +1933,17 @@ usage(void)
printf(" -H prints usage and exits\n");
}
+void
+read_freq(void)
+{
+ opdesc_t *p;
+
+ for (p = ops; p < ops_end; p++) {
+ if (p->iswrite)
+ p->freq = 0;
+ }
+}
+
void
write_freq(void)
{
diff --git a/tests/xfs/732 b/tests/xfs/732
new file mode 100755
index 0000000000..ed6fb3c977
--- /dev/null
+++ b/tests/xfs/732
@@ -0,0 +1,38 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 732
+#
+# Race GETFSMAP and ro remount for a while to see if we crash or livelock.
+#
+. ./common/preamble
+_begin_fstest auto quick fsmap remount
+
+# Override the default cleanup function.
+_cleanup()
+{
+ cd /
+ _scratch_xfs_stress_scrub_cleanup
+ rm -rf $tmp.*
+}
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_xfs_scratch_rmapbt
+_require_xfs_io_command "fsmap"
+_require_xfs_stress_scrub
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_scrub -r 5 -i 'fsmap -v'
+
+# success, all done
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/xfs/732.out b/tests/xfs/732.out
new file mode 100644
index 0000000000..451f82ce2d
--- /dev/null
+++ b/tests/xfs/732.out
@@ -0,0 +1,2 @@
+QA output created by 732
+Silence is golden
* [PATCHSET v24.0 0/2] fstests: race online scrub with mount state changes
2022-12-30 21:13 [NYE DELUGE 1/4] xfs: all pending online scrub improvements Darrick J. Wong
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests Darrick J. Wong
@ 2022-12-30 22:13 ` Darrick J. Wong
2022-12-30 22:13 ` [PATCH 1/2] xfs: stress test xfs_scrub(8) with fsstress Darrick J. Wong
2022-12-30 22:13 ` [PATCH 2/2] xfs: stress test xfs_scrub(8) with freeze and ro-remount loops Darrick J. Wong
2023-01-13 20:10 ` [NYE DELUGE 1/4] xfs: all pending online scrub improvements Zorro Lang
3 siblings, 2 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:13 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
Hi all,
Introduce the ability to run xfs_scrub(8) itself from our online fsck
stress test harness. Port the two existing tests that race scrub and
repair against fsstress to the new harness, and add four new tests that
do the same while racing against fs freeze and ro remounts.
If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.
This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.
--D
fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes
---
common/fuzzy | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++---
tests/xfs/285 | 44 ++++++++++---------------------------
tests/xfs/285.out | 4 +--
tests/xfs/286 | 46 ++++++++++-----------------------------
tests/xfs/286.out | 4 +--
tests/xfs/733 | 39 +++++++++++++++++++++++++++++++++
tests/xfs/733.out | 2 ++
tests/xfs/771 | 39 +++++++++++++++++++++++++++++++++
tests/xfs/771.out | 2 ++
tests/xfs/824 | 40 ++++++++++++++++++++++++++++++++++
tests/xfs/824.out | 2 ++
tests/xfs/825 | 40 ++++++++++++++++++++++++++++++++++
tests/xfs/825.out | 2 ++
13 files changed, 252 insertions(+), 75 deletions(-)
create mode 100755 tests/xfs/733
create mode 100644 tests/xfs/733.out
create mode 100755 tests/xfs/771
create mode 100644 tests/xfs/771.out
create mode 100755 tests/xfs/824
create mode 100644 tests/xfs/824.out
create mode 100755 tests/xfs/825
create mode 100644 tests/xfs/825.out
* [PATCH 1/2] xfs: stress test xfs_scrub(8) with fsstress
2022-12-30 22:13 ` [PATCHSET v24.0 0/2] fstests: race online scrub with mount state changes Darrick J. Wong
@ 2022-12-30 22:13 ` Darrick J. Wong
2022-12-30 22:13 ` [PATCH 2/2] xfs: stress test xfs_scrub(8) with freeze and ro-remount loops Darrick J. Wong
1 sibling, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:13 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Port the two existing tests that check that xfs_scrub(8) (aka the main
userspace driver program) doesn't clash with fsstress to use our new
framework.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
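Reading aid for __stress_xfs_scrub_loop in this patch: a minimal sketch of how
an xfs_scrub exit status could be classified. classify_scrub_status is a
hypothetical helper; the three cases (SIGINT ignored, other fatal signals
reported, bit 0x1 meaning uncorrected errors) follow the patch, and every
other status is simply labelled "ok" here, which is an assumption of the
sketch.

```shell
#!/bin/bash
# Sketch only: classify an exit status the way __stress_xfs_scrub_loop does.
# classify_scrub_status is a hypothetical helper, not part of fstests.
classify_scrub_status() {
	local res="$1"
	# A process killed by a signal is reported by the shell as 128 plus
	# the signal number.
	local sigint_ret="$(( $(kill -l SIGINT) + 128 ))"

	if [ "$res" -eq "$sigint_ret" ]; then
		# The cleanup function sends SIGINT on purpose; not a failure.
		echo "interrupted"
	elif [ "$res" -ge 128 ]; then
		# bash's kill -l maps such a status back to a signal name.
		echo "died with SIG$(kill -l "$res")"
	elif [ "$(( res & 0x1 ))" -gt 0 ]; then
		echo "uncorrected errors"
	else
		echo "ok"
	fi
}

for status in 0 1 130 139; do
	echo "$status: $(classify_scrub_status "$status")"
done
```

The SIGINT check comes first so that a deliberate kill from the cleanup
function is never reported as a scrub failure.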
common/fuzzy | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++---
tests/xfs/285 | 44 ++++++++++---------------------------
tests/xfs/285.out | 4 +--
tests/xfs/286 | 46 ++++++++++-----------------------------
tests/xfs/286.out | 4 +--
5 files changed, 86 insertions(+), 75 deletions(-)
diff --git a/common/fuzzy b/common/fuzzy
index ee97aa4298..e39f787e78 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -411,6 +411,42 @@ __stress_one_scrub_loop() {
done
}
+# Run xfs_scrub online fsck in a tight loop.
+__stress_xfs_scrub_loop() {
+ local end="$1"
+ local runningfile="$2"
+ local scrub_startat="$3"
+ shift; shift; shift
+ local sigint_ret="$(( $(kill -l SIGINT) + 128 ))"
+ local scrublog="$tmp.scrub"
+
+ while __stress_scrub_running "$scrub_startat" "$runningfile"; do
+ sleep 1
+ done
+
+ while __stress_scrub_running "$end" "$runningfile"; do
+ _scratch_scrub "$@" &> $scrublog
+ res=$?
+ if [ "$res" -eq "$sigint_ret" ]; then
+ # Ignore SIGINT because the cleanup function sends
+ # that to terminate xfs_scrub
+ res=0
+ fi
+ echo "xfs_scrub exits with $res at $(date)" >> $seqres.full
+ if [ "$res" -ge 128 ]; then
+ # Report scrub death due to fatal signals
+ echo "xfs_scrub died with SIG$(kill -l $res)"
+ cat $scrublog >> $seqres.full 2>/dev/null
+ elif [ "$((res & 0x1))" -gt 0 ]; then
+ # Report uncorrected filesystem errors
+ echo "xfs_scrub reports uncorrected errors:"
+ grep -E '(Repair unsuccessful;|Corruption:)' $scrublog
+ cat $scrublog >> $seqres.full 2>/dev/null
+ fi
+ rm -f $scrublog
+ done
+}
+
# Clean the scratch filesystem between rounds of fsstress if there is 2%
# available space or less because that isn't an interesting stress test.
#
@@ -571,7 +607,7 @@ _scratch_xfs_stress_scrub_cleanup() {
# Send SIGINT so that bash won't print a 'Terminated' message that
# distorts the golden output.
echo "Killing stressor processes at $(date)" >> $seqres.full
- $KILLALL_PROG -INT xfs_io fsstress fsx >> $seqres.full 2>&1
+ $KILLALL_PROG -INT xfs_io fsstress fsx xfs_scrub >> $seqres.full 2>&1
# Tests are not allowed to exit with the scratch fs frozen. If we
# started a fs freeze/thaw background loop, wait for that loop to exit
@@ -649,6 +685,8 @@ __stress_scrub_check_commands() {
# XFS_SCRUB_STRESS_REMOUNT_PERIOD is set.
# -s Pass this command to xfs_io to test scrub. If zero -s options are
# specified, xfs_io will not be run.
+# -S Pass this option to xfs_scrub. If zero -S options are specified,
+# xfs_scrub will not be run. To select repair mode, pass '-k' or '-v'.
# -t Run online scrub against this file; $SCRATCH_MNT is the default.
# -w Delay the start of the scrub/repair loop by this number of seconds.
# Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value
@@ -657,6 +695,7 @@ __stress_scrub_check_commands() {
# options are 'fsx' and 'fsstress'. The default is 'fsstress'.
_scratch_xfs_stress_scrub() {
local one_scrub_args=()
+ local xfs_scrub_args=()
local scrub_tgt="$SCRATCH_MNT"
local runningfile="$tmp.fsstress"
local freeze="${XFS_SCRUB_STRESS_FREEZE}"
@@ -671,12 +710,13 @@ _scratch_xfs_stress_scrub() {
touch "$runningfile"
OPTIND=1
- while getopts "fi:r:s:t:w:X:" c; do
+ while getopts "fi:r:s:S:t:w:X:" c; do
case "$c" in
f) freeze=yes;;
i) io_args+=("$OPTARG");;
r) remount_period="$OPTARG";;
s) one_scrub_args+=("$OPTARG");;
+ S) xfs_scrub_args+=("$OPTARG");;
t) scrub_tgt="$OPTARG";;
w) scrub_delay="$OPTARG";;
X) exerciser="$OPTARG";;
@@ -691,6 +731,18 @@ _scratch_xfs_stress_scrub() {
return 1
fi
+ if [ "${#xfs_scrub_args[@]}" -gt 0 ]; then
+ _scratch_scrub "${xfs_scrub_args[@]}" &> "$tmp.scrub"
+ res=$?
+ if [ $res -ne 0 ]; then
+ echo "xfs_scrub ${xfs_scrub_args[@]} failed, err $res" >> $seqres.full
+ cat "$tmp.scrub" >> $seqres.full
+ rm -f "$tmp.scrub"
+ _notrun 'scrub not supported on scratch filesystem'
+ fi
+ rm -f "$tmp.scrub"
+ fi
+
local start="$(date +%s)"
local end="$((start + (30 * TIME_FACTOR) ))"
local scrub_startat="$((start + scrub_delay))"
@@ -722,6 +774,11 @@ _scratch_xfs_stress_scrub() {
"$scrub_startat" "${one_scrub_args[@]}" &
fi
+ if [ "${#xfs_scrub_args[@]}" -gt 0 ]; then
+ __stress_xfs_scrub_loop "$end" "$runningfile" "$scrub_startat" \
+ "${xfs_scrub_args[@]}" &
+ fi
+
# Wait until the designated end time or fsstress dies, then kill all of
# our background processes.
while __stress_scrub_running "$end" "$runningfile"; do
@@ -741,5 +798,5 @@ _scratch_xfs_stress_scrub() {
# Same requirements and arguments as _scratch_xfs_stress_scrub.
_scratch_xfs_stress_online_repair() {
$XFS_IO_PROG -x -c 'inject force_repair' $SCRATCH_MNT
- _scratch_xfs_stress_scrub "$@"
+ XFS_SCRUB_FORCE_REPAIR=1 _scratch_xfs_stress_scrub "$@"
}
diff --git a/tests/xfs/285 b/tests/xfs/285
index 711211d412..0056baeb1c 100755
--- a/tests/xfs/285
+++ b/tests/xfs/285
@@ -4,55 +4,35 @@
#
# FS QA Test No. 285
#
-# Race fio and xfs_scrub for a while to see if we crash or livelock.
+# Race fsstress and xfs_scrub in read-only mode for a while to see if we crash
+# or livelock.
#
. ./common/preamble
-_begin_fstest dangerous_fuzzers dangerous_scrub
+_begin_fstest scrub dangerous_fsstress_scrub
+_cleanup() {
+ cd /
+ _scratch_xfs_stress_scrub_cleanup &> /dev/null
+ rm -r -f $tmp.*
+}
_register_cleanup "_cleanup" BUS
# Import common functions.
. ./common/filter
. ./common/fuzzy
. ./common/inject
+. ./common/xfs
# real QA test starts here
_supported_fs xfs
-_require_test_program "feature"
-_require_command "$KILLALL_PROG" killall
-_require_command "$TIMEOUT_PROG" timeout
-_require_scrub
_require_scratch
+_require_xfs_stress_scrub
-echo "Format and populate"
_scratch_mkfs > "$seqres.full" 2>&1
_scratch_mount
-
-STRESS_DIR="$SCRATCH_MNT/testdir"
-mkdir -p $STRESS_DIR
-
-cpus=$(( $($here/src/feature -o) * 4 * LOAD_FACTOR))
-$FSSTRESS_PROG -d $STRESS_DIR -p $cpus -n $((cpus * 100000)) $FSSTRESS_AVOID >/dev/null 2>&1 &
-$XFS_SCRUB_PROG -d -T -v -n $SCRATCH_MNT >> $seqres.full
-
-killstress() {
- sleep $(( 60 * TIME_FACTOR ))
- $KILLALL_PROG -q $FSSTRESS_PROG
-}
-
-echo "Concurrent scrub"
-start=$(date +%s)
-end=$((start + (60 * TIME_FACTOR) ))
-killstress &
-echo "Scrub started at $(date --date="@${start}"), ending at $(date --date="@${end}")" >> $seqres.full
-while [ "$(date +%s)" -lt "$end" ]; do
- $TIMEOUT_PROG -s TERM $(( end - $(date +%s) + 2 )) $XFS_SCRUB_PROG -d -T -v -n $SCRATCH_MNT >> $seqres.full 2>&1
-done
-
-echo "Test done"
-echo "Scrub finished at $(date)" >> $seqres.full
-$KILLALL_PROG -q $FSSTRESS_PROG
+_scratch_xfs_stress_scrub -S '-n'
# success, all done
+echo Silence is golden
status=0
exit
diff --git a/tests/xfs/285.out b/tests/xfs/285.out
index be6b49a9fb..ab12da9ae7 100644
--- a/tests/xfs/285.out
+++ b/tests/xfs/285.out
@@ -1,4 +1,2 @@
QA output created by 285
-Format and populate
-Concurrent scrub
-Test done
+Silence is golden
diff --git a/tests/xfs/286 b/tests/xfs/286
index 7edc9c427b..0f61a924db 100755
--- a/tests/xfs/286
+++ b/tests/xfs/286
@@ -4,57 +4,35 @@
#
# FS QA Test No. 286
#
-# Race fio and xfs_scrub for a while to see if we crash or livelock.
+# Race fsstress and xfs_scrub in force-repair mode for a while to see if we
+# crash or livelock.
#
. ./common/preamble
-_begin_fstest dangerous_fuzzers dangerous_scrub dangerous_online_repair
+_begin_fstest online_repair dangerous_fsstress_repair
+_cleanup() {
+ cd /
+ _scratch_xfs_stress_scrub_cleanup &> /dev/null
+ rm -r -f $tmp.*
+}
_register_cleanup "_cleanup" BUS
# Import common functions.
. ./common/filter
. ./common/fuzzy
. ./common/inject
+. ./common/xfs
# real QA test starts here
_supported_fs xfs
-_require_test_program "feature"
-_require_command "$KILLALL_PROG" killall
-_require_command "$TIMEOUT_PROG" timeout
-_require_scrub
_require_scratch
-# xfs_scrub will turn on error injection itself
-_require_xfs_io_error_injection "force_repair"
+_require_xfs_stress_online_repair
-echo "Format and populate"
_scratch_mkfs > "$seqres.full" 2>&1
_scratch_mount
-
-STRESS_DIR="$SCRATCH_MNT/testdir"
-mkdir -p $STRESS_DIR
-
-cpus=$(( $($here/src/feature -o) * 4 * LOAD_FACTOR))
-$FSSTRESS_PROG -d $STRESS_DIR -p $cpus -n $((cpus * 100000)) $FSSTRESS_AVOID >/dev/null 2>&1 &
-$XFS_SCRUB_PROG -d -T -v -n $SCRATCH_MNT >> $seqres.full
-
-killstress() {
- sleep $(( 60 * TIME_FACTOR ))
- $KILLALL_PROG -q $FSSTRESS_PROG
-}
-
-echo "Concurrent repair"
-start=$(date +%s)
-end=$((start + (60 * TIME_FACTOR) ))
-killstress &
-echo "Repair started at $(date --date="@${start}"), ending at $(date --date="@${end}")" >> $seqres.full
-while [ "$(date +%s)" -lt "$end" ]; do
- XFS_SCRUB_FORCE_REPAIR=1 $TIMEOUT_PROG -s TERM $(( end - $(date +%s) + 2 )) $XFS_SCRUB_PROG -d -T -v $SCRATCH_MNT >> $seqres.full
-done
-
-echo "Test done"
-echo "Repair finished at $(date)" >> $seqres.full
-$KILLALL_PROG -q $FSSTRESS_PROG
+_scratch_xfs_stress_online_repair -S '-k'
# success, all done
+echo Silence is golden
status=0
exit
diff --git a/tests/xfs/286.out b/tests/xfs/286.out
index 80e12b5495..35c4800694 100644
--- a/tests/xfs/286.out
+++ b/tests/xfs/286.out
@@ -1,4 +1,2 @@
QA output created by 286
-Format and populate
-Concurrent repair
-Test done
+Silence is golden
* [PATCH 2/2] xfs: stress test xfs_scrub(8) with freeze and ro-remount loops
2022-12-30 22:13 ` [PATCHSET v24.0 0/2] fstests: race online scrub with mount state changes Darrick J. Wong
2022-12-30 22:13 ` [PATCH 1/2] xfs: stress test xfs_scrub(8) with fsstress Darrick J. Wong
@ 2022-12-30 22:13 ` Darrick J. Wong
1 sibling, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:13 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Make sure we don't trip over any asserts or livelock when scrub races
with filesystem freezing and readonly remounts.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
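Reading aid for the four new tests: they differ only in the options handed to
the stress harness. The sketch below is a cut-down option parser that uses the
same getopts string as _scratch_xfs_stress_scrub and prints what it parsed.
parse_stress_opts is a hypothetical name; it starts no background loops and
ignores the options it does not print.

```shell
#!/bin/bash
# Sketch only: a cut-down option parser using the same getopts string as
# _scratch_xfs_stress_scrub. parse_stress_opts is a hypothetical name and
# only prints what it parsed; it starts no background loops.
parse_stress_opts() {
	local freeze="no"
	local remount_period=""
	local xfs_scrub_args=()
	local c

	# Reset OPTIND so the function can be called more than once.
	local OPTIND=1
	while getopts "fi:r:s:S:t:w:X:" c; do
		case "$c" in
		f) freeze="yes";;
		r) remount_period="$OPTARG";;
		S) xfs_scrub_args+=("$OPTARG");;
		*) ;;	# other options are accepted but ignored here
		esac
	done

	echo "freeze=$freeze remount=${remount_period:-none} scrub=${xfs_scrub_args[*]}"
}

parse_stress_opts -r 5 -S '-n'	# option shape used by xfs/733
parse_stress_opts -f -S '-n'	# xfs/771
parse_stress_opts -f -S '-k'	# xfs/824
parse_stress_opts -r 5 -S '-k'	# xfs/825
```

In the patch, xfs/824 and xfs/825 pass these options through
_scratch_xfs_stress_online_repair, which forwards them unchanged to
_scratch_xfs_stress_scrub.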
tests/xfs/733 | 39 +++++++++++++++++++++++++++++++++++++++
tests/xfs/733.out | 2 ++
tests/xfs/771 | 39 +++++++++++++++++++++++++++++++++++++++
tests/xfs/771.out | 2 ++
tests/xfs/824 | 40 ++++++++++++++++++++++++++++++++++++++++
tests/xfs/824.out | 2 ++
tests/xfs/825 | 40 ++++++++++++++++++++++++++++++++++++++++
tests/xfs/825.out | 2 ++
8 files changed, 166 insertions(+)
create mode 100755 tests/xfs/733
create mode 100644 tests/xfs/733.out
create mode 100755 tests/xfs/771
create mode 100644 tests/xfs/771.out
create mode 100755 tests/xfs/824
create mode 100644 tests/xfs/824.out
create mode 100755 tests/xfs/825
create mode 100644 tests/xfs/825.out
diff --git a/tests/xfs/733 b/tests/xfs/733
new file mode 100755
index 0000000000..ee9a0a26ee
--- /dev/null
+++ b/tests/xfs/733
@@ -0,0 +1,39 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 733
+#
+# Race xfs_scrub in check-only mode and ro remount for a while to see if we
+# crash or livelock.
+#
+. ./common/preamble
+_begin_fstest scrub dangerous_fsstress_scrub
+
+# Override the default cleanup function.
+_cleanup()
+{
+ cd /
+ _scratch_xfs_stress_scrub_cleanup &> /dev/null
+ _scratch_remount rw
+ rm -rf $tmp.*
+}
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_scrub
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_scrub -r 5 -S '-n'
+
+# success, all done
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/xfs/733.out b/tests/xfs/733.out
new file mode 100644
index 0000000000..7118d5ddf0
--- /dev/null
+++ b/tests/xfs/733.out
@@ -0,0 +1,2 @@
+QA output created by 733
+Silence is golden
diff --git a/tests/xfs/771 b/tests/xfs/771
new file mode 100755
index 0000000000..8c8d124f12
--- /dev/null
+++ b/tests/xfs/771
@@ -0,0 +1,39 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 771
+#
+# Race xfs_scrub in check-only mode and freeze for a while to see if we crash
+# or livelock.
+#
+. ./common/preamble
+_begin_fstest scrub dangerous_fsstress_scrub
+
+# Override the default cleanup function.
+_cleanup()
+{
+ cd /
+ _scratch_xfs_stress_scrub_cleanup &> /dev/null
+ _scratch_remount rw
+ rm -rf $tmp.*
+}
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_scrub
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_scrub -f -S '-n'
+
+# success, all done
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/xfs/771.out b/tests/xfs/771.out
new file mode 100644
index 0000000000..c2345c7be3
--- /dev/null
+++ b/tests/xfs/771.out
@@ -0,0 +1,2 @@
+QA output created by 771
+Silence is golden
diff --git a/tests/xfs/824 b/tests/xfs/824
new file mode 100755
index 0000000000..65eeb3a6c9
--- /dev/null
+++ b/tests/xfs/824
@@ -0,0 +1,40 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 824
+#
+# Race xfs_scrub in force-repair mode and freeze for a while to see if we crash
+# or livelock.
+#
+. ./common/preamble
+_begin_fstest online_repair dangerous_fsstress_repair
+
+# Override the default cleanup function.
+_cleanup()
+{
+ cd /
+ _scratch_xfs_stress_scrub_cleanup &> /dev/null
+ _scratch_remount rw
+ rm -rf $tmp.*
+}
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/xfs
+. ./common/inject
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_online_repair
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_online_repair -f -S '-k'
+
+# success, all done
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/xfs/824.out b/tests/xfs/824.out
new file mode 100644
index 0000000000..6cf432abbd
--- /dev/null
+++ b/tests/xfs/824.out
@@ -0,0 +1,2 @@
+QA output created by 824
+Silence is golden
diff --git a/tests/xfs/825 b/tests/xfs/825
new file mode 100755
index 0000000000..80ce06932d
--- /dev/null
+++ b/tests/xfs/825
@@ -0,0 +1,40 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 825
+#
+# Race xfs_scrub in force-repair mode and ro remount for a while to see if we
+# crash or livelock.
+#
+. ./common/preamble
+_begin_fstest online_repair dangerous_fsstress_repair
+
+# Override the default cleanup function.
+_cleanup()
+{
+ cd /
+ _scratch_xfs_stress_scrub_cleanup &> /dev/null
+ _scratch_remount rw
+ rm -rf $tmp.*
+}
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/xfs
+. ./common/inject
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_online_repair
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_online_repair -r 5 -S '-k'
+
+# success, all done
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/xfs/825.out b/tests/xfs/825.out
new file mode 100644
index 0000000000..d0e970dfd6
--- /dev/null
+++ b/tests/xfs/825.out
@@ -0,0 +1,2 @@
+QA output created by 825
+Silence is golden
* Re: [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx
2022-12-30 22:12 ` [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx Darrick J. Wong
@ 2023-01-05 5:49 ` Zorro Lang
2023-01-05 18:28 ` Darrick J. Wong
2023-01-05 18:28 ` [PATCH v24.1 " Darrick J. Wong
1 sibling, 1 reply; 32+ messages in thread
From: Zorro Lang @ 2023-01-05 5:49 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, fstests
On Fri, Dec 30, 2022 at 02:12:57PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Add a couple of new online fsck stress tests that race fsx against
> online fsck.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> common/fuzzy | 39 ++++++++++++++++++++++++++++++++++++---
> tests/xfs/847 | 38 ++++++++++++++++++++++++++++++++++++++
> tests/xfs/847.out | 2 ++
> tests/xfs/848 | 38 ++++++++++++++++++++++++++++++++++++++
> tests/xfs/848.out | 2 ++
> 5 files changed, 116 insertions(+), 3 deletions(-)
> create mode 100755 tests/xfs/847
> create mode 100644 tests/xfs/847.out
> create mode 100755 tests/xfs/848
> create mode 100644 tests/xfs/848.out
>
>
> diff --git a/common/fuzzy b/common/fuzzy
> index 1df51a6dd8..3512e95e02 100644
> --- a/common/fuzzy
> +++ b/common/fuzzy
> @@ -408,6 +408,30 @@ __stress_scrub_clean_scratch() {
> return 0
> }
>
> +# Run fsx while we're testing online fsck.
> +__stress_scrub_fsx_loop() {
> + local end="$1"
> + local runningfile="$2"
> + local focus=(-q -X) # quiet, validate file contents
> +
> + # As of November 2022, 2 million fsx ops should be enough to keep
> + # any filesystem busy for a couple of hours.
> + focus+=(-N 2000000)
> + focus+=(-o $((128000 * LOAD_FACTOR)) )
> + focus+=(-l $((600000 * LOAD_FACTOR)) )
> +
> + local args="$FSX_AVOID ${focus[@]} ${SCRATCH_MNT}/fsx.$seq"
> + echo "Running $here/ltp/fsx $args" >> $seqres.full
> +
> + while __stress_scrub_running "$end" "$runningfile"; do
> + # Need to recheck running conditions if we cleared anything
> + __stress_scrub_clean_scratch && continue
> + $here/ltp/fsx $args >> $seqres.full
> + echo "fsx exits with $? at $(date)" >> $seqres.full
> + done
> + rm -f "$runningfile"
> +}
> +
> # Run fsstress while we're testing online fsck.
> __stress_scrub_fsstress_loop() {
> local end="$1"
> @@ -454,7 +478,7 @@ _scratch_xfs_stress_scrub_cleanup() {
> # Send SIGINT so that bash won't print a 'Terminated' message that
> # distorts the golden output.
> echo "Killing stressor processes at $(date)" >> $seqres.full
> - $KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1
> + $KILLALL_PROG -INT xfs_io fsstress fsx >> $seqres.full 2>&1
>
> # Tests are not allowed to exit with the scratch fs frozen. If we
> # started a fs freeze/thaw background loop, wait for that loop to exit
> @@ -522,30 +546,39 @@ __stress_scrub_check_commands() {
> # -w Delay the start of the scrub/repair loop by this number of seconds.
> # Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value
> # will be clamped to ten seconds before the end time.
> +# -X Run this program to exercise the filesystem. Currently supported
> +# options are 'fsx' and 'fsstress'. The default is 'fsstress'.
> _scratch_xfs_stress_scrub() {
> local one_scrub_args=()
> local scrub_tgt="$SCRATCH_MNT"
> local runningfile="$tmp.fsstress"
> local freeze="${XFS_SCRUB_STRESS_FREEZE}"
> local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}"
> + local exerciser="fsstress"
>
> __SCRUB_STRESS_FREEZE_PID=""
> rm -f "$runningfile"
> touch "$runningfile"
>
> OPTIND=1
> - while getopts "fs:t:w:" c; do
> + while getopts "fs:t:w:X:" c; do
> case "$c" in
> f) freeze=yes;;
> s) one_scrub_args+=("$OPTARG");;
> t) scrub_tgt="$OPTARG";;
> w) scrub_delay="$OPTARG";;
> + X) exerciser="$OPTARG";;
> *) return 1; ;;
> esac
> done
>
> __stress_scrub_check_commands "$scrub_tgt" "${one_scrub_args[@]}"
>
> + if ! command -v "__stress_scrub_${exerciser}_loop" &>/dev/null; then
> + echo "${exerciser}: Unknown fs exercise program."
> + return 1
> + fi
> +
> local start="$(date +%s)"
> local end="$((start + (30 * TIME_FACTOR) ))"
> local scrub_startat="$((start + scrub_delay))"
> @@ -555,7 +588,7 @@ _scratch_xfs_stress_scrub() {
> echo "Loop started at $(date --date="@${start}")," \
> "ending at $(date --date="@${end}")" >> $seqres.full
>
> - __stress_scrub_fsstress_loop "$end" "$runningfile" &
> + "__stress_scrub_${exerciser}_loop" "$end" "$runningfile" &
>
> if [ -n "$freeze" ]; then
> __stress_scrub_freeze_loop "$end" "$runningfile" &
> diff --git a/tests/xfs/847 b/tests/xfs/847
> new file mode 100755
> index 0000000000..856e9a6c26
> --- /dev/null
> +++ b/tests/xfs/847
> @@ -0,0 +1,38 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
> +#
> +# FS QA Test No. 847
> +#
> +# Race fsx and xfs_scrub in read-only mode for a while to see if we crash
> +# or livelock.
> +#
> +. ./common/preamble
> +_begin_fstest scrub dangerous_fsstress_scrub
Hi Darrick,
Such huge patchsets :) I'll try to review them one patchset at a time.
Now I'm trying to review "[NYE DELUGE 1/4]", but I can't find the
"dangerous_fsstress_scrub" group anywhere in these patchsets. Is there a
prerequisite patch(set)? Or did you mean to use "dangerous_fsstress_repair"?
P.S.: More cases use "dangerous_fsstress_scrub" in your new patchsets.
Thanks,
Zorro
> +
> +_cleanup() {
> + cd /
> + _scratch_xfs_stress_scrub_cleanup &> /dev/null
> + rm -r -f $tmp.*
> +}
> +_register_cleanup "_cleanup" BUS
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/fuzzy
> +. ./common/inject
> +. ./common/xfs
> +
> +# real QA test starts here
> +_supported_fs xfs
> +_require_scratch
> +_require_xfs_stress_scrub
> +
> +_scratch_mkfs > "$seqres.full" 2>&1
> +_scratch_mount
> +_scratch_xfs_stress_scrub -S '-n' -X 'fsx'
> +
> +# success, all done
> +echo Silence is golden
> +status=0
> +exit
> diff --git a/tests/xfs/847.out b/tests/xfs/847.out
> new file mode 100644
> index 0000000000..b7041db159
> --- /dev/null
> +++ b/tests/xfs/847.out
> @@ -0,0 +1,2 @@
> +QA output created by 847
> +Silence is golden
> diff --git a/tests/xfs/848 b/tests/xfs/848
> new file mode 100755
> index 0000000000..ab32020624
> --- /dev/null
> +++ b/tests/xfs/848
> @@ -0,0 +1,38 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
> +#
> +# FS QA Test No. 848
> +#
> +# Race fsx and xfs_scrub in force-repair mode for a while to see if we
> +# crash or livelock.
> +#
> +. ./common/preamble
> +_begin_fstest online_repair dangerous_fsstress_repair
> +
> +_cleanup() {
> + cd /
> + _scratch_xfs_stress_scrub_cleanup &> /dev/null
> + rm -r -f $tmp.*
> +}
> +_register_cleanup "_cleanup" BUS
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/fuzzy
> +. ./common/inject
> +. ./common/xfs
> +
> +# real QA test starts here
> +_supported_fs xfs
> +_require_scratch
> +_require_xfs_stress_online_repair
> +
> +_scratch_mkfs > "$seqres.full" 2>&1
> +_scratch_mount
> +_scratch_xfs_stress_online_repair -S '-k' -X 'fsx'
> +
> +# success, all done
> +echo Silence is golden
> +status=0
> +exit
> diff --git a/tests/xfs/848.out b/tests/xfs/848.out
> new file mode 100644
> index 0000000000..23f674045c
> --- /dev/null
> +++ b/tests/xfs/848.out
> @@ -0,0 +1,2 @@
> +QA output created by 848
> +Silence is golden
>
* Re: [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx
2023-01-05 5:49 ` Zorro Lang
@ 2023-01-05 18:28 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2023-01-05 18:28 UTC (permalink / raw)
To: Zorro Lang; +Cc: linux-xfs, fstests
On Thu, Jan 05, 2023 at 01:49:20PM +0800, Zorro Lang wrote:
> On Fri, Dec 30, 2022 at 02:12:57PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Add a couple of new online fsck stress tests that race fsx against
> > online fsck.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > common/fuzzy | 39 ++++++++++++++++++++++++++++++++++++---
> > tests/xfs/847 | 38 ++++++++++++++++++++++++++++++++++++++
> > tests/xfs/847.out | 2 ++
> > tests/xfs/848 | 38 ++++++++++++++++++++++++++++++++++++++
> > tests/xfs/848.out | 2 ++
> > 5 files changed, 116 insertions(+), 3 deletions(-)
> > create mode 100755 tests/xfs/847
> > create mode 100644 tests/xfs/847.out
> > create mode 100755 tests/xfs/848
> > create mode 100644 tests/xfs/848.out
> >
> >
> > diff --git a/common/fuzzy b/common/fuzzy
> > index 1df51a6dd8..3512e95e02 100644
> > --- a/common/fuzzy
> > +++ b/common/fuzzy
> > @@ -408,6 +408,30 @@ __stress_scrub_clean_scratch() {
> > return 0
> > }
> >
> > +# Run fsx while we're testing online fsck.
> > +__stress_scrub_fsx_loop() {
> > + local end="$1"
> > + local runningfile="$2"
> > + local focus=(-q -X) # quiet, validate file contents
> > +
> > + # As of November 2022, 2 million fsx ops should be enough to keep
> > + # any filesystem busy for a couple of hours.
> > + focus+=(-N 2000000)
> > + focus+=(-o $((128000 * LOAD_FACTOR)) )
> > + focus+=(-l $((600000 * LOAD_FACTOR)) )
> > +
> > + local args="$FSX_AVOID ${focus[@]} ${SCRATCH_MNT}/fsx.$seq"
> > + echo "Running $here/ltp/fsx $args" >> $seqres.full
> > +
> > + while __stress_scrub_running "$end" "$runningfile"; do
> > + # Need to recheck running conditions if we cleared anything
> > + __stress_scrub_clean_scratch && continue
> > + $here/ltp/fsx $args >> $seqres.full
> > + echo "fsx exits with $? at $(date)" >> $seqres.full
> > + done
> > + rm -f "$runningfile"
> > +}
> > +
> > # Run fsstress while we're testing online fsck.
> > __stress_scrub_fsstress_loop() {
> > local end="$1"
> > @@ -454,7 +478,7 @@ _scratch_xfs_stress_scrub_cleanup() {
> > # Send SIGINT so that bash won't print a 'Terminated' message that
> > # distorts the golden output.
> > echo "Killing stressor processes at $(date)" >> $seqres.full
> > - $KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1
> > + $KILLALL_PROG -INT xfs_io fsstress fsx >> $seqres.full 2>&1
> >
> > # Tests are not allowed to exit with the scratch fs frozen. If we
> > # started a fs freeze/thaw background loop, wait for that loop to exit
> > @@ -522,30 +546,39 @@ __stress_scrub_check_commands() {
> > # -w Delay the start of the scrub/repair loop by this number of seconds.
> > # Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value
> > # will be clamped to ten seconds before the end time.
> > +# -X Run this program to exercise the filesystem. Currently supported
> > +# options are 'fsx' and 'fsstress'. The default is 'fsstress'.
> > _scratch_xfs_stress_scrub() {
> > local one_scrub_args=()
> > local scrub_tgt="$SCRATCH_MNT"
> > local runningfile="$tmp.fsstress"
> > local freeze="${XFS_SCRUB_STRESS_FREEZE}"
> > local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}"
> > + local exerciser="fsstress"
> >
> > __SCRUB_STRESS_FREEZE_PID=""
> > rm -f "$runningfile"
> > touch "$runningfile"
> >
> > OPTIND=1
> > - while getopts "fs:t:w:" c; do
> > + while getopts "fs:t:w:X:" c; do
> > case "$c" in
> > f) freeze=yes;;
> > s) one_scrub_args+=("$OPTARG");;
> > t) scrub_tgt="$OPTARG";;
> > w) scrub_delay="$OPTARG";;
> > + X) exerciser="$OPTARG";;
> > *) return 1; ;;
> > esac
> > done
> >
> > __stress_scrub_check_commands "$scrub_tgt" "${one_scrub_args[@]}"
> >
> > + if ! command -v "__stress_scrub_${exerciser}_loop" &>/dev/null; then
> > + echo "${exerciser}: Unknown fs exercise program."
> > + return 1
> > + fi
> > +
> > local start="$(date +%s)"
> > local end="$((start + (30 * TIME_FACTOR) ))"
> > local scrub_startat="$((start + scrub_delay))"
> > @@ -555,7 +588,7 @@ _scratch_xfs_stress_scrub() {
> > echo "Loop started at $(date --date="@${start}")," \
> > "ending at $(date --date="@${end}")" >> $seqres.full
> >
> > - __stress_scrub_fsstress_loop "$end" "$runningfile" &
> > + "__stress_scrub_${exerciser}_loop" "$end" "$runningfile" &
> >
> > if [ -n "$freeze" ]; then
> > __stress_scrub_freeze_loop "$end" "$runningfile" &
> > diff --git a/tests/xfs/847 b/tests/xfs/847
> > new file mode 100755
> > index 0000000000..856e9a6c26
> > --- /dev/null
> > +++ b/tests/xfs/847
> > @@ -0,0 +1,38 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
> > +#
> > +# FS QA Test No. 847
> > +#
> > +# Race fsx and xfs_scrub in read-only mode for a while to see if we crash
> > +# or livelock.
> > +#
> > +. ./common/preamble
> > +_begin_fstest scrub dangerous_fsstress_scrub
>
> Hi Darrick,
>
> Such huge patchsets :) I'll try to review them one patchset at a time.
>
> Now I'm trying to review "[NYE DELUGE 1/4]", but I can't find the
> "dangerous_fsstress_scrub" group anywhere in these patchsets. Is there a
> prerequisite patch(set)? Or did you mean to use "dangerous_fsstress_repair"?
>
> P.S.: More cases use "dangerous_fsstress_scrub" in your new patchsets.
Oops. The group was originally added in "xfs: race fsstress with online
scrubbers for AG and fs metadata". Then I created a few more patches at
the top of my stack, tested that, and then decided that their proper
placement was closer to the bottom than the patch that added the group.
Ok, I'll modify the build system to shellcheck any bash scripts in the
current commit (because running it on the full repo took hours and
produced many hundreds of errors, mostly in tests/btrfs/) and go do a
push-and-build of all three stgit repos.
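A minimal sketch of such a per-commit check, assuming that the scripts can be
recognized by a bash shebang on their first line, that shellcheck is
installed, and that it is run from the top of the git tree. The helper names
is_bash_script and shellcheck_commit are made up for illustration and are not
part of fstests:

```shell
# Hypothetical helpers for illustration; not part of fstests.

# Succeed if the first line of file "$1" is a bash shebang.
is_bash_script() {
	local first=
	[ -f "$1" ] || return 1
	# read fails on a file with no trailing newline but still fills $first
	IFS= read -r first < "$1" || :
	case "$first" in
	'#!/bin/bash'*|'#! /bin/bash'*|'#!/usr/bin/env bash'*) return 0;;
	*) return 1;;
	esac
}

# Run shellcheck on every bash script added or modified by one commit.
# Assumes path names contain no whitespace or glob characters.
shellcheck_commit() {
	local rev="${1:-HEAD}" f rc=0
	for f in $(git diff-tree --no-commit-id --name-only --diff-filter=AM -r "$rev"); do
		is_bash_script "$f" || continue
		shellcheck "$f" || rc=1
	done
	return "$rc"
}
```

Calling "shellcheck_commit HEAD" would then check only the files touched by
the topmost commit, which keeps the runtime proportional to the size of the
patch rather than the size of the repository.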
--D
> Thanks,
> Zorro
>
> > +
> > +_cleanup() {
> > + cd /
> > + _scratch_xfs_stress_scrub_cleanup &> /dev/null
> > + rm -r -f $tmp.*
> > +}
> > +_register_cleanup "_cleanup" BUS
> > +
> > +# Import common functions.
> > +. ./common/filter
> > +. ./common/fuzzy
> > +. ./common/inject
> > +. ./common/xfs
> > +
> > +# real QA test starts here
> > +_supported_fs xfs
> > +_require_scratch
> > +_require_xfs_stress_scrub
> > +
> > +_scratch_mkfs > "$seqres.full" 2>&1
> > +_scratch_mount
> > +_scratch_xfs_stress_scrub -S '-n' -X 'fsx'
> > +
> > +# success, all done
> > +echo Silence is golden
> > +status=0
> > +exit
> > diff --git a/tests/xfs/847.out b/tests/xfs/847.out
> > new file mode 100644
> > index 0000000000..b7041db159
> > --- /dev/null
> > +++ b/tests/xfs/847.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 847
> > +Silence is golden
> > diff --git a/tests/xfs/848 b/tests/xfs/848
> > new file mode 100755
> > index 0000000000..ab32020624
> > --- /dev/null
> > +++ b/tests/xfs/848
> > @@ -0,0 +1,38 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
> > +#
> > +# FS QA Test No. 848
> > +#
> > +# Race fsx and xfs_scrub in force-repair mode for a while to see if we
> > +# crash or livelock.
> > +#
> > +. ./common/preamble
> > +_begin_fstest online_repair dangerous_fsstress_repair
> > +
> > +_cleanup() {
> > + cd /
> > + _scratch_xfs_stress_scrub_cleanup &> /dev/null
> > + rm -r -f $tmp.*
> > +}
> > +_register_cleanup "_cleanup" BUS
> > +
> > +# Import common functions.
> > +. ./common/filter
> > +. ./common/fuzzy
> > +. ./common/inject
> > +. ./common/xfs
> > +
> > +# real QA test starts here
> > +_supported_fs xfs
> > +_require_scratch
> > +_require_xfs_stress_online_repair
> > +
> > +_scratch_mkfs > "$seqres.full" 2>&1
> > +_scratch_mount
> > +_scratch_xfs_stress_online_repair -S '-k' -X 'fsx'
> > +
> > +# success, all done
> > +echo Silence is golden
> > +status=0
> > +exit
> > diff --git a/tests/xfs/848.out b/tests/xfs/848.out
> > new file mode 100644
> > index 0000000000..23f674045c
> > --- /dev/null
> > +++ b/tests/xfs/848.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 848
> > +Silence is golden
> >
>
* [PATCH v24.1 1/3] fuzzy: enhance scrub stress testing to use fsx
2022-12-30 22:12 ` [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx Darrick J. Wong
2023-01-05 5:49 ` Zorro Lang
@ 2023-01-05 18:28 ` Darrick J. Wong
1 sibling, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2023-01-05 18:28 UTC (permalink / raw)
To: zlang; +Cc: linux-xfs, fstests, guan
From: Darrick J. Wong <djwong@kernel.org>
Add a couple of new online fsck stress tests that race fsx against
online fsck.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
v24.1: move the addition of the group to this patch
---
common/fuzzy | 39 ++++++++++++++++++++++++++++++++++++---
doc/group-names.txt | 1 +
tests/xfs/847 | 38 ++++++++++++++++++++++++++++++++++++++
tests/xfs/847.out | 2 ++
tests/xfs/848 | 38 ++++++++++++++++++++++++++++++++++++++
tests/xfs/848.out | 2 ++
6 files changed, 117 insertions(+), 3 deletions(-)
create mode 100755 tests/xfs/847
create mode 100644 tests/xfs/847.out
create mode 100755 tests/xfs/848
create mode 100644 tests/xfs/848.out
diff --git a/common/fuzzy b/common/fuzzy
index 7994665ef7..a764de461e 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -417,6 +417,30 @@ __stress_scrub_clean_scratch() {
return 0
}
+# Run fsx while we're testing online fsck.
+__stress_scrub_fsx_loop() {
+ local end="$1"
+ local runningfile="$2"
+ local focus=(-q -X) # quiet, validate file contents
+
+ # As of November 2022, 2 million fsx ops should be enough to keep
+ # any filesystem busy for a couple of hours.
+ focus+=(-N 2000000)
+ focus+=(-o $((128000 * LOAD_FACTOR)) )
+ focus+=(-l $((600000 * LOAD_FACTOR)) )
+
+ local args="$FSX_AVOID ${focus[@]} ${SCRATCH_MNT}/fsx.$seq"
+ echo "Running $here/ltp/fsx $args" >> $seqres.full
+
+ while __stress_scrub_running "$end" "$runningfile"; do
+ # Need to recheck running conditions if we cleared anything
+ __stress_scrub_clean_scratch && continue
+ $here/ltp/fsx $args >> $seqres.full
+ echo "fsx exits with $? at $(date)" >> $seqres.full
+ done
+ rm -f "$runningfile"
+}
+
# Run fsstress while we're testing online fsck.
__stress_scrub_fsstress_loop() {
local end="$1"
@@ -463,7 +487,7 @@ _scratch_xfs_stress_scrub_cleanup() {
# Send SIGINT so that bash won't print a 'Terminated' message that
# distorts the golden output.
echo "Killing stressor processes at $(date)" >> $seqres.full
- $KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1
+ $KILLALL_PROG -INT xfs_io fsstress fsx >> $seqres.full 2>&1
# Tests are not allowed to exit with the scratch fs frozen. If we
# started a fs freeze/thaw background loop, wait for that loop to exit
@@ -531,30 +555,39 @@ __stress_scrub_check_commands() {
# -w Delay the start of the scrub/repair loop by this number of seconds.
# Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value
# will be clamped to ten seconds before the end time.
+# -X Run this program to exercise the filesystem. Currently supported
+# options are 'fsx' and 'fsstress'. The default is 'fsstress'.
_scratch_xfs_stress_scrub() {
local one_scrub_args=()
local scrub_tgt="$SCRATCH_MNT"
local runningfile="$tmp.fsstress"
local freeze="${XFS_SCRUB_STRESS_FREEZE}"
local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}"
+ local exerciser="fsstress"
__SCRUB_STRESS_FREEZE_PID=""
rm -f "$runningfile"
touch "$runningfile"
OPTIND=1
- while getopts "fs:t:w:" c; do
+ while getopts "fs:t:w:X:" c; do
case "$c" in
f) freeze=yes;;
s) one_scrub_args+=("$OPTARG");;
t) scrub_tgt="$OPTARG";;
w) scrub_delay="$OPTARG";;
+ X) exerciser="$OPTARG";;
*) return 1; ;;
esac
done
__stress_scrub_check_commands "$scrub_tgt" "${one_scrub_args[@]}"
+ if ! command -v "__stress_scrub_${exerciser}_loop" &>/dev/null; then
+ echo "${exerciser}: Unknown fs exercise program."
+ return 1
+ fi
+
local start="$(date +%s)"
local end="$((start + (30 * TIME_FACTOR) ))"
local scrub_startat="$((start + scrub_delay))"
@@ -564,7 +597,7 @@ _scratch_xfs_stress_scrub() {
echo "Loop started at $(date --date="@${start}")," \
"ending at $(date --date="@${end}")" >> $seqres.full
- __stress_scrub_fsstress_loop "$end" "$runningfile" &
+ "__stress_scrub_${exerciser}_loop" "$end" "$runningfile" &
if [ -n "$freeze" ]; then
__stress_scrub_freeze_loop "$end" "$runningfile" &
diff --git a/doc/group-names.txt b/doc/group-names.txt
index ac219e05b3..771ce937ae 100644
--- a/doc/group-names.txt
+++ b/doc/group-names.txt
@@ -35,6 +35,7 @@ dangerous_fuzzers fuzzers that can crash your computer
dangerous_norepair fuzzers to evaluate kernel metadata verifiers
dangerous_online_repair fuzzers to evaluate xfs_scrub online repair
dangerous_fsstress_repair race fsstress and xfs_scrub online repair
+dangerous_fsstress_scrub race fsstress and xfs_scrub checking
dangerous_repair fuzzers to evaluate xfs_repair offline repair
dangerous_scrub fuzzers to evaluate xfs_scrub checking
data data loss checkers
diff --git a/tests/xfs/847 b/tests/xfs/847
new file mode 100755
index 0000000000..856e9a6c26
--- /dev/null
+++ b/tests/xfs/847
@@ -0,0 +1,38 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
+#
+# FS QA Test No. 847
+#
+# Race fsx and xfs_scrub in read-only mode for a while to see if we crash
+# or livelock.
+#
+. ./common/preamble
+_begin_fstest scrub dangerous_fsstress_scrub
+
+_cleanup() {
+ cd /
+ _scratch_xfs_stress_scrub_cleanup &> /dev/null
+ rm -r -f $tmp.*
+}
+_register_cleanup "_cleanup" BUS
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/inject
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_scrub
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_scrub -S '-n' -X 'fsx'
+
+# success, all done
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/847.out b/tests/xfs/847.out
new file mode 100644
index 0000000000..b7041db159
--- /dev/null
+++ b/tests/xfs/847.out
@@ -0,0 +1,2 @@
+QA output created by 847
+Silence is golden
diff --git a/tests/xfs/848 b/tests/xfs/848
new file mode 100755
index 0000000000..ab32020624
--- /dev/null
+++ b/tests/xfs/848
@@ -0,0 +1,38 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
+#
+# FS QA Test No. 848
+#
+# Race fsx and xfs_scrub in force-repair mode for a while to see if we
+# crash or livelock.
+#
+. ./common/preamble
+_begin_fstest online_repair dangerous_fsstress_repair
+
+_cleanup() {
+ cd /
+ _scratch_xfs_stress_scrub_cleanup &> /dev/null
+ rm -r -f $tmp.*
+}
+_register_cleanup "_cleanup" BUS
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/inject
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_online_repair
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_online_repair -S '-k' -X 'fsx'
+
+# success, all done
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/848.out b/tests/xfs/848.out
new file mode 100644
index 0000000000..23f674045c
--- /dev/null
+++ b/tests/xfs/848.out
@@ -0,0 +1,2 @@
+QA output created by 848
+Silence is golden
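
The exerciser selection in the common/fuzzy hunk above looks up a shell
function by name with "command -v" before running it. A minimal standalone
sketch of that pattern follows; the __demo_* functions and run_exerciser are
made-up stand-ins for the real stress loops, and the sketch runs the loop in
the foreground rather than in the background as the patch does:

```shell
# Hypothetical stand-ins for __stress_scrub_fsx_loop and
# __stress_scrub_fsstress_loop; they only report what they would run.
__demo_fsx_loop() { echo "fsx until $1"; }
__demo_fsstress_loop() { echo "fsstress until $1"; }

# Pick an exerciser by name and run it, rejecting unknown names.
run_exerciser() {
	local exerciser="$1" end="$2"

	# command -v succeeds for any defined function (or other command)
	# with this name, so an unknown exerciser is caught up front.
	if ! command -v "__demo_${exerciser}_loop" >/dev/null 2>&1; then
		echo "${exerciser}: Unknown fs exercise program."
		return 1
	fi

	"__demo_${exerciser}_loop" "$end"
}

run_exerciser fsx 10
run_exerciser fsstress 20
```

With this shape, adding a new exerciser only requires defining another
function that follows the naming convention. One caveat is that "command -v"
also matches an executable of the same name in $PATH, which is unlikely for
names with this prefix but not impossible.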
* Re: [PATCH 12/16] fuzzy: increase operation count for each fsstress invocation
2022-12-30 22:12 ` [PATCH 12/16] fuzzy: increase operation count for each fsstress invocation Darrick J. Wong
@ 2023-01-13 19:55 ` Zorro Lang
2023-01-13 21:28 ` Darrick J. Wong
0 siblings, 1 reply; 32+ messages in thread
From: Zorro Lang @ 2023-01-13 19:55 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, fstests
On Fri, Dec 30, 2022 at 02:12:54PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> For online fsck stress testing, increase the number of filesystem
> operations per fsstress run to 2 million, now that we have the ability
> to kill fsstress if the user should push ^C to abort the test early.
> This should guarantee a couple of hours of continuous stress testing in
> between clearing the scratch filesystem.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> common/fuzzy | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>
> diff --git a/common/fuzzy b/common/fuzzy
> index 01cf7f00d8..3e23edc9e4 100644
> --- a/common/fuzzy
> +++ b/common/fuzzy
> @@ -399,7 +399,9 @@ __stress_scrub_fsstress_loop() {
> local end="$1"
> local runningfile="$2"
>
> - local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
> + # As of March 2022, 2 million fsstress ops should be enough to keep
> + # any filesystem busy for a couple of hours.
> + local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000000 $FSSTRESS_AVOID)
Would the fsstress "-l 0" option help here?
> echo "Running $FSSTRESS_PROG $args" >> $seqres.full
>
> while __stress_scrub_running "$end" "$runningfile"; do
>
* Re: [NYE DELUGE 1/4] xfs: all pending online scrub improvements
2022-12-30 21:13 [NYE DELUGE 1/4] xfs: all pending online scrub improvements Darrick J. Wong
` (2 preceding siblings ...)
2022-12-30 22:13 ` [PATCHSET v24.0 0/2] fstests: race online scrub with mount state changes Darrick J. Wong
@ 2023-01-13 20:10 ` Zorro Lang
2023-01-13 21:28 ` Darrick J. Wong
3 siblings, 1 reply; 32+ messages in thread
From: Zorro Lang @ 2023-01-13 20:10 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: xfs, fstests
On Fri, Dec 30, 2022 at 01:13:21PM -0800, Darrick J. Wong wrote:
> Hi everyone,
>
> As I've mentioned several times throughout 2022, I would like to merge
> the online fsck feature in time for the 2023 LTS kernel. The first big
> step in this process is to merge all the pending bug fixes, validation
> improvements, and general reorganization of the existing metadata
> scrubbing functionality.
>
> This first deluge starts with the design document for the entirety of
> the online fsck feature. The design doc should be familiar to most of
> you, as it's been on the list for review for months already. It
> outlines in brief the problems we're trying to solve, the use cases and
> testing plan, and the fundamental data structures and algorithms
> underlying the entire feature.
>
> After that come all the code changes to wrap up the metadata checking
> part of the feature. The biggest piece here is the scrub drains that
> allow scrub to quiesce deferred ops targeting AGs so that it can
> cross-reference recordsets. Most of the rest is tweaking the btree code
> so that we can do keyspace scans to look for conflicting records.
>
> For this review, I would like people to focus the following:
>
> - Are the major subsystems sufficiently documented that you could figure
> out what the code does?
>
> - Do you see any problems that are severe enough to cause long term
> support hassles? (e.g. bad API design, writing weird metadata to disk)
>
> - Can you spot mis-interactions between the subsystems?
>
> - What were my blind spots in devising this feature?
>
> - Are there missing pieces that you'd like to help build?
>
> - Can I just merge all of this?
>
> The one thing that is /not/ in scope for this review are requests for
> more refactoring of existing subsystems. While there are usually valid
> arguments for performing such cleanups, those are separate tasks to be
> prioritized separately. I will get to them after merging online fsck.
>
> I've been running daily online scrubs of every computer I own for the
> last five years, which has helped me iron out real problems in (limited
> scope) production. All issues observed in that time have been corrected
> in this submission.
The 3 fstests patchsets of [NYE DELUGE 1/4] look good to me, and I haven't
found any more critical issues since Darrick fixed the "group name missing"
problem. After testing them for a whole week, I've decided to merge these 3
patchsets this weekend, so that we can move on to the later patchsets that are
waiting for review and merge.
Reviewed-by: Zorro Lang <zlang@redhat.com>
Thanks,
Zorro
>
> As a warning, the patches will likely take several days to trickle in.
> All four patch deluges are based off kernel 6.2-rc1, xfsprogs 6.1, and
> fstests 2022-12-25.
>
> Thank you all for your participation in the XFS community. Have a safe
> New Years, and I'll see you all next year!
>
> --D
>
* Re: [PATCH 12/16] fuzzy: increase operation count for each fsstress invocation
2023-01-13 19:55 ` Zorro Lang
@ 2023-01-13 21:28 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2023-01-13 21:28 UTC (permalink / raw)
To: Zorro Lang; +Cc: linux-xfs, fstests
On Sat, Jan 14, 2023 at 03:55:25AM +0800, Zorro Lang wrote:
> On Fri, Dec 30, 2022 at 02:12:54PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > For online fsck stress testing, increase the number of filesystem
> > operations per fsstress run to 2 million, now that we have the ability
> > to kill fsstress if the user should push ^C to abort the test early.
> > This should guarantee a couple of hours of continuous stress testing in
> > between clearing the scratch filesystem.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> > common/fuzzy | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> >
> > diff --git a/common/fuzzy b/common/fuzzy
> > index 01cf7f00d8..3e23edc9e4 100644
> > --- a/common/fuzzy
> > +++ b/common/fuzzy
> > @@ -399,7 +399,9 @@ __stress_scrub_fsstress_loop() {
> > local end="$1"
> > local runningfile="$2"
> >
> > - local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
> > + # As of March 2022, 2 million fsstress ops should be enough to keep
> > + # any filesystem busy for a couple of hours.
> > + local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000000 $FSSTRESS_AVOID)
>
> Would the fsstress "-l 0" option help here?
No. -n determines the number of operations per loop, and -l determines
the number of loops:
$ fsstress -d dor/ -n 5 -v -s 1
0/0: mkdir d0 17
0/0: mkdir add id=0,parent=-1
0/1: link - no file
0/2: mkdir d1 17
0/2: mkdir add id=1,parent=-1
0/3: chown . 127/0 0
0/4: rename - no source filename
$ fsstress -d dor/ -n 5 -l 2 -v -s 1
0/0: mkdir d0 17
0/0: mkdir add id=0,parent=-1
0/1: link - no file
0/2: mkdir d1 17
0/2: mkdir add id=1,parent=-1
0/3: chown . 127/0 0
0/4: rename - no source filename
0/0: mkdir d2 0
0/0: mkdir add id=2,parent=-1
0/1: link - no file
0/2: mkdir d2/d3 0
0/2: mkdir add id=3,parent=2
0/3: chown d2 127/0 0
0/4: rename(REXCHANGE) d2/d3 and d2 have ancestor-descendant relationship
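Put as arithmetic, each fsstress process issues (value of -n) times (value of
-l) operations, so the second run above performs 5 * 2 = 10. A small sketch of
that calculation; fsstress_total_ops is a made-up helper, and treating "-l 0"
as "loop without limit" is an assumption taken from the fsstress usage text:

```shell
# Hypothetical helper: operations issued by one fsstress process for a
# given -n (ops per loop) and -l (number of loops, default 1).
fsstress_total_ops() {
	local nops="$1" loops="${2:-1}"

	if [ "$loops" -eq 0 ]; then
		# Assumed meaning of -l 0: keep looping until killed.
		echo "unbounded"
	else
		echo $((nops * loops))
	fi
}

fsstress_total_ops 5 1	# first run above: prints 5
fsstress_total_ops 5 2	# second run above: prints 10
```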
--D
> > echo "Running $FSSTRESS_PROG $args" >> $seqres.full
> >
> > while __stress_scrub_running "$end" "$runningfile"; do
> >
>
* Re: [NYE DELUGE 1/4] xfs: all pending online scrub improvements
2023-01-13 20:10 ` [NYE DELUGE 1/4] xfs: all pending online scrub improvements Zorro Lang
@ 2023-01-13 21:28 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2023-01-13 21:28 UTC (permalink / raw)
To: Zorro Lang; +Cc: xfs, fstests
On Sat, Jan 14, 2023 at 04:10:33AM +0800, Zorro Lang wrote:
> On Fri, Dec 30, 2022 at 01:13:21PM -0800, Darrick J. Wong wrote:
> > Hi everyone,
> >
> > As I've mentioned several times throughout 2022, I would like to merge
> > the online fsck feature in time for the 2023 LTS kernel. The first big
> > step in this process is to merge all the pending bug fixes, validation
> > improvements, and general reorganization of the existing metadata
> > scrubbing functionality.
> >
> > This first deluge starts with the design document for the entirety of
> > the online fsck feature. The design doc should be familiar to most of
> > you, as it's been on the list for review for months already. It
> > outlines in brief the problems we're trying to solve, the use cases and
> > testing plan, and the fundamental data structures and algorithms
> > underlying the entire feature.
> >
> > After that come all the code changes to wrap up the metadata checking
> > part of the feature. The biggest piece here is the scrub drains that
> > allow scrub to quiesce deferred ops targeting AGs so that it can
> > cross-reference recordsets. Most of the rest is tweaking the btree code
> > so that we can do keyspace scans to look for conflicting records.
> >
> > For this review, I would like people to focus the following:
> >
> > - Are the major subsystems sufficiently documented that you could figure
> > out what the code does?
> >
> > - Do you see any problems that are severe enough to cause long term
> > support hassles? (e.g. bad API design, writing weird metadata to disk)
> >
> > - Can you spot mis-interactions between the subsystems?
> >
> > - What were my blind spots in devising this feature?
> >
> > - Are there missing pieces that you'd like to help build?
> >
> > - Can I just merge all of this?
> >
> > The one thing that is /not/ in scope for this review are requests for
> > more refactoring of existing subsystems. While there are usually valid
> > arguments for performing such cleanups, those are separate tasks to be
> > prioritized separately. I will get to them after merging online fsck.
> >
> > I've been running daily online scrubs of every computer I own for the
> > last five years, which has helped me iron out real problems in (limited
> > scope) production. All issues observed in that time have been corrected
> > in this submission.
>
> The 3 fstests patchsets of the [NYE DELUGE 1/4] look good to me, and I didn't
> find any more critical issues after Darrick fixed that "group name missing"
> problem. After testing them for a whole week, I've decided to merge these 3
> patchsets this weekend, so that we can move on to the later patchsets that are
> waiting for review and merge.
>
> Reviewed-by: Zorro Lang <zlang@redhat.com>
Ok, thanks!
--D
> Thanks,
> Zorro
>
> >
> > As a warning, the patches will likely take several days to trickle in.
> > All four patch deluges are based off kernel 6.2-rc1, xfsprogs 6.1, and
> > fstests 2022-12-25.
> >
> > Thank you all for your participation in the XFS community. Have a safe
> > New Years, and I'll see you all next year!
> >
> > --D
> >
>
end of thread, other threads:[~2023-01-13 21:28 UTC | newest]
Thread overview: 32+ messages
2022-12-30 21:13 [NYE DELUGE 1/4] xfs: all pending online scrub improvements Darrick J. Wong
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCH 04/16] fuzzy: clean up scrub stress programs quietly Darrick J. Wong
2022-12-30 22:12 ` [PATCH 07/16] fuzzy: give each test local control over what scrub stress tests get run Darrick J. Wong
2022-12-30 22:12 ` [PATCH 01/16] xfs/422: create a new test group for fsstress/repair racers Darrick J. Wong
2022-12-30 22:12 ` [PATCH 06/16] fuzzy: explicitly check for common/inject in _require_xfs_stress_online_repair Darrick J. Wong
2022-12-30 22:12 ` [PATCH 05/16] fuzzy: rework scrub stress output filtering Darrick J. Wong
2022-12-30 22:12 ` [PATCH 02/16] xfs/422: move the fsstress/freeze/scrub racing logic to common/fuzzy Darrick J. Wong
2022-12-30 22:12 ` [PATCH 03/16] xfs/422: rework feature detection so we only test-format scratch once Darrick J. Wong
2022-12-30 22:12 ` [PATCH 13/16] fuzzy: clean up frozen fses after scrub stress testing Darrick J. Wong
2022-12-30 22:12 ` [PATCH 12/16] fuzzy: increase operation count for each fsstress invocation Darrick J. Wong
2023-01-13 19:55 ` Zorro Lang
2023-01-13 21:28 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 11/16] fuzzy: clear out the scratch filesystem if it's too full Darrick J. Wong
2022-12-30 22:12 ` [PATCH 14/16] fuzzy: make freezing optional for scrub stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCH 10/16] fuzzy: abort scrub stress testing if the scratch fs went down Darrick J. Wong
2022-12-30 22:12 ` [PATCH 08/16] fuzzy: test the scrub stress subcommands before looping Darrick J. Wong
2022-12-30 22:12 ` [PATCH 09/16] fuzzy: make scrub stress loop control more robust Darrick J. Wong
2022-12-30 22:12 ` [PATCH 16/16] fuzzy: delay the start of the scrub loop when stress-testing scrub Darrick J. Wong
2022-12-30 22:12 ` [PATCH 15/16] fuzzy: allow substitution of AG numbers when configuring scrub stress test Darrick J. Wong
2022-12-30 22:12 ` [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx Darrick J. Wong
2023-01-05 5:49 ` Zorro Lang
2023-01-05 18:28 ` Darrick J. Wong
2023-01-05 18:28 ` [PATCH v24.1 " Darrick J. Wong
2022-12-30 22:12 ` [PATCH 2/3] fuzzy: refactor fsmap stress test to use our helper functions Darrick J. Wong
2022-12-30 22:12 ` [PATCH 3/3] xfs: race fsmap with readonly remounts to detect crash or livelock Darrick J. Wong
2022-12-30 22:13 ` [PATCHSET v24.0 0/2] fstests: race online scrub with mount state changes Darrick J. Wong
2022-12-30 22:13 ` [PATCH 1/2] xfs: stress test xfs_scrub(8) with fsstress Darrick J. Wong
2022-12-30 22:13 ` [PATCH 2/2] xfs: stress test xfs_scrub(8) with freeze and ro-remount loops Darrick J. Wong
2023-01-13 20:10 ` [NYE DELUGE 1/4] xfs: all pending online scrub improvements Zorro Lang
2023-01-13 21:28 ` Darrick J. Wong