* [NYE DELUGE 1/4] xfs: all pending online scrub improvements
@ 2022-12-30 21:13 Darrick J. Wong
From: Darrick J. Wong @ 2022-12-30 21:13 UTC
To: Dave Chinner, Allison Henderson, Chandan Babu R, Catherine Hoang,
djwong
Cc: xfs, greg.marsden, shirley.ma, konrad.wilk, fstests, Zorro Lang,
Carlos Maiolino

Hi everyone,

As I've mentioned several times throughout 2022, I would like to merge
the online fsck feature in time for the 2023 LTS kernel. The first big
step in this process is to merge all the pending bug fixes, validation
improvements, and general reorganization of the existing metadata
scrubbing functionality.

This first deluge starts with the design document for the entirety of
the online fsck feature. The design doc should be familiar to most of
you, as it's been on the list for review for months already. It
outlines in brief the problems we're trying to solve, the use cases and
testing plan, and the fundamental data structures and algorithms
underlying the entire feature.

After that come all the code changes to wrap up the metadata checking
part of the feature. The biggest piece here is the scrub drains that
allow scrub to quiesce deferred ops targeting AGs so that it can
cross-reference recordsets. Most of the rest is tweaking the btree code
so that we can do keyspace scans to look for conflicting records.

For this review, I would like people to focus on the following:

- Are the major subsystems sufficiently documented that you could figure
  out what the code does?
- Do you see any problems that are severe enough to cause long-term
  support hassles? (e.g. bad API design, writing weird metadata to disk)
- Can you spot mis-interactions between the subsystems?
- What were my blind spots in devising this feature?
- Are there missing pieces that you'd like to help build?
- Can I just merge all of this?

The one thing that is /not/ in scope for this review is requests for
more refactoring of existing subsystems. While there are usually valid
arguments for performing such cleanups, those are separate tasks to be
prioritized separately. I will get to them after merging online fsck.

I've been running daily online scrubs of every computer I own for the
last five years, which has helped me iron out real problems in (limited
scope) production. All issues observed in that time have been corrected
in this submission.

As a warning, the patches will likely take several days to trickle in.
All four patch deluges are based on kernel 6.2-rc1, xfsprogs 6.1, and
fstests 2022-12-25.

Thank you all for your participation in the XFS community. Have a safe
New Year's, and I'll see you all next year!

--D

* [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

Hi all,

This series prepares us to begin creating stress tests for the XFS
online fsck feature. We start by hoisting the loop control code out of
the one existing test (xfs/422) into common/fuzzy, and then we commence
rearranging the code to make it easy to generate more and more tests.
Eventually we will race fsstress against online scrub and online repair
to make sure that xfs_scrub running on a correct filesystem cannot take
it down by accident.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.

--D

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress
---
 common/fuzzy        |  272 +++++++++++++++++++++++++++++++++++++++++++++++++++
 doc/group-names.txt |    1
 tests/xfs/422       |  109 ++------------------
 tests/xfs/422.out   |    4 -
 4 files changed, 285 insertions(+), 101 deletions(-)

* [PATCH 04/16] fuzzy: clean up scrub stress programs quietly
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

In the cleanup function for online fsck stress test common code, send
SIGINT instead of SIGTERM to the fsstress and xfs_io processes to kill
them. bash prints 'Terminated' to the golden output when children die
with SIGTERM, which can make a test fail, and we don't want a regular
cleanup function being the thing that prevents the test from passing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/common/fuzzy b/common/fuzzy
index 979fa55515..e52831560d 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -381,7 +381,9 @@ _require_xfs_stress_online_repair() {
 
 # Clean up after the loops in case they didn't do it themselves.
 _scratch_xfs_stress_scrub_cleanup() {
-	$KILLALL_PROG -TERM xfs_io fsstress >> $seqres.full 2>&1
+	# Send SIGINT so that bash won't print a 'Terminated' message that
+	# distorts the golden output.
+	$KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1
 	$XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
 }
 

* [PATCH 07/16] fuzzy: give each test local control over what scrub stress tests get run
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Now that we've hoisted the scrub stress code to common/fuzzy, introduce
argument parsing so that each test can specify what it wants to test.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy  |   39 +++++++++++++++++++++++++++++++++++----
 tests/xfs/422 |    2 +-
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/common/fuzzy b/common/fuzzy
index de9e398984..88ba5fef69 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -348,12 +348,19 @@ __stress_scrub_freeze_loop() {
 	done
 }
 
-# Run xfs online fsck commands in a tight loop.
-__stress_scrub_loop() {
+# Run individual XFS online fsck commands in a tight loop with xfs_io.
+__stress_one_scrub_loop() {
 	local end="$1"
+	local scrub_tgt="$2"
+	shift; shift
+
+	local xfs_io_args=()
+	for arg in "$@"; do
+		xfs_io_args+=('-c' "$arg")
+	done
 
 	while [ "$(date +%s)" -lt $end ]; do
-		$XFS_IO_PROG -x -c 'repair rmapbt 0' -c 'repair rmapbt 1' $SCRATCH_MNT 2>&1 | \
+		$XFS_IO_PROG -x "${xfs_io_args[@]}" "$scrub_tgt" 2>&1 | \
 			__stress_scrub_filter_output
 	done
 }
@@ -390,6 +397,8 @@ _require_xfs_stress_online_repair() {
 
 # Clean up after the loops in case they didn't do it themselves.
 _scratch_xfs_stress_scrub_cleanup() {
+	echo "Cleaning up scrub stress run at $(date)" >> $seqres.full
+
 	# Send SIGINT so that bash won't print a 'Terminated' message that
 	# distorts the golden output.
 	$KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1
@@ -399,7 +408,25 @@ _scratch_xfs_stress_scrub_cleanup() {
 # Start scrub, freeze, and fsstress in background looping processes, and wait
 # for 30*TIME_FACTOR seconds to see if the filesystem goes down. Callers
 # must call _scratch_xfs_stress_scrub_cleanup from their cleanup functions.
+#
+# Various options include:
+#
+# -s	Pass this command to xfs_io to test scrub. If zero -s options are
+#	specified, xfs_io will not be run.
+# -t	Run online scrub against this file; $SCRATCH_MNT is the default.
 _scratch_xfs_stress_scrub() {
+	local one_scrub_args=()
+	local scrub_tgt="$SCRATCH_MNT"
+
+	OPTIND=1
+	while getopts "s:t:" c; do
+		case "$c" in
+		s)	one_scrub_args+=("$OPTARG");;
+		t)	scrub_tgt="$OPTARG";;
+		*)	return 1; ;;
+		esac
+	done
+
 	local start="$(date +%s)"
 	local end="$((start + (30 * TIME_FACTOR) ))"
 
@@ -408,7 +435,11 @@ _scratch_xfs_stress_scrub() {
 
 	__stress_scrub_fsstress_loop $end &
 	__stress_scrub_freeze_loop $end &
-	__stress_scrub_loop $end &
+
+	if [ "${#one_scrub_args[@]}" -gt 0 ]; then
+		__stress_one_scrub_loop "$end" "$scrub_tgt" \
+				"${one_scrub_args[@]}" &
+	fi
 
 	# Wait until 2 seconds after the loops should have finished, then
 	# clean up after ourselves.
diff --git a/tests/xfs/422 b/tests/xfs/422
index b3353d2202..faea5d6792 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -31,7 +31,7 @@ _require_xfs_stress_online_repair
 _scratch_mkfs > "$seqres.full" 2>&1
 _scratch_mount
 _require_xfs_has_feature "$SCRATCH_MNT" rmapbt
-_scratch_xfs_stress_online_repair
+_scratch_xfs_stress_online_repair -s "repair rmapbt 0" -s "repair rmapbt 1"
 
 # success, all done
 echo Silence is golden
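The option handling added by this patch can be tried outside fstests. Below is a minimal standalone bash sketch, not the patch itself: `parse_scrub_opts` is a hypothetical name, and the default target `/mnt/scratch` is an assumed stand-in for `$SCRATCH_MNT`. It collects repeated `-s` options into an array and expands each one into its own `-c` pair, which keeps scrub commands that contain spaces intact as single arguments.

```bash
#!/bin/bash
# Standalone sketch of the -s/-t option handling from this patch.
# parse_scrub_opts is a hypothetical name, and "/mnt/scratch" is an
# assumed stand-in for $SCRATCH_MNT.
parse_scrub_opts() {
	local one_scrub_args=()
	local scrub_tgt="/mnt/scratch"
	local c arg

	OPTIND=1
	while getopts "s:t:" c; do
		case "$c" in
		s)	one_scrub_args+=("$OPTARG");;
		t)	scrub_tgt="$OPTARG";;
		*)	return 1;;
		esac
	done

	# Expand each scrub command into its own '-c <cmd>' pair, the same
	# way __stress_one_scrub_loop builds the xfs_io argument list.
	local xfs_io_args=()
	for arg in "${one_scrub_args[@]}"; do
		xfs_io_args+=('-c' "$arg")
	done

	# Print one argument per line so the word boundaries are visible.
	printf '%s\n' "${xfs_io_args[@]}" "$scrub_tgt"
}

parse_scrub_opts -s "repair rmapbt 0" -s "repair rmapbt 1"
```

Called as in tests/xfs/422, the sketch should print `-c`, `repair rmapbt 0`, `-c`, `repair rmapbt 1` and the target on separate lines. Resetting `OPTIND=1` matters because the function may run more than once in the same shell, and `getopts` would otherwise resume where the previous call stopped.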

* [PATCH 01/16] xfs/422: create a new test group for fsstress/repair racers
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Create a new group for tests that race fsstress with online filesystem
repair, and add this to the dangerous_online_repair group too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 doc/group-names.txt |    1 +
 tests/xfs/422       |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/group-names.txt b/doc/group-names.txt
index 6cc9af7844..ac219e05b3 100644
--- a/doc/group-names.txt
+++ b/doc/group-names.txt
@@ -34,6 +34,7 @@ dangerous_bothrepair	fuzzers to evaluate xfs_scrub + xfs_repair repair
 dangerous_fuzzers	fuzzers that can crash your computer
 dangerous_norepair	fuzzers to evaluate kernel metadata verifiers
 dangerous_online_repair	fuzzers to evaluate xfs_scrub online repair
+dangerous_fsstress_repair	race fsstress and xfs_scrub online repair
 dangerous_repair	fuzzers to evaluate xfs_repair offline repair
 dangerous_scrub		fuzzers to evaluate xfs_scrub checking
 data			data loss checkers
diff --git a/tests/xfs/422 b/tests/xfs/422
index f3c63e8d6a..9ed944ed63 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -9,7 +9,7 @@
 # activity, so we can't have userspace wandering in and thawing it.
 #
 . ./common/preamble
-_begin_fstest dangerous_scrub dangerous_online_repair freeze
+_begin_fstest online_repair dangerous_fsstress_repair freeze
 
 _register_cleanup "_cleanup" BUS
 

* [PATCH 06/16] fuzzy: explicitly check for common/inject in _require_xfs_stress_online_repair
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

In _require_xfs_stress_online_repair, make sure that the test has
sourced common/inject before we try to call its functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/common/fuzzy b/common/fuzzy
index 94a6ce85a3..de9e398984 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -382,6 +382,8 @@ _require_xfs_stress_scrub() {
 _require_xfs_stress_online_repair() {
 	_require_xfs_stress_scrub
 	_require_xfs_io_command "repair"
+	command -v _require_xfs_io_error_injection &>/dev/null || \
+		_notrun 'xfs repair stress test requires common/inject'
 	_require_xfs_io_error_injection "force_repair"
 	_require_freeze
 }
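The guard added here works because `command -v` reports shell functions as well as programs on `$PATH`, so a test that forgot to source a helper file can be skipped cleanly instead of failing with "command not found". The sketch below is illustrative only: `require_helper` and `fake_notrun` are hypothetical stand-ins, and unlike the real `_notrun`, which ends the test, `fake_notrun` just prints a message and returns failure.

```bash
#!/bin/bash
# Sketch of the "was this helper file sourced?" guard. require_helper and
# fake_notrun are hypothetical stand-ins; the real _notrun ends the test,
# whereas fake_notrun just prints a message and returns failure.
fake_notrun() {
	echo "notrun: $*"
	return 1
}

require_helper() {
	local func="$1"
	local file="$2"

	# command -v succeeds for shell functions as well as programs in $PATH.
	command -v "$func" &>/dev/null || \
		fake_notrun "stress test requires $file"
}

# Pretend common/inject was sourced and defined this function.
_require_xfs_io_error_injection() { :; }

require_helper _require_xfs_io_error_injection common/inject && echo "inject helpers present"
require_helper _filter_scratch common/filter
```

The same pattern appears in the output-filtering patch, where it checks for `_filter_scratch` from common/filter.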

* [PATCH 05/16] fuzzy: rework scrub stress output filtering
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Rework the output filtering functions for scrub stress tests: first, we
should use _filter_scratch to avoid leaking the scratch fs details to
the output. Second, for scrub and repair, change the filter elements to
reflect outputs that don't indicate failure (such as busy resources,
preening requests, and insufficient space to do anything). Finally,
change the _require function to check that filter functions have been
sourced.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/common/fuzzy b/common/fuzzy
index e52831560d..94a6ce85a3 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -323,14 +323,19 @@ _scratch_xfs_fuzz_metadata() {
 # Filter freeze and thaw loop output so that we don't tarnish the golden output
 # if the kernel temporarily won't let us freeze.
 __stress_freeze_filter_output() {
-	grep -E -v '(Device or resource busy|Invalid argument)'
+	_filter_scratch | \
+		sed -e '/Device or resource busy/d' \
+		    -e '/Invalid argument/d'
 }
 
 # Filter scrub output so that we don't tarnish the golden output if the fs is
 # too busy to scrub. Note: Tests should _notrun if the scrub type is not
 # supported.
 __stress_scrub_filter_output() {
-	grep -E -v '(Device or resource busy|Invalid argument)'
+	_filter_scratch | \
+		sed -e '/Device or resource busy/d' \
+		    -e '/Optimization possible/d' \
+		    -e '/No space left on device/d'
 }
 
 # Run fs freeze and thaw in a tight loop.
@@ -369,6 +374,8 @@ _require_xfs_stress_scrub() {
 	_require_xfs_io_command "scrub"
 	_require_command "$KILLALL_PROG" killall
 	_require_freeze
+	command -v _filter_scratch &>/dev/null || \
+		_notrun 'xfs scrub stress test requires common/filter'
 }
 
 # Make sure we have everything we need to run stress and online repair
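To see what the reworked scrub filter lets through, here is a standalone sketch with made-up sample lines. `filter_scratch_stub` is a hypothetical, simplified stand-in for fstests' `_filter_scratch` (the real helper also masks the scratch device and does more), and `/mnt/scratch` is an assumed mount path. Only lines that do not match one of the benign patterns should survive, with the mount path masked.

```bash
#!/bin/bash
# Simplified, hypothetical stand-in for fstests' _filter_scratch: replace
# the scratch mount path with a fixed token. The real helper does more.
SCRATCH_MNT="/mnt/scratch"
filter_scratch_stub() {
	sed -e "s,$SCRATCH_MNT,SCRATCH_MNT,g"
}

# Same shape as the reworked __stress_scrub_filter_output: mask the
# scratch path, then delete lines that do not indicate a real failure.
scrub_filter_output_sketch() {
	filter_scratch_stub | \
		sed -e '/Device or resource busy/d' \
		    -e '/Optimization possible/d' \
		    -e '/No space left on device/d'
}

# Made-up sample output; only the "Corruption detected." line should remain.
printf '%s\n' \
	"$SCRATCH_MNT: Device or resource busy" \
	"Optimization possible." \
	"$SCRATCH_MNT: Corruption detected." \
	"No space left on device" | scrub_filter_output_sketch
```

Note that, unlike the old `grep -E -v` filter, the new scrub filter no longer drops `Invalid argument` lines, so those would now reach the golden output; only the freeze/thaw filter still discards them.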

* [PATCH 02/16] xfs/422: move the fsstress/freeze/scrub racing logic to common/fuzzy
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Hoist all this code to common/fuzzy in preparation for making this code
more generic so that we can implement a variety of tests that check the
concurrency correctness of online fsck. Do just enough renaming so that
we don't pollute the test program's namespace; we'll fix the other warts
in subsequent patches.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy      |  100 +++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/422     |  104 ++++-------------------------------------------------
 tests/xfs/422.out |    4 +-
 3 files changed, 109 insertions(+), 99 deletions(-)

diff --git a/common/fuzzy b/common/fuzzy
index 70213af5db..979fa55515 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -316,3 +316,103 @@ _scratch_xfs_fuzz_metadata() {
 		done
 	done
 }
+
+# Functions to race fsstress, fs freeze, and xfs metadata scrubbing against
+# each other to shake out bugs in xfs online repair.
+
+# Filter freeze and thaw loop output so that we don't tarnish the golden output
+# if the kernel temporarily won't let us freeze.
+__stress_freeze_filter_output() {
+	grep -E -v '(Device or resource busy|Invalid argument)'
+}
+
+# Filter scrub output so that we don't tarnish the golden output if the fs is
+# too busy to scrub. Note: Tests should _notrun if the scrub type is not
+# supported.
+__stress_scrub_filter_output() {
+	grep -E -v '(Device or resource busy|Invalid argument)'
+}
+
+# Run fs freeze and thaw in a tight loop.
+__stress_scrub_freeze_loop() {
+	local end="$1"
+
+	while [ "$(date +%s)" -lt $end ]; do
+		$XFS_IO_PROG -x -c 'freeze' -c 'thaw' $SCRATCH_MNT 2>&1 | \
+			__stress_freeze_filter_output
+	done
+}
+
+# Run xfs online fsck commands in a tight loop.
+__stress_scrub_loop() {
+	local end="$1"
+
+	while [ "$(date +%s)" -lt $end ]; do
+		$XFS_IO_PROG -x -c 'repair rmapbt 0' -c 'repair rmapbt 1' $SCRATCH_MNT 2>&1 | \
+			__stress_scrub_filter_output
+	done
+}
+
+# Run fsstress while we're testing online fsck.
+__stress_scrub_fsstress_loop() {
+	local end="$1"
+
+	local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
+
+	while [ "$(date +%s)" -lt $end ]; do
+		$FSSTRESS_PROG $args >> $seqres.full
+	done
+}
+
+# Make sure we have everything we need to run stress and scrub
+_require_xfs_stress_scrub() {
+	_require_xfs_io_command "scrub"
+	_require_command "$KILLALL_PROG" killall
+	_require_freeze
+}
+
+# Make sure we have everything we need to run stress and online repair
+_require_xfs_stress_online_repair() {
+	_require_xfs_stress_scrub
+	_require_xfs_io_command "repair"
+	_require_xfs_io_error_injection "force_repair"
+	_require_freeze
+}
+
+# Clean up after the loops in case they didn't do it themselves.
+_scratch_xfs_stress_scrub_cleanup() {
+	$KILLALL_PROG -TERM xfs_io fsstress >> $seqres.full 2>&1
+	$XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
+}
+
+# Start scrub, freeze, and fsstress in background looping processes, and wait
+# for 30*TIME_FACTOR seconds to see if the filesystem goes down. Callers
+# must call _scratch_xfs_stress_scrub_cleanup from their cleanup functions.
+_scratch_xfs_stress_scrub() {
+	local start="$(date +%s)"
+	local end="$((start + (30 * TIME_FACTOR) ))"
+
+	echo "Loop started at $(date --date="@${start}")," \
+	     "ending at $(date --date="@${end}")" >> $seqres.full
+
+	__stress_scrub_fsstress_loop $end &
+	__stress_scrub_freeze_loop $end &
+	__stress_scrub_loop $end &
+
+	# Wait until 2 seconds after the loops should have finished, then
+	# clean up after ourselves.
+	while [ "$(date +%s)" -lt $((end + 2)) ]; do
+		sleep 1
+	done
+	_scratch_xfs_stress_scrub_cleanup
+
+	echo "Loop finished at $(date)" >> $seqres.full
+}
+
+# Start online repair, freeze, and fsstress in background looping processes,
+# and wait for 30*TIME_FACTOR seconds to see if the filesystem goes down.
+# Same requirements and arguments as _scratch_xfs_stress_scrub.
+_scratch_xfs_stress_online_repair() {
+	$XFS_IO_PROG -x -c 'inject force_repair' $SCRATCH_MNT
+	_scratch_xfs_stress_scrub "$@"
+}
diff --git a/tests/xfs/422 b/tests/xfs/422
index 9ed944ed63..0bf08572f3 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -4,40 +4,19 @@
 #
 # FS QA Test No. 422
 #
-# Race freeze and rmapbt repair for a while to see if we crash or livelock.
+# Race fsstress and rmapbt repair for a while to see if we crash or livelock.
 # rmapbt repair requires us to freeze the filesystem to stop all filesystem
 # activity, so we can't have userspace wandering in and thawing it.
 #
 . ./common/preamble
 _begin_fstest online_repair dangerous_fsstress_repair freeze
 
-_register_cleanup "_cleanup" BUS
-
-# First kill and wait the freeze loop so it won't try to freeze fs again
-# Then make sure fs is not frozen
-# Then kill and wait for the rest of the workers
-# Because if fs is frozen a killed writer will never exit
-kill_loops() {
-	local sig=$1
-
-	[ -n "$freeze_pid" ] && kill $sig $freeze_pid
-	wait $freeze_pid
-	unset freeze_pid
-	$XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT
-	[ -n "$stress_pid" ] && kill $sig $stress_pid
-	[ -n "$repair_pid" ] && kill $sig $repair_pid
-	wait
-	unset stress_pid
-	unset repair_pid
-}
-
-# Override the default cleanup function.
-_cleanup()
-{
-	kill_loops -9 > /dev/null 2>&1
+_cleanup() {
+	_scratch_xfs_stress_scrub_cleanup &> /dev/null
 	cd /
-	rm -rf $tmp.*
+	rm -r -f $tmp.*
 }
+_register_cleanup "_cleanup" BUS
 
 # Import common functions.
 . ./common/filter
@@ -47,80 +26,13 @@ _cleanup()
 # real QA test starts here
 _supported_fs xfs
 _require_xfs_scratch_rmapbt
-_require_xfs_io_command "scrub"
-_require_xfs_io_error_injection "force_repair"
-_require_command "$KILLALL_PROG" killall
-_require_freeze
+_require_xfs_stress_online_repair
 
-echo "Format and populate"
 _scratch_mkfs > "$seqres.full" 2>&1
 _scratch_mount
-
-STRESS_DIR="$SCRATCH_MNT/testdir"
-mkdir -p $STRESS_DIR
-
-for i in $(seq 0 9); do
-	mkdir -p $STRESS_DIR/$i
-	for j in $(seq 0 9); do
-		mkdir -p $STRESS_DIR/$i/$j
-		for k in $(seq 0 9); do
-			echo x > $STRESS_DIR/$i/$j/$k
-		done
-	done
-done
-
-cpus=$(( $($here/src/feature -o) * 4 * LOAD_FACTOR))
-
-echo "Concurrent repair"
-filter_output() {
-	grep -E -v '(Device or resource busy|Invalid argument)'
-}
-freeze_loop() {
-	end="$1"
-
-	while [ "$(date +%s)" -lt $end ]; do
-		$XFS_IO_PROG -x -c 'freeze' -c 'thaw' $SCRATCH_MNT 2>&1 | filter_output
-	done
-}
-repair_loop() {
-	end="$1"
-
-	while [ "$(date +%s)" -lt $end ]; do
-		$XFS_IO_PROG -x -c 'repair rmapbt 0' -c 'repair rmapbt 1' $SCRATCH_MNT 2>&1 | filter_output
-	done
-}
-stress_loop() {
-	end="$1"
-
-	FSSTRESS_ARGS=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
-	while [ "$(date +%s)" -lt $end ]; do
-		$FSSTRESS_PROG $FSSTRESS_ARGS >> $seqres.full
-	done
-}
-$XFS_IO_PROG -x -c 'inject force_repair' $SCRATCH_MNT
-
-start=$(date +%s)
-end=$((start + (30 * TIME_FACTOR) ))
-
-echo "Loop started at $(date --date="@${start}"), ending at $(date --date="@${end}")" >> $seqres.full
-stress_loop $end &
-stress_pid=$!
-freeze_loop $end &
-freeze_pid=$!
-repair_loop $end &
-repair_pid=$!
-
-# Wait until 2 seconds after the loops should have finished...
-while [ "$(date +%s)" -lt $((end + 2)) ]; do
-	sleep 1
-done
-
-# ...and clean up after the loops in case they didn't do it themselves.
-kill_loops >> $seqres.full 2>&1
-
-echo "Loop finished at $(date)" >> $seqres.full
-echo "Test done"
+_scratch_xfs_stress_online_repair
 
 # success, all done
+echo Silence is golden
 status=0
 exit
diff --git a/tests/xfs/422.out b/tests/xfs/422.out
index 3818c48fa8..f70693fde6 100644
--- a/tests/xfs/422.out
+++ b/tests/xfs/422.out
@@ -1,4 +1,2 @@
 QA output created by 422
-Format and populate
-Concurrent repair
-Test done
+Silence is golden

* [PATCH 03/16] xfs/422: rework feature detection so we only test-format scratch once
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Rework the feature detection in the one online fsck stress test so that
we only format the scratch device twice per test run.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/422 |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/xfs/422 b/tests/xfs/422
index 0bf08572f3..b3353d2202 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -25,11 +25,12 @@ _register_cleanup "_cleanup" BUS
 
 # real QA test starts here
 _supported_fs xfs
-_require_xfs_scratch_rmapbt
+_require_scratch
 _require_xfs_stress_online_repair
 
 _scratch_mkfs > "$seqres.full" 2>&1
 _scratch_mount
+_require_xfs_has_feature "$SCRATCH_MNT" rmapbt
 _scratch_xfs_stress_online_repair
 
 # success, all done

* [PATCH 13/16] fuzzy: clean up frozen fses after scrub stress testing
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Some of our scrub stress tests involve racing scrub, fsstress, and a
program that repeatedly freezes and thaws the scratch filesystem. The
current cleanup code suffers from the deficiency that it doesn't
actually wait for the child processes to exit. First, change it to do
that.

However, that exposes a second problem: there's a race condition with a
freezer process that leads to the stress test exiting with a frozen fs.
If the freezer process is blocked trying to acquire the unmount or
sb_write locks, the receipt of a signal (even a fatal one) doesn't cause
it to abort the freeze. This causes further problems with fstests,
since ./check doesn't expect to regain control with the scratch fs
frozen.

Fix both problems by making the cleanup function smarter.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy |   35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/common/fuzzy b/common/fuzzy
index 3e23edc9e4..0f6fc91b80 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -439,8 +439,39 @@ _scratch_xfs_stress_scrub_cleanup() {
 
 	# Send SIGINT so that bash won't print a 'Terminated' message that
 	# distorts the golden output.
+	echo "Killing stressor processes at $(date)" >> $seqres.full
 	$KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1
-	$XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
+
+	# Tests are not allowed to exit with the scratch fs frozen. If we
+	# started a fs freeze/thaw background loop, wait for that loop to exit
+	# and then thaw the filesystem. Cleanup for the freeze loop must be
+	# performed prior to waiting for the other children to avoid triggering
+	# a race condition that can hang fstests.
+	#
+	# If the xfs_io -c freeze process is asleep waiting for a write lock on
+	# s_umount or sb_write when the killall signal is delivered, it will
+	# not check for pending signals until after it has frozen the fs. If
+	# even one thread of the stress test processes (xfs_io, fsstress, etc.)
+	# is waiting for read locks on sb_write when the killall signals are
+	# delivered, they will block in the kernel until someone thaws the fs,
+	# and the `wait' below will wait forever.
+	#
+	# Hence we issue the killall, wait for the freezer loop to exit, thaw
+	# the filesystem, and wait for the rest of the children.
+	if [ -n "$__SCRUB_STRESS_FREEZE_PID" ]; then
+		echo "Waiting for fs freezer $__SCRUB_STRESS_FREEZE_PID to exit at $(date)" >> $seqres.full
+		wait "$__SCRUB_STRESS_FREEZE_PID"
+
+		echo "Thawing filesystem at $(date)" >> $seqres.full
+		$XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
+		__SCRUB_STRESS_FREEZE_PID=""
+	fi
+
+	# Wait for the remaining children to exit.
+	echo "Waiting for children to exit at $(date)" >> $seqres.full
+	wait
+
+	echo "Cleanup finished at $(date)" >> $seqres.full
 }
 
 # Make sure the provided scrub/repair commands actually work on the scratch
@@ -476,6 +507,7 @@ _scratch_xfs_stress_scrub() {
 	local scrub_tgt="$SCRATCH_MNT"
 	local runningfile="$tmp.fsstress"
 
+	__SCRUB_STRESS_FREEZE_PID=""
 	rm -f "$runningfile"
 	touch "$runningfile"
 
@@ -498,6 +530,7 @@ _scratch_xfs_stress_scrub() {
 
 	__stress_scrub_fsstress_loop "$end" "$runningfile" &
 	__stress_scrub_freeze_loop "$end" "$runningfile" &
+	__SCRUB_STRESS_FREEZE_PID="$!"
 
 	if [ "${#one_scrub_args[@]}" -gt 0 ]; then
 		__stress_one_scrub_loop "$end" "$runningfile" "$scrub_tgt" \
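The ordering argument in the cleanup comment can be modeled without a real freeze. This is a toy bash simulation, not fstests code: a marker file plays the role of the frozen filesystem, removing a `running` file replaces `killall`, and all names are hypothetical. It assumes a `sleep` that accepts fractional seconds (GNU coreutils does). It shows why the freezer must be reaped and the "fs" thawed before the bare `wait`.

```bash
#!/bin/bash
# Toy simulation of the cleanup ordering in this patch. A marker file
# stands in for "the filesystem is frozen", and writers cannot finish while
# it exists. Loops stop when the "running" file goes away instead of via
# killall. All names here are hypothetical; no real freeze is involved.
tmpdir="$(mktemp -d)"
trap 'rm -rf "$tmpdir"' EXIT
running="$tmpdir/running"
frozen="$tmpdir/frozen"
touch "$running"

freeze_loop() {
	while [ -e "$running" ]; do
		touch "$frozen"		# "freeze"
		sleep 0.1
		rm -f "$frozen"		# "thaw"
		sleep 0.1
	done
	# Model the race: a freezer that was already blocked completes one
	# last freeze after being told to stop, and never thaws.
	touch "$frozen"
}

writer_loop() {
	while [ -e "$running" ]; do
		sleep 0.05
	done
	# A writer cannot exit while the "filesystem" is frozen.
	while [ -e "$frozen" ]; do
		sleep 0.05
	done
}

freeze_loop &
freeze_pid=$!
writer_loop &
writer_loop &

sleep 0.5

# Cleanup in the order the patch uses: stop everything, wait for the
# freezer alone, thaw, and only then wait for the remaining children.
# Running a bare `wait' before the thaw would hang forever here.
rm -f "$running"
wait "$freeze_pid"
rm -f "$frozen"
wait
echo "cleanup finished"
```

Swapping the last three commands so that the bare `wait` comes before `rm -f "$frozen"` should reproduce the hang described in the commit message, since the writers never see a thaw.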

* [PATCH 12/16] fuzzy: increase operation count for each fsstress invocation
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

For online fsck stress testing, increase the number of filesystem
operations per fsstress run to 2 million, now that we have the ability
to kill fsstress if the user should push ^C to abort the test early.
This should guarantee a couple of hours of continuous stress testing in
between clearing the scratch filesystem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/common/fuzzy b/common/fuzzy
index 01cf7f00d8..3e23edc9e4 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -399,7 +399,9 @@ __stress_scrub_fsstress_loop() {
 	local end="$1"
 	local runningfile="$2"
 
-	local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
+	# As of March 2022, 2 million fsstress ops should be enough to keep
+	# any filesystem busy for a couple of hours.
+	local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000000 $FSSTRESS_AVOID)
 	echo "Running $FSSTRESS_PROG $args" >> $seqres.full
 
 	while __stress_scrub_running "$end" "$runningfile"; do

* Re: [PATCH 12/16] fuzzy: increase operation count for each fsstress invocation
From: Zorro Lang @ 2023-01-13 19:55 UTC
To: Darrick J. Wong
Cc: linux-xfs, fstests

On Fri, Dec 30, 2022 at 02:12:54PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> For online fsck stress testing, increase the number of filesystem
> operations per fsstress run to 2 million, now that we have the ability
> to kill fsstress if the user should push ^C to abort the test early.
> This should guarantee a couple of hours of continuous stress testing in
> between clearing the scratch filesystem.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  common/fuzzy |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
>
> diff --git a/common/fuzzy b/common/fuzzy
> index 01cf7f00d8..3e23edc9e4 100644
> --- a/common/fuzzy
> +++ b/common/fuzzy
> @@ -399,7 +399,9 @@ __stress_scrub_fsstress_loop() {
>  	local end="$1"
>  	local runningfile="$2"
>
> -	local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
> +	# As of March 2022, 2 million fsstress ops should be enough to keep
> +	# any filesystem busy for a couple of hours.
> +	local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000000 $FSSTRESS_AVOID)

Can fsstress "-l 0" option help?

>  	echo "Running $FSSTRESS_PROG $args" >> $seqres.full
>
>  	while __stress_scrub_running "$end" "$runningfile"; do
>
* Re: [PATCH 12/16] fuzzy: increase operation count for each fsstress invocation
From: Darrick J. Wong @ 2023-01-13 21:28 UTC
To: Zorro Lang
Cc: linux-xfs, fstests

On Sat, Jan 14, 2023 at 03:55:25AM +0800, Zorro Lang wrote:
> On Fri, Dec 30, 2022 at 02:12:54PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > For online fsck stress testing, increase the number of filesystem
> > operations per fsstress run to 2 million, now that we have the ability
> > to kill fsstress if the user should push ^C to abort the test early.
> > This should guarantee a couple of hours of continuous stress testing in
> > between clearing the scratch filesystem.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  common/fuzzy |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/common/fuzzy b/common/fuzzy
> > index 01cf7f00d8..3e23edc9e4 100644
> > --- a/common/fuzzy
> > +++ b/common/fuzzy
> > @@ -399,7 +399,9 @@ __stress_scrub_fsstress_loop() {
> >  	local end="$1"
> >  	local runningfile="$2"
> >  
> > -	local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
> > +	# As of March 2022, 2 million fsstress ops should be enough to keep
> > +	# any filesystem busy for a couple of hours.
> > +	local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000000 $FSSTRESS_AVOID)
> 
> Can fsstress "-l 0" option help?

No. -n determines the number of operations per loop, and -l determines
the number of loops:

$ fsstress -d dor/ -n 5 -v -s 1
0/0: mkdir d0 17
0/0: mkdir add id=0,parent=-1
0/1: link - no file
0/2: mkdir d1 17
0/2: mkdir add id=1,parent=-1
0/3: chown . 127/0 0
0/4: rename - no source filename

$ fsstress -d dor/ -n 5 -l 2 -v -s 1
0/0: mkdir d0 17
0/0: mkdir add id=0,parent=-1
0/1: link - no file
0/2: mkdir d1 17
0/2: mkdir add id=1,parent=-1
0/3: chown . 127/0 0
0/4: rename - no source filename
0/0: mkdir d2 0
0/0: mkdir add id=2,parent=-1
0/1: link - no file
0/2: mkdir d2/d3 0
0/2: mkdir add id=3,parent=2
0/3: chown d2 127/0 0
0/4: rename(REXCHANGE) d2/d3 and d2 have ancestor-descendant relationship

--D

> >  	echo "Running $FSSTRESS_PROG $args" >> $seqres.full
> >  
> >  	while __stress_scrub_running "$end" "$runningfile"; do
> > 
> 

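The exchange above comes down to nested counts: per the fsstress usage text, -p sets the number of processes, -n the operations each process performs per pass, and -l the number of passes, where 0 means loop forever. The helper below is hypothetical (it is not part of fstests) and ignores any scaling that _scale_fsstress_args applies; it only makes the arithmetic explicit. A finite run matters here because the stress loop only gets a chance to clear the scratch filesystem between fsstress invocations.

```shell
#!/bin/sh
# Hypothetical helper, not part of fstests: how many operations one fsstress
# invocation issues for "-p procs -n nops -l loops".  Assumes, as the
# fsstress usage text describes, that -n counts operations per process per
# pass and that "-l 0" means loop forever.
fsstress_total_ops() {
	local procs="$1" nops="$2" loops="$3"

	if [ "$loops" -eq 0 ]; then
		echo "unbounded"
	else
		echo "$((procs * nops * loops))"
	fi
}
```

For the second trace above, `fsstress_total_ops 1 5 2` prints 10: two passes over operations 0/0 through 0/4.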
* [PATCH 11/16] fuzzy: clear out the scratch filesystem if it's too full
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

If the online fsck stress tests run for long enough, they'll fill up the
scratch filesystem completely. While it is interesting to test repair
functionality on a *nearly* full filesystem undergoing a heavy workload,
a totally full filesystem is really only exercising the ENOSPC handlers
in the kernel. That's not what we came here to test, so change the
fsstress loop to detect a nearly full filesystem and erase everything
before starting fsstress again.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)


diff --git a/common/fuzzy b/common/fuzzy
index f1bc2dc756..01cf7f00d8 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -380,6 +380,20 @@ __stress_one_scrub_loop() {
 	done
 }
 
+# Clean the scratch filesystem between rounds of fsstress if there is 2%
+# available space or less because that isn't an interesting stress test.
+#
+# Returns 0 if we cleared anything, and 1 if we did nothing.
+__stress_scrub_clean_scratch() {
+	local used_pct="$(_used $SCRATCH_DEV)"
+
+	test "$used_pct" -lt 98 && return 1
+
+	echo "Clearing scratch fs at $(date)" >> $seqres.full
+	rm -r -f $SCRATCH_MNT/p*
+	return 0
+}
+
 # Run fsstress while we're testing online fsck.
 __stress_scrub_fsstress_loop() {
 	local end="$1"
@@ -389,6 +403,8 @@ __stress_scrub_fsstress_loop() {
 	echo "Running $FSSTRESS_PROG $args" >> $seqres.full
 
 	while __stress_scrub_running "$end" "$runningfile"; do
+		# Need to recheck running conditions if we cleared anything
+		__stress_scrub_clean_scratch && continue
 		$FSSTRESS_PROG $args >> $seqres.full
 		echo "fsstress exits with $? at $(date)" >> $seqres.full
 	done

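The cleanup rule reduces to a threshold on used space. A minimal sketch, assuming POSIX `df -P` (whose fifth column is the used-capacity percentage) as a stand-in for the fstests `_used` helper; both function names are hypothetical. Keeping the decision in its own function lets it be checked without a scratch device.

```shell
#!/bin/sh
# Hypothetical stand-in for fstests's _used: print the used-space percentage
# (the "Capacity" column of POSIX df -P) for the filesystem holding path $1,
# without the trailing '%'.
used_pct() {
	df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

# Mirrors the test in __stress_scrub_clean_scratch: succeed (0) when the
# filesystem is at least 98% full and should be wiped, fail (1) otherwise.
should_clean_scratch() {
	test "$1" -lt 98 && return 1
	return 0
}
```

A caller would run `should_clean_scratch "$(used_pct "$SCRATCH_MNT")"` and, on success, remove the fsstress directories as the patch does with `rm -r -f $SCRATCH_MNT/p*`.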
* [PATCH 10/16] fuzzy: abort scrub stress testing if the scratch fs went down
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

There's no point in continuing a stress test of online fsck if the
filesystem goes down. We can't query that kind of state directly, so as
a proxy we try to stat the mountpoint and interpret any error return as
a sign that the fs is down.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)


diff --git a/common/fuzzy b/common/fuzzy
index 6519d5c1e2..f1bc2dc756 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -338,10 +338,17 @@ __stress_scrub_filter_output() {
 		-e '/No space left on device/d'
 }
 
+# Decide if the scratch filesystem is still alive.
+__stress_scrub_scratch_alive() {
+	# If we can't stat the scratch filesystem, there's a reasonably good
+	# chance that the fs shut down, which is not good.
+	stat "$SCRATCH_MNT" &>/dev/null
+}
+
 # Decide if we want to keep running stress tests. The first argument is the
 # stop time, and second argument is the path to the sentinel file.
 __stress_scrub_running() {
-	test -e "$2" && test "$(date +%s)" -lt "$1"
+	test -e "$2" && test "$(date +%s)" -lt "$1" && __stress_scrub_scratch_alive
 }
 
 # Run fs freeze and thaw in a tight loop.
@@ -486,6 +493,10 @@ _scratch_xfs_stress_scrub() {
 	done
 	_scratch_xfs_stress_scrub_cleanup
 
+	# Warn the user if we think the scratch filesystem went down.
+	__stress_scrub_scratch_alive || \
+		echo "Did the scratch filesystem die?"
+
 	echo "Loop finished at $(date)" >> $seqres.full
 }
 

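With this patch, "keep running" requires three things: the sentinel file exists, the deadline has not passed, and the mountpoint can still be stat'ed. The sketch below restates that predicate with the mountpoint and the current time passed as arguments, an assumption made so it can be exercised on an ordinary directory. `stress_running` and `scratch_alive` are hypothetical names; `stat` is the coreutils/busybox utility, as in the patch.

```shell
#!/bin/sh
# Liveness probe from this patch: if stat(2) on the mountpoint fails, treat
# the filesystem as shut down.
scratch_alive() {
	stat "$1" >/dev/null 2>&1
}

# Hypothetical variant of __stress_scrub_running with every input explicit:
#   $1 = deadline (seconds since the epoch)
#   $2 = sentinel file that must exist
#   $3 = mountpoint to probe
#   $4 = current time (seconds since the epoch)
stress_running() {
	test -e "$2" && test "$4" -lt "$1" && scratch_alive "$3"
}
```

In the real helper the current time comes from `date +%s` and the mountpoint is always `$SCRATCH_MNT`.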
* [PATCH 14/16] fuzzy: make freezing optional for scrub stress tests
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Make the freeze/thaw loop optional, since that's a significant change in
behavior if it's enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy  |   13 ++++++++++---
 tests/xfs/422 |    2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)


diff --git a/common/fuzzy b/common/fuzzy
index 0f6fc91b80..219dd3bb0a 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -499,6 +499,8 @@ __stress_scrub_check_commands() {
 #
 # Various options include:
 #
+# -f	Run a freeze/thaw loop while we're doing other things. Defaults to
+#	disabled, unless XFS_SCRUB_STRESS_FREEZE is set.
 # -s	Pass this command to xfs_io to test scrub. If zero -s options are
 #	specified, xfs_io will not be run.
 # -t	Run online scrub against this file; $SCRATCH_MNT is the default.
@@ -506,14 +508,16 @@ _scratch_xfs_stress_scrub() {
 	local one_scrub_args=()
 	local scrub_tgt="$SCRATCH_MNT"
 	local runningfile="$tmp.fsstress"
+	local freeze="${XFS_SCRUB_STRESS_FREEZE}"
 
 	__SCRUB_STRESS_FREEZE_PID=""
 	rm -f "$runningfile"
 	touch "$runningfile"
 
 	OPTIND=1
-	while getopts "s:t:" c; do
+	while getopts "fs:t:" c; do
 		case "$c" in
+		f)	freeze=yes;;
 		s)	one_scrub_args+=("$OPTARG");;
 		t)	scrub_tgt="$OPTARG";;
 		*)	return 1; ;;
@@ -529,8 +533,11 @@ _scratch_xfs_stress_scrub() {
 		"ending at $(date --date="@${end}")" >> $seqres.full
 
 	__stress_scrub_fsstress_loop "$end" "$runningfile" &
-	__stress_scrub_freeze_loop "$end" "$runningfile" &
-	__SCRUB_STRESS_FREEZE_PID="$!"
+
+	if [ -n "$freeze" ]; then
+		__stress_scrub_freeze_loop "$end" "$runningfile" &
+		__SCRUB_STRESS_FREEZE_PID="$!"
+	fi
 
 	if [ "${#one_scrub_args[@]}" -gt 0 ]; then
 		__stress_one_scrub_loop "$end" "$runningfile" "$scrub_tgt" \
diff --git a/tests/xfs/422 b/tests/xfs/422
index faea5d6792..ac88713257 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -31,7 +31,7 @@ _require_xfs_stress_online_repair
 _scratch_mkfs > "$seqres.full" 2>&1
 _scratch_mount
 _require_xfs_has_feature "$SCRATCH_MNT" rmapbt
-_scratch_xfs_stress_online_repair -s "repair rmapbt 0" -s "repair rmapbt 1"
+_scratch_xfs_stress_online_repair -f -s "repair rmapbt 0" -s "repair rmapbt 1"
 
 # success, all done
 echo Silence is golden

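The resulting option handling, reduced to the freeze decision alone. A sketch: `parse_freeze_opt` is a hypothetical name, and it prints the outcome instead of launching `__stress_scrub_freeze_loop`. As in the patch, any non-empty `XFS_SCRUB_STRESS_FREEZE` enables freezing even without `-f`.

```shell
#!/bin/sh
# Sketch of this patch's option parsing: the freeze/thaw loop is off unless
# -f is passed or XFS_SCRUB_STRESS_FREEZE is non-empty in the environment.
parse_freeze_opt() {
	local freeze="${XFS_SCRUB_STRESS_FREEZE:-}" c

	OPTIND=1
	while getopts "fs:t:" c; do
		case "$c" in
		f)	freeze=yes;;
		s|t)	;;	# scrub command / target: ignored in this sketch
		*)	return 1;;
		esac
	done

	if [ -n "$freeze" ]; then
		echo "freeze loop enabled"
	else
		echo "freeze loop disabled"
	fi
}
```

This is why tests/xfs/422, which previously got the freeze loop unconditionally, now passes `-f` explicitly.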
* [PATCH 08/16] fuzzy: test the scrub stress subcommands before looping
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Before we commit to running fsstress and scrub commands in a loop for
some time, we should check that the provided commands actually work on
the scratch filesystem. The _require_xfs_io_command predicate only
detects the presence of the scrub ioctl, not any particular subcommand.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)


diff --git a/common/fuzzy b/common/fuzzy
index 88ba5fef69..8d3e30e32b 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -405,6 +405,25 @@ _scratch_xfs_stress_scrub_cleanup() {
 	$XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT >> $seqres.full 2>&1
 }
 
+# Make sure the provided scrub/repair commands actually work on the scratch
+# filesystem before we start running them in a loop.
+__stress_scrub_check_commands() {
+	local scrub_tgt="$1"
+	shift
+
+	for arg in "$@"; do
+		testio=`$XFS_IO_PROG -x -c "$arg" $scrub_tgt 2>&1`
+		echo $testio | grep -q "Unknown type" && \
+			_notrun "xfs_io scrub subcommand support is missing"
+		echo $testio | grep -q "Inappropriate ioctl" && \
+			_notrun "kernel scrub ioctl is missing"
+		echo $testio | grep -q "No such file or directory" && \
+			_notrun "kernel does not know about: $arg"
+		echo $testio | grep -q "Operation not supported" && \
+			_notrun "kernel does not support: $arg"
+	done
+}
+
 # Start scrub, freeze, and fsstress in background looping processes, and wait
 # for 30*TIME_FACTOR seconds to see if the filesystem goes down. Callers
 # must call _scratch_xfs_stress_scrub_cleanup from their cleanup functions.
@@ -427,6 +446,8 @@ _scratch_xfs_stress_scrub() {
 		esac
 	done
 
+	__stress_scrub_check_commands "$scrub_tgt" "${one_scrub_args[@]}"
+
 	local start="$(date +%s)"
 	local end="$((start + (30 * TIME_FACTOR) ))"
 

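The checks in `__stress_scrub_check_commands` amount to mapping the error text of a trial xfs_io run to a skip reason. Below they are restated as a `case` statement; `scrub_skip_reason` is a hypothetical name, while the matched strings and messages are copied from the patch. The first matching pattern wins, which preserves the patch's ordering because `_notrun` exits on the first hit.

```shell
#!/bin/sh
# Map a trial xfs_io run's output ($1) for scrub command $2 to the reason
# the patch would pass to _notrun.  Prints nothing if no known error matched.
scrub_skip_reason() {
	local output="$1" arg="$2"

	case "$output" in
	*"Unknown type"*)
		echo "xfs_io scrub subcommand support is missing";;
	*"Inappropriate ioctl"*)
		echo "kernel scrub ioctl is missing";;
	*"No such file or directory"*)
		echo "kernel does not know about: $arg";;
	*"Operation not supported"*)
		echo "kernel does not support: $arg";;
	esac
}
```

A caller could then do `reason="$(scrub_skip_reason "$($XFS_IO_PROG -x -c "$arg" "$scrub_tgt" 2>&1)" "$arg")"` followed by `[ -n "$reason" ] && _notrun "$reason"`.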
* [PATCH 09/16] fuzzy: make scrub stress loop control more robust
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Currently, each of the scrub stress testing background threads
open-codes logic to decide if it should exit the loop. This decision is
based entirely on TIME_FACTOR*30 seconds having gone by, which means
that we ignore external factors, such as the user pressing ^C, which (in
theory) will invoke cleanup functions to tear everything down.

This is not a great user experience, so refactor the loop exit test into
a helper function and establish a sentinel file that must be present to
continue looping. If the user presses ^C, the cleanup function will
remove the sentinel file and kill the background thread children, which
should be enough to stop everything more or less immediately.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy |   39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)


diff --git a/common/fuzzy b/common/fuzzy
index 8d3e30e32b..6519d5c1e2 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -338,11 +338,18 @@ __stress_scrub_filter_output() {
 		-e '/No space left on device/d'
 }
 
+# Decide if we want to keep running stress tests. The first argument is the
+# stop time, and second argument is the path to the sentinel file.
+__stress_scrub_running() {
+	test -e "$2" && test "$(date +%s)" -lt "$1"
+}
+
 # Run fs freeze and thaw in a tight loop.
 __stress_scrub_freeze_loop() {
 	local end="$1"
+	local runningfile="$2"
 
-	while [ "$(date +%s)" -lt $end ]; do
+	while __stress_scrub_running "$end" "$runningfile"; do
 		$XFS_IO_PROG -x -c 'freeze' -c 'thaw' $SCRATCH_MNT 2>&1 | \
 			__stress_freeze_filter_output
 	done
@@ -351,15 +358,16 @@ __stress_scrub_freeze_loop() {
 # Run individual XFS online fsck commands in a tight loop with xfs_io.
 __stress_one_scrub_loop() {
 	local end="$1"
-	local scrub_tgt="$2"
-	shift; shift
+	local runningfile="$2"
+	local scrub_tgt="$3"
+	shift; shift; shift
 
 	local xfs_io_args=()
 	for arg in "$@"; do
 		xfs_io_args+=('-c' "$arg")
 	done
 
-	while [ "$(date +%s)" -lt $end ]; do
+	while __stress_scrub_running "$end" "$runningfile"; do
 		$XFS_IO_PROG -x "${xfs_io_args[@]}" "$scrub_tgt" 2>&1 | \
 			__stress_scrub_filter_output
 	done
@@ -368,12 +376,16 @@ __stress_one_scrub_loop() {
 # Run fsstress while we're testing online fsck.
 __stress_scrub_fsstress_loop() {
 	local end="$1"
+	local runningfile="$2"
 
 	local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
+	echo "Running $FSSTRESS_PROG $args" >> $seqres.full
 
-	while [ "$(date +%s)" -lt $end ]; do
+	while __stress_scrub_running "$end" "$runningfile"; do
 		$FSSTRESS_PROG $args >> $seqres.full
+		echo "fsstress exits with $? at $(date)" >> $seqres.full
 	done
+	rm -f "$runningfile"
 }
 
 # Make sure we have everything we need to run stress and scrub
@@ -397,6 +409,7 @@ _require_xfs_stress_online_repair() {
 
 # Clean up after the loops in case they didn't do it themselves.
 _scratch_xfs_stress_scrub_cleanup() {
+	rm -f "$runningfile"
 	echo "Cleaning up scrub stress run at $(date)" >> $seqres.full
 
 	# Send SIGINT so that bash won't print a 'Terminated' message that
@@ -436,6 +449,10 @@ __stress_scrub_check_commands() {
 _scratch_xfs_stress_scrub() {
 	local one_scrub_args=()
 	local scrub_tgt="$SCRATCH_MNT"
+	local runningfile="$tmp.fsstress"
+
+	rm -f "$runningfile"
+	touch "$runningfile"
 
 	OPTIND=1
 	while getopts "s:t:" c; do
@@ -454,17 +471,17 @@ _scratch_xfs_stress_scrub() {
 	echo "Loop started at $(date --date="@${start}")," \
 		"ending at $(date --date="@${end}")" >> $seqres.full
 
-	__stress_scrub_fsstress_loop $end &
-	__stress_scrub_freeze_loop $end &
+	__stress_scrub_fsstress_loop "$end" "$runningfile" &
+	__stress_scrub_freeze_loop "$end" "$runningfile" &
 
 	if [ "${#one_scrub_args[@]}" -gt 0 ]; then
-		__stress_one_scrub_loop "$end" "$scrub_tgt" \
+		__stress_one_scrub_loop "$end" "$runningfile" "$scrub_tgt" \
 			"${one_scrub_args[@]}" &
 	fi
 
-	# Wait until 2 seconds after the loops should have finished, then
-	# clean up after ourselves.
-	while [ "$(date +%s)" -lt $((end + 2)) ]; do
+	# Wait until the designated end time or fsstress dies, then kill all of
+	# our background processes.
+	while __stress_scrub_running "$end" "$runningfile"; do
 		sleep 1
 	done
 	_scratch_xfs_stress_scrub_cleanup

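The sentinel-file idea in isolation: every background worker polls one file, so whoever deletes it (the ^C cleanup path, or the fsstress loop when it exits) stops all of them. A minimal sketch with hypothetical names; `sleep 1` stands in for one pass of fsstress, scrub, or freeze/thaw.

```shell
#!/bin/sh
# Each worker runs only while the sentinel file exists, so deleting that one
# file stops every worker once its current pass finishes.
worker_loop() {
	local sentinel="$1" log="$2"

	while [ -e "$sentinel" ]; do
		sleep 1		# stand-in for one fsstress/scrub/freeze pass
	done
	echo "worker saw sentinel removed" >> "$log"
}

sentinel_demo() {
	local dir pid

	dir="$(mktemp -d)"
	touch "$dir/running"

	worker_loop "$dir/running" "$dir/log" &
	pid=$!

	rm -f "$dir/running"	# what the ^C cleanup path does
	wait "$pid"
	cat "$dir/log"
	rm -rf "$dir"
}
```

Because the check happens only between passes, a worker stops after finishing its current pass. The patch therefore also kills the children in `_scratch_xfs_stress_scrub_cleanup`, so a long-running pass does not delay shutdown.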
* [PATCH 16/16] fuzzy: delay the start of the scrub loop when stress-testing scrub
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

By default, online fsck stress testing kicks off the loops for fsstress
and online fsck at the same time. However, in certain debugging
scenarios it can help if we let fsstress get a head-start in filling up
the filesystem. Plumb in a means to delay the start of the scrub loop.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)


diff --git a/common/fuzzy b/common/fuzzy
index e42e2ccec1..1df51a6dd8 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -367,7 +367,8 @@ __stress_one_scrub_loop() {
 	local end="$1"
 	local runningfile="$2"
 	local scrub_tgt="$3"
-	shift; shift; shift
+	local scrub_startat="$4"
+	shift; shift; shift; shift
 	local agcount="$(_xfs_mount_agcount $SCRATCH_MNT)"
 
 	local xfs_io_args=()
@@ -383,6 +384,10 @@ __stress_one_scrub_loop() {
 		fi
 	done
 
+	while __stress_scrub_running "$scrub_startat" "$runningfile"; do
+		sleep 1
+	done
+
 	while __stress_scrub_running "$end" "$runningfile"; do
 		$XFS_IO_PROG -x "${xfs_io_args[@]}" "$scrub_tgt" 2>&1 | \
 			__stress_scrub_filter_output
@@ -514,22 +519,27 @@ __stress_scrub_check_commands() {
 # -s	Pass this command to xfs_io to test scrub. If zero -s options are
 #	specified, xfs_io will not be run.
 # -t	Run online scrub against this file; $SCRATCH_MNT is the default.
+# -w	Delay the start of the scrub/repair loop by this number of seconds.
+#	Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value
+#	will be clamped to ten seconds before the end time.
 _scratch_xfs_stress_scrub() {
 	local one_scrub_args=()
 	local scrub_tgt="$SCRATCH_MNT"
 	local runningfile="$tmp.fsstress"
 	local freeze="${XFS_SCRUB_STRESS_FREEZE}"
+	local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}"
 
 	__SCRUB_STRESS_FREEZE_PID=""
 	rm -f "$runningfile"
 	touch "$runningfile"
 
 	OPTIND=1
-	while getopts "fs:t:" c; do
+	while getopts "fs:t:w:" c; do
 		case "$c" in
 		f)	freeze=yes;;
 		s)	one_scrub_args+=("$OPTARG");;
 		t)	scrub_tgt="$OPTARG";;
+		w)	scrub_delay="$OPTARG";;
 		*)	return 1; ;;
 		esac
 	done
@@ -538,6 +548,9 @@ _scratch_xfs_stress_scrub() {
 
 	local start="$(date +%s)"
 	local end="$((start + (30 * TIME_FACTOR) ))"
+	local scrub_startat="$((start + scrub_delay))"
+	test "$scrub_startat" -gt "$((end - 10))" &&
+		scrub_startat="$((end - 10))"
 
 	echo "Loop started at $(date --date="@${start}")," \
 		"ending at $(date --date="@${end}")" >> $seqres.full
@@ -551,7 +564,7 @@ _scratch_xfs_stress_scrub() {
 
 	if [ "${#one_scrub_args[@]}" -gt 0 ]; then
 		__stress_one_scrub_loop "$end" "$runningfile" "$scrub_tgt" \
-			"${one_scrub_args[@]}" &
+			"$scrub_startat" "${one_scrub_args[@]}" &
 	fi
 
 	# Wait until the designated end time or fsstress dies, then kill all of

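The delay handling is plain arithmetic on epoch seconds. The sketch below takes the start and end times as arguments (`scrub_start_time` is a hypothetical name) so the clamp can be checked without a clock. The default delay of -1 yields a start time already in the past, so the scrub loop's initial wait returns at once, while a delay longer than the run is clamped to ten seconds before the end.

```shell
#!/bin/sh
# Compute when the scrub loop may start: start + delay, but no later than
# ten seconds before the end of the run.
scrub_start_time() {
	local start="$1" end="$2" delay="$3" startat

	startat=$((start + delay))
	test "$startat" -gt "$((end - 10))" && startat=$((end - 10))
	echo "$startat"
}
```

With `TIME_FACTOR=1` the run lasts 30 seconds, so `scrub_start_time 1000 1030 100` prints 1020: the scrub loop still gets its last ten seconds.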
* [PATCH 15/16] fuzzy: allow substitution of AG numbers when configuring scrub stress test
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Allow the test program to use the metavariable '%agno%' when passing
scrub commands to the scrub stress loop. This makes it easier for tests
to scrub or repair every AG in the filesystem without a lot of work.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy  |   14 ++++++++++++--
 tests/xfs/422 |    2 +-
 2 files changed, 13 insertions(+), 3 deletions(-)


diff --git a/common/fuzzy b/common/fuzzy
index 219dd3bb0a..e42e2ccec1 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -368,10 +368,19 @@ __stress_one_scrub_loop() {
 	local runningfile="$2"
 	local scrub_tgt="$3"
 	shift; shift; shift
+	local agcount="$(_xfs_mount_agcount $SCRATCH_MNT)"
 
 	local xfs_io_args=()
 	for arg in "$@"; do
-		xfs_io_args+=('-c' "$arg")
+		if echo "$arg" | grep -q -w '%agno%'; then
+			# Substitute the AG number
+			for ((agno = 0; agno < agcount; agno++)); do
+				local ag_arg="$(echo "$arg" | sed -e "s|%agno%|$agno|g")"
+				xfs_io_args+=('-c' "$ag_arg")
+			done
+		else
+			xfs_io_args+=('-c' "$arg")
+		fi
 	done
 
 	while __stress_scrub_running "$end" "$runningfile"; do
@@ -481,7 +490,8 @@ __stress_scrub_check_commands() {
 	shift
 
 	for arg in "$@"; do
-		testio=`$XFS_IO_PROG -x -c "$arg" $scrub_tgt 2>&1`
+		local cooked_arg="$(echo "$arg" | sed -e "s/%agno%/0/g")"
+		testio=`$XFS_IO_PROG -x -c "$cooked_arg" $scrub_tgt 2>&1`
 		echo $testio | grep -q "Unknown type" && \
 			_notrun "xfs_io scrub subcommand support is missing"
 		echo $testio | grep -q "Inappropriate ioctl" && \
diff --git a/tests/xfs/422 b/tests/xfs/422
index ac88713257..995f612166 100755
--- a/tests/xfs/422
+++ b/tests/xfs/422
@@ -31,7 +31,7 @@ _require_xfs_stress_online_repair
 _scratch_mkfs > "$seqres.full" 2>&1
 _scratch_mount
 _require_xfs_has_feature "$SCRATCH_MNT" rmapbt
-_scratch_xfs_stress_online_repair -f -s "repair rmapbt 0" -s "repair rmapbt 1"
+_scratch_xfs_stress_online_repair -f -s "repair rmapbt %agno%"
 
 # success, all done
 echo Silence is golden

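The `%agno%` expansion, sketched in POSIX sh. The patch appends `-c` arguments to a bash array; this version prints one expanded command per line instead, and takes the AG count as an argument rather than reading it with `_xfs_mount_agcount`. It uses a plain substring match, which is slightly looser than the patch's `grep -w`. `expand_agno` is a hypothetical name.

```shell
#!/bin/sh
# Expand scrub commands: any command containing %agno% is emitted once per
# AG (0 .. agcount-1) with the AG number substituted; others pass through.
expand_agno() {
	local agcount="$1" arg agno
	shift

	for arg in "$@"; do
		case "$arg" in
		*%agno%*)
			agno=0
			while [ "$agno" -lt "$agcount" ]; do
				printf '%s\n' "$arg" | sed -e "s|%agno%|$agno|g"
				agno=$((agno + 1))
			done
			;;
		*)
			printf '%s\n' "$arg"
			;;
		esac
	done
}
```

With four AGs, the new tests/xfs/422 argument `-s "repair rmapbt %agno%"` expands to `repair rmapbt 0` through `repair rmapbt 3`, whereas the old invocation only ever repaired AGs 0 and 1.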
* [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

Hi all,

Refactor the fsmap racing tests to use the general scrub stress loop
infrastructure that we've now created, and then add a bit more
functionality so that we can test racing remounting the filesystem
readonly and readwrite.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.

--D

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-fsmap-stress
---
 common/fuzzy      |  161 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 ltp/fsstress.c    |   18 ++++++
 tests/xfs/517     |   91 +-----------------------------
 tests/xfs/517.out |    4 -
 tests/xfs/732     |   38 +++++++++++++
 tests/xfs/732.out |    2 +
 tests/xfs/847     |   38 +++++++++++++
 tests/xfs/847.out |    2 +
 tests/xfs/848     |   38 +++++++++++++
 tests/xfs/848.out |    2 +
 10 files changed, 300 insertions(+), 94 deletions(-)
 create mode 100755 tests/xfs/732
 create mode 100644 tests/xfs/732.out
 create mode 100755 tests/xfs/847
 create mode 100644 tests/xfs/847.out
 create mode 100755 tests/xfs/848
 create mode 100644 tests/xfs/848.out

* [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx
From: Darrick J. Wong @ 2022-12-30 22:12 UTC
To: zlang, djwong
Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Add a couple of new online fsck stress tests that race fsx against
online fsck.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy      |   39 ++++++++++++++++++++++++++++++++++++---
 tests/xfs/847     |   38 ++++++++++++++++++++++++++++++++++++++
 tests/xfs/847.out |    2 ++
 tests/xfs/848     |   38 ++++++++++++++++++++++++++++++++++++++
 tests/xfs/848.out |    2 ++
 5 files changed, 116 insertions(+), 3 deletions(-)
 create mode 100755 tests/xfs/847
 create mode 100644 tests/xfs/847.out
 create mode 100755 tests/xfs/848
 create mode 100644 tests/xfs/848.out


diff --git a/common/fuzzy b/common/fuzzy
index 1df51a6dd8..3512e95e02 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -408,6 +408,30 @@ __stress_scrub_clean_scratch() {
 	return 0
 }
 
+# Run fsx while we're testing online fsck.
+__stress_scrub_fsx_loop() {
+	local end="$1"
+	local runningfile="$2"
+	local focus=(-q -X)	# quiet, validate file contents
+
+	# As of November 2022, 2 million fsx ops should be enough to keep
+	# any filesystem busy for a couple of hours.
+	focus+=(-N 2000000)
+	focus+=(-o $((128000 * LOAD_FACTOR)) )
+	focus+=(-l $((600000 * LOAD_FACTOR)) )
+
+	local args="$FSX_AVOID ${focus[@]} ${SCRATCH_MNT}/fsx.$seq"
+	echo "Running $here/ltp/fsx $args" >> $seqres.full
+
+	while __stress_scrub_running "$end" "$runningfile"; do
+		# Need to recheck running conditions if we cleared anything
+		__stress_scrub_clean_scratch && continue
+		$here/ltp/fsx $args >> $seqres.full
+		echo "fsx exits with $? at $(date)" >> $seqres.full
+	done
+	rm -f "$runningfile"
+}
+
 # Run fsstress while we're testing online fsck.
 __stress_scrub_fsstress_loop() {
 	local end="$1"
@@ -454,7 +478,7 @@ _scratch_xfs_stress_scrub_cleanup() {
 	# Send SIGINT so that bash won't print a 'Terminated' message that
 	# distorts the golden output.
 	echo "Killing stressor processes at $(date)" >> $seqres.full
-	$KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1
+	$KILLALL_PROG -INT xfs_io fsstress fsx >> $seqres.full 2>&1
 
 	# Tests are not allowed to exit with the scratch fs frozen. If we
 	# started a fs freeze/thaw background loop, wait for that loop to exit
@@ -522,30 +546,39 @@ __stress_scrub_check_commands() {
 # -w	Delay the start of the scrub/repair loop by this number of seconds.
 #	Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value
 #	will be clamped to ten seconds before the end time.
+# -X	Run this program to exercise the filesystem. Currently supported
+#	options are 'fsx' and 'fsstress'. The default is 'fsstress'.
 _scratch_xfs_stress_scrub() {
 	local one_scrub_args=()
 	local scrub_tgt="$SCRATCH_MNT"
 	local runningfile="$tmp.fsstress"
 	local freeze="${XFS_SCRUB_STRESS_FREEZE}"
 	local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}"
+	local exerciser="fsstress"
 
 	__SCRUB_STRESS_FREEZE_PID=""
 	rm -f "$runningfile"
 	touch "$runningfile"
 
 	OPTIND=1
-	while getopts "fs:t:w:" c; do
+	while getopts "fs:t:w:X:" c; do
 		case "$c" in
 		f)	freeze=yes;;
 		s)	one_scrub_args+=("$OPTARG");;
 		t)	scrub_tgt="$OPTARG";;
 		w)	scrub_delay="$OPTARG";;
+		X)	exerciser="$OPTARG";;
 		*)	return 1; ;;
 		esac
 	done
 
 	__stress_scrub_check_commands "$scrub_tgt" "${one_scrub_args[@]}"
 
+	if ! command -v "__stress_scrub_${exerciser}_loop" &>/dev/null; then
+		echo "${exerciser}: Unknown fs exercise program."
+		return 1
+	fi
+
 	local start="$(date +%s)"
 	local end="$((start + (30 * TIME_FACTOR) ))"
 	local scrub_startat="$((start + scrub_delay))"
@@ -555,7 +588,7 @@ _scratch_xfs_stress_scrub() {
 	echo "Loop started at $(date --date="@${start}")," \
 		"ending at $(date --date="@${end}")" >> $seqres.full
 
-	__stress_scrub_fsstress_loop "$end" "$runningfile" &
+	"__stress_scrub_${exerciser}_loop" "$end" "$runningfile" &
 
 	if [ -n "$freeze" ]; then
 		__stress_scrub_freeze_loop "$end" "$runningfile" &
diff --git a/tests/xfs/847 b/tests/xfs/847
new file mode 100755
index 0000000000..856e9a6c26
--- /dev/null
+++ b/tests/xfs/847
@@ -0,0 +1,38 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
+#
+# FS QA Test No. 847
+#
+# Race fsx and xfs_scrub in read-only mode for a while to see if we crash
+# or livelock.
+#
+. ./common/preamble
+_begin_fstest scrub dangerous_fsstress_scrub
+
+_cleanup() {
+	cd /
+	_scratch_xfs_stress_scrub_cleanup &> /dev/null
+	rm -r -f $tmp.*
+}
+_register_cleanup "_cleanup" BUS
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/inject
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_scrub
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_scrub -S '-n' -X 'fsx'
+
+# success, all done
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/847.out b/tests/xfs/847.out
new file mode 100644
index 0000000000..b7041db159
--- /dev/null
+++ b/tests/xfs/847.out
@@ -0,0 +1,2 @@
+QA output created by 847
+Silence is golden
diff --git a/tests/xfs/848 b/tests/xfs/848
new file mode 100755
index 0000000000..ab32020624
--- /dev/null
+++ b/tests/xfs/848
@@ -0,0 +1,38 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
+#
+# FS QA Test No. 848
+#
+# Race fsx and xfs_scrub in force-repair mode for a while to see if we
+# crash or livelock.
+#
+. ./common/preamble
+_begin_fstest online_repair dangerous_fsstress_repair
+
+_cleanup() {
+	cd /
+	_scratch_xfs_stress_scrub_cleanup &> /dev/null
+	rm -r -f $tmp.*
+}
+_register_cleanup "_cleanup" BUS
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/inject
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_online_repair
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_online_repair -S '-k' -X 'fsx'
+
+# success, all done
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/848.out b/tests/xfs/848.out
new file mode 100644
index 0000000000..23f674045c
--- /dev/null
+++ b/tests/xfs/848.out
@@ -0,0 +1,2 @@
+QA output created by 848
+Silence is golden

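The `-X` option works by splicing the program name into a function name and rejecting names that do not resolve before anything is started in the background. A sketch of that dispatch; the two loop bodies are placeholders rather than the real fsstress and fsx loops, and `run_exerciser` is a hypothetical name.

```shell
#!/bin/sh
# Placeholder workloads standing in for the real background loops.
__stress_scrub_fsstress_loop() { echo "fsstress loop: end=$1"; }
__stress_scrub_fsx_loop() { echo "fsx loop: end=$1"; }

# Look up "__stress_scrub_<name>_loop" and run it; refuse unknown names
# before doing any work, as _scratch_xfs_stress_scrub does.
run_exerciser() {
	local exerciser="$1" end="$2"

	if ! command -v "__stress_scrub_${exerciser}_loop" >/dev/null 2>&1; then
		echo "${exerciser}: Unknown fs exercise program."
		return 1
	fi
	"__stress_scrub_${exerciser}_loop" "$end"
}
```

One consequence of this design is that adding another exerciser only requires defining another `__stress_scrub_<name>_loop` function; the `-X` parsing itself does not change.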
* Re: [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx 2022-12-30 22:12 ` [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx Darrick J. Wong @ 2023-01-05 5:49 ` Zorro Lang 2023-01-05 18:28 ` Darrick J. Wong 2023-01-05 18:28 ` [PATCH v24.1 " Darrick J. Wong 1 sibling, 1 reply; 32+ messages in thread From: Zorro Lang @ 2023-01-05 5:49 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, fstests On Fri, Dec 30, 2022 at 02:12:57PM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > Add a couple of new online fsck stress tests that race fsx against > online fsck. > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > --- > common/fuzzy | 39 ++++++++++++++++++++++++++++++++++++--- > tests/xfs/847 | 38 ++++++++++++++++++++++++++++++++++++++ > tests/xfs/847.out | 2 ++ > tests/xfs/848 | 38 ++++++++++++++++++++++++++++++++++++++ > tests/xfs/848.out | 2 ++ > 5 files changed, 116 insertions(+), 3 deletions(-) > create mode 100755 tests/xfs/847 > create mode 100644 tests/xfs/847.out > create mode 100755 tests/xfs/848 > create mode 100644 tests/xfs/848.out > > > diff --git a/common/fuzzy b/common/fuzzy > index 1df51a6dd8..3512e95e02 100644 > --- a/common/fuzzy > +++ b/common/fuzzy > @@ -408,6 +408,30 @@ __stress_scrub_clean_scratch() { > return 0 > } > > +# Run fsx while we're testing online fsck. > +__stress_scrub_fsx_loop() { > + local end="$1" > + local runningfile="$2" > + local focus=(-q -X) # quiet, validate file contents > + > + # As of November 2022, 2 million fsx ops should be enough to keep > + # any filesystem busy for a couple of hours. 
> + focus+=(-N 2000000) > + focus+=(-o $((128000 * LOAD_FACTOR)) ) > + focus+=(-l $((600000 * LOAD_FACTOR)) ) > + > + local args="$FSX_AVOID ${focus[@]} ${SCRATCH_MNT}/fsx.$seq" > + echo "Running $here/ltp/fsx $args" >> $seqres.full > + > + while __stress_scrub_running "$end" "$runningfile"; do > + # Need to recheck running conditions if we cleared anything > + __stress_scrub_clean_scratch && continue > + $here/ltp/fsx $args >> $seqres.full > + echo "fsx exits with $? at $(date)" >> $seqres.full > + done > + rm -f "$runningfile" > +} > + > # Run fsstress while we're testing online fsck. > __stress_scrub_fsstress_loop() { > local end="$1" > @@ -454,7 +478,7 @@ _scratch_xfs_stress_scrub_cleanup() { > # Send SIGINT so that bash won't print a 'Terminated' message that > # distorts the golden output. > echo "Killing stressor processes at $(date)" >> $seqres.full > - $KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1 > + $KILLALL_PROG -INT xfs_io fsstress fsx >> $seqres.full 2>&1 > > # Tests are not allowed to exit with the scratch fs frozen. If we > # started a fs freeze/thaw background loop, wait for that loop to exit > @@ -522,30 +546,39 @@ __stress_scrub_check_commands() { > # -w Delay the start of the scrub/repair loop by this number of seconds. > # Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value > # will be clamped to ten seconds before the end time. > +# -X Run this program to exercise the filesystem. Currently supported > +# options are 'fsx' and 'fsstress'. The default is 'fsstress'. 
> _scratch_xfs_stress_scrub() { > local one_scrub_args=() > local scrub_tgt="$SCRATCH_MNT" > local runningfile="$tmp.fsstress" > local freeze="${XFS_SCRUB_STRESS_FREEZE}" > local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}" > + local exerciser="fsstress" > > __SCRUB_STRESS_FREEZE_PID="" > rm -f "$runningfile" > touch "$runningfile" > > OPTIND=1 > - while getopts "fs:t:w:" c; do > + while getopts "fs:t:w:X:" c; do > case "$c" in > f) freeze=yes;; > s) one_scrub_args+=("$OPTARG");; > t) scrub_tgt="$OPTARG";; > w) scrub_delay="$OPTARG";; > + X) exerciser="$OPTARG";; > *) return 1; ;; > esac > done > > __stress_scrub_check_commands "$scrub_tgt" "${one_scrub_args[@]}" > > + if ! command -v "__stress_scrub_${exerciser}_loop" &>/dev/null; then > + echo "${exerciser}: Unknown fs exercise program." > + return 1 > + fi > + > local start="$(date +%s)" > local end="$((start + (30 * TIME_FACTOR) ))" > local scrub_startat="$((start + scrub_delay))" > @@ -555,7 +588,7 @@ _scratch_xfs_stress_scrub() { > echo "Loop started at $(date --date="@${start}")," \ > "ending at $(date --date="@${end}")" >> $seqres.full > > - __stress_scrub_fsstress_loop "$end" "$runningfile" & > + "__stress_scrub_${exerciser}_loop" "$end" "$runningfile" & > > if [ -n "$freeze" ]; then > __stress_scrub_freeze_loop "$end" "$runningfile" & > diff --git a/tests/xfs/847 b/tests/xfs/847 > new file mode 100755 > index 0000000000..856e9a6c26 > --- /dev/null > +++ b/tests/xfs/847 > @@ -0,0 +1,38 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (c) 2022 Oracle, Inc. All Rights Reserved. > +# > +# FS QA Test No. 847 > +# > +# Race fsx and xfs_scrub in read-only mode for a while to see if we crash > +# or livelock. > +# > +. ./common/preamble > +_begin_fstest scrub dangerous_fsstress_scrub Hi Darrick, Such huge patchsets :) I'll try to review them one by one (patchset). Now I'm trying to review "[NYE DELUGE 1/4]", but I can't find the "dangerous_fsstress_scrub" group in the whole patchsets. 
Is there any prepositive patch(set)? Or you'd like to use "dangerous_fsstress_repair"?

P.S: More cases use "dangerous_fsstress_scrub" in your new patchsets.

Thanks,
Zorro

> +
> +_cleanup() {
> +	cd /
> +	_scratch_xfs_stress_scrub_cleanup &> /dev/null
> +	rm -r -f $tmp.*
> +}
> +_register_cleanup "_cleanup" BUS
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/fuzzy
> +. ./common/inject
> +. ./common/xfs
> +
> +# real QA test starts here
> +_supported_fs xfs
> +_require_scratch
> +_require_xfs_stress_scrub
> +
> +_scratch_mkfs > "$seqres.full" 2>&1
> +_scratch_mount
> +_scratch_xfs_stress_scrub -S '-n' -X 'fsx'
> +
> +# success, all done
> +echo Silence is golden
> +status=0
> +exit
> diff --git a/tests/xfs/847.out b/tests/xfs/847.out
> new file mode 100644
> index 0000000000..b7041db159
> --- /dev/null
> +++ b/tests/xfs/847.out
> @@ -0,0 +1,2 @@
> +QA output created by 847
> +Silence is golden
> diff --git a/tests/xfs/848 b/tests/xfs/848
> new file mode 100755
> index 0000000000..ab32020624
> --- /dev/null
> +++ b/tests/xfs/848
> @@ -0,0 +1,38 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
> +#
> +# FS QA Test No. 848
> +#
> +# Race fsx and xfs_scrub in force-repair mode for a while to see if we
> +# crash or livelock.
> +#
> +. ./common/preamble
> +_begin_fstest online_repair dangerous_fsstress_repair
> +
> +_cleanup() {
> +	cd /
> +	_scratch_xfs_stress_scrub_cleanup &> /dev/null
> +	rm -r -f $tmp.*
> +}
> +_register_cleanup "_cleanup" BUS
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/fuzzy
> +. ./common/inject
> +. ./common/xfs
> +
> +# real QA test starts here
> +_supported_fs xfs
> +_require_scratch
> +_require_xfs_stress_online_repair
> +
> +_scratch_mkfs > "$seqres.full" 2>&1
> +_scratch_mount
> +_scratch_xfs_stress_online_repair -S '-k' -X 'fsx'
> +
> +# success, all done
> +echo Silence is golden
> +status=0
> +exit
> diff --git a/tests/xfs/848.out b/tests/xfs/848.out
> new file mode 100644
> index 0000000000..23f674045c
> --- /dev/null
> +++ b/tests/xfs/848.out
> @@ -0,0 +1,2 @@
> +QA output created by 848
> +Silence is golden
>
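The missing group Zorro points out is the kind of mistake a small pre-posting check can catch. The sketch below is a hypothetical helper, not something fstests ships. It assumes that each test names its groups on a single `_begin_fstest` line and that the first column of doc/group-names.txt is the group name.

```bash
#!/bin/bash
# Hypothetical helper (not part of fstests): report every group named on
# a test's _begin_fstest line that is missing from a group-names file
# whose first column is the group name. Returns 1 if anything is missing.
check_groups() {
	local names_file="$1"
	shift
	local test_file group missing=0

	for test_file in "$@"; do
		# Print the words after _begin_fstest on the first such line.
		for group in $(awk '$1 == "_begin_fstest" { for (i = 2; i <= NF; i++) print $i; exit }' "$test_file"); do
			# awk exits 0 only if some line's first field equals the group.
			if ! awk -v g="$group" '$1 == g { found = 1 } END { exit !found }' "$names_file"; then
				echo "$test_file: undocumented group '$group'"
				missing=1
			fi
		done
	done
	return "$missing"
}
```

From the top of an fstests checkout, this would be run as `check_groups doc/group-names.txt tests/xfs/847 tests/xfs/848`. Against a group-names file that lacks `dangerous_fsstress_scrub`, it flags tests/xfs/847 and returns 1.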
* Re: [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx 2023-01-05 5:49 ` Zorro Lang @ 2023-01-05 18:28 ` Darrick J. Wong 0 siblings, 0 replies; 32+ messages in thread From: Darrick J. Wong @ 2023-01-05 18:28 UTC (permalink / raw) To: Zorro Lang; +Cc: linux-xfs, fstests On Thu, Jan 05, 2023 at 01:49:20PM +0800, Zorro Lang wrote: > On Fri, Dec 30, 2022 at 02:12:57PM -0800, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > > > Add a couple of new online fsck stress tests that race fsx against > > online fsck. > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > --- > > common/fuzzy | 39 ++++++++++++++++++++++++++++++++++++--- > > tests/xfs/847 | 38 ++++++++++++++++++++++++++++++++++++++ > > tests/xfs/847.out | 2 ++ > > tests/xfs/848 | 38 ++++++++++++++++++++++++++++++++++++++ > > tests/xfs/848.out | 2 ++ > > 5 files changed, 116 insertions(+), 3 deletions(-) > > create mode 100755 tests/xfs/847 > > create mode 100644 tests/xfs/847.out > > create mode 100755 tests/xfs/848 > > create mode 100644 tests/xfs/848.out > > > > > > diff --git a/common/fuzzy b/common/fuzzy > > index 1df51a6dd8..3512e95e02 100644 > > --- a/common/fuzzy > > +++ b/common/fuzzy > > @@ -408,6 +408,30 @@ __stress_scrub_clean_scratch() { > > return 0 > > } > > > > +# Run fsx while we're testing online fsck. > > +__stress_scrub_fsx_loop() { > > + local end="$1" > > + local runningfile="$2" > > + local focus=(-q -X) # quiet, validate file contents > > + > > + # As of November 2022, 2 million fsx ops should be enough to keep > > + # any filesystem busy for a couple of hours. 
> > + focus+=(-N 2000000) > > + focus+=(-o $((128000 * LOAD_FACTOR)) ) > > + focus+=(-l $((600000 * LOAD_FACTOR)) ) > > + > > + local args="$FSX_AVOID ${focus[@]} ${SCRATCH_MNT}/fsx.$seq" > > + echo "Running $here/ltp/fsx $args" >> $seqres.full > > + > > + while __stress_scrub_running "$end" "$runningfile"; do > > + # Need to recheck running conditions if we cleared anything > > + __stress_scrub_clean_scratch && continue > > + $here/ltp/fsx $args >> $seqres.full > > + echo "fsx exits with $? at $(date)" >> $seqres.full > > + done > > + rm -f "$runningfile" > > +} > > + > > # Run fsstress while we're testing online fsck. > > __stress_scrub_fsstress_loop() { > > local end="$1" > > @@ -454,7 +478,7 @@ _scratch_xfs_stress_scrub_cleanup() { > > # Send SIGINT so that bash won't print a 'Terminated' message that > > # distorts the golden output. > > echo "Killing stressor processes at $(date)" >> $seqres.full > > - $KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1 > > + $KILLALL_PROG -INT xfs_io fsstress fsx >> $seqres.full 2>&1 > > > > # Tests are not allowed to exit with the scratch fs frozen. If we > > # started a fs freeze/thaw background loop, wait for that loop to exit > > @@ -522,30 +546,39 @@ __stress_scrub_check_commands() { > > # -w Delay the start of the scrub/repair loop by this number of seconds. > > # Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value > > # will be clamped to ten seconds before the end time. > > +# -X Run this program to exercise the filesystem. Currently supported > > +# options are 'fsx' and 'fsstress'. The default is 'fsstress'. 
> > _scratch_xfs_stress_scrub() { > > local one_scrub_args=() > > local scrub_tgt="$SCRATCH_MNT" > > local runningfile="$tmp.fsstress" > > local freeze="${XFS_SCRUB_STRESS_FREEZE}" > > local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}" > > + local exerciser="fsstress" > > > > __SCRUB_STRESS_FREEZE_PID="" > > rm -f "$runningfile" > > touch "$runningfile" > > > > OPTIND=1 > > - while getopts "fs:t:w:" c; do > > + while getopts "fs:t:w:X:" c; do > > case "$c" in > > f) freeze=yes;; > > s) one_scrub_args+=("$OPTARG");; > > t) scrub_tgt="$OPTARG";; > > w) scrub_delay="$OPTARG";; > > + X) exerciser="$OPTARG";; > > *) return 1; ;; > > esac > > done > > > > __stress_scrub_check_commands "$scrub_tgt" "${one_scrub_args[@]}" > > > > + if ! command -v "__stress_scrub_${exerciser}_loop" &>/dev/null; then > > + echo "${exerciser}: Unknown fs exercise program." > > + return 1 > > + fi > > + > > local start="$(date +%s)" > > local end="$((start + (30 * TIME_FACTOR) ))" > > local scrub_startat="$((start + scrub_delay))" > > @@ -555,7 +588,7 @@ _scratch_xfs_stress_scrub() { > > echo "Loop started at $(date --date="@${start}")," \ > > "ending at $(date --date="@${end}")" >> $seqres.full > > > > - __stress_scrub_fsstress_loop "$end" "$runningfile" & > > + "__stress_scrub_${exerciser}_loop" "$end" "$runningfile" & > > > > if [ -n "$freeze" ]; then > > __stress_scrub_freeze_loop "$end" "$runningfile" & > > diff --git a/tests/xfs/847 b/tests/xfs/847 > > new file mode 100755 > > index 0000000000..856e9a6c26 > > --- /dev/null > > +++ b/tests/xfs/847 > > @@ -0,0 +1,38 @@ > > +#! /bin/bash > > +# SPDX-License-Identifier: GPL-2.0 > > +# Copyright (c) 2022 Oracle, Inc. All Rights Reserved. > > +# > > +# FS QA Test No. 847 > > +# > > +# Race fsx and xfs_scrub in read-only mode for a while to see if we crash > > +# or livelock. > > +# > > +. ./common/preamble > > +_begin_fstest scrub dangerous_fsstress_scrub > > Hi Darrick, > > Such huge patchsets :) I'll try to review them one by one (patchset). 
>
> Now I'm trying to review "[NYE DELUGE 1/4]", but I can't find the
> "dangerous_fsstress_scrub" group in the whole patchsets. Is there any
> prepositive patch(set)? Or you'd like to use "dangerous_fsstress_repair"?
>
> P.S: More cases use "dangerous_fsstress_scrub" in your new patchsets.

Oops. The group was originally added in "xfs: race fsstress with online
scrubbers for AG and fs metadata". Then I created a few more patches at
the top of my stack, tested that, and then decided that their proper
placement was closer to the bottom than the patch that added the group.

Ok, I'll modify the build system to shellcheck any bash scripts in the
current commit (because running it on the full repo took hours and
produced many hundreds of errors, mostly in tests/btrfs/) and go do a
push-and-build of all three stgit repos.

--D

> Thanks,
> Zorro
>
> > +
> > +_cleanup() {
> > +	cd /
> > +	_scratch_xfs_stress_scrub_cleanup &> /dev/null
> > +	rm -r -f $tmp.*
> > +}
> > +_register_cleanup "_cleanup" BUS
> > +
> > +# Import common functions.
> > +. ./common/filter
> > +. ./common/fuzzy
> > +. ./common/inject
> > +. ./common/xfs
> > +
> > +# real QA test starts here
> > +_supported_fs xfs
> > +_require_scratch
> > +_require_xfs_stress_scrub
> > +
> > +_scratch_mkfs > "$seqres.full" 2>&1
> > +_scratch_mount
> > +_scratch_xfs_stress_scrub -S '-n' -X 'fsx'
> > +
> > +# success, all done
> > +echo Silence is golden
> > +status=0
> > +exit
> > diff --git a/tests/xfs/847.out b/tests/xfs/847.out
> > new file mode 100644
> > index 0000000000..b7041db159
> > --- /dev/null
> > +++ b/tests/xfs/847.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 847
> > +Silence is golden
> > diff --git a/tests/xfs/848 b/tests/xfs/848
> > new file mode 100755
> > index 0000000000..ab32020624
> > --- /dev/null
> > +++ b/tests/xfs/848
> > @@ -0,0 +1,38 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
> > +#
> > +# FS QA Test No. 848
> > +#
> > +# Race fsx and xfs_scrub in force-repair mode for a while to see if we
> > +# crash or livelock.
> > +#
> > +. ./common/preamble
> > +_begin_fstest online_repair dangerous_fsstress_repair
> > +
> > +_cleanup() {
> > +	cd /
> > +	_scratch_xfs_stress_scrub_cleanup &> /dev/null
> > +	rm -r -f $tmp.*
> > +}
> > +_register_cleanup "_cleanup" BUS
> > +
> > +# Import common functions.
> > +. ./common/filter
> > +. ./common/fuzzy
> > +. ./common/inject
> > +. ./common/xfs
> > +
> > +# real QA test starts here
> > +_supported_fs xfs
> > +_require_scratch
> > +_require_xfs_stress_online_repair
> > +
> > +_scratch_mkfs > "$seqres.full" 2>&1
> > +_scratch_mount
> > +_scratch_xfs_stress_online_repair -S '-k' -X 'fsx'
> > +
> > +# success, all done
> > +echo Silence is golden
> > +status=0
> > +exit
> > diff --git a/tests/xfs/848.out b/tests/xfs/848.out
> > new file mode 100644
> > index 0000000000..23f674045c
> > --- /dev/null
> > +++ b/tests/xfs/848.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 848
> > +Silence is golden
> >
>
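The plan to shellcheck only the scripts touched by the current commit, instead of the whole repository, can be sketched roughly as follows. This is an illustration, not the actual build-system change. It assumes `git` and `shellcheck` are installed and that the function runs from the top of the work tree, because `git diff-tree` prints paths relative to it. It also treats a file as a bash script only when its first line is a shebang that mentions bash, so sourced libraries without a shebang are skipped.

```bash
#!/bin/bash
# Print each argument that names an existing regular file whose first
# line is a shebang mentioning bash, e.g. "#! /bin/bash" in fstests tests.
filter_bash_scripts() {
	local f first
	for f in "$@"; do
		[ -f "$f" ] || continue		# skip paths the commit deleted
		first=$(head -n 1 "$f")
		case "$first" in
		'#!'*bash*) printf '%s\n' "$f";;
		esac
	done
}

# Run shellcheck only over the bash scripts touched by the HEAD commit.
shellcheck_head() {
	local -a changed=() scripts=()
	mapfile -t changed < <(git diff-tree --no-commit-id --name-only -r HEAD)
	mapfile -t scripts < <(filter_bash_scripts "${changed[@]}")
	if [ "${#scripts[@]}" -eq 0 ]; then
		return 0
	fi
	shellcheck "${scripts[@]}"
}
```

`shellcheck_head` returns shellcheck's exit status, so a build step could fail on new warnings without paying for a scan of the entire tree.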
* [PATCH v24.1 1/3] fuzzy: enhance scrub stress testing to use fsx 2022-12-30 22:12 ` [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx Darrick J. Wong 2023-01-05 5:49 ` Zorro Lang @ 2023-01-05 18:28 ` Darrick J. Wong 1 sibling, 0 replies; 32+ messages in thread From: Darrick J. Wong @ 2023-01-05 18:28 UTC (permalink / raw) To: zlang; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Add a couple of new online fsck stress tests that race fsx against online fsck. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- v24.1: move the addition of the group to this patch --- common/fuzzy | 39 ++++++++++++++++++++++++++++++++++++--- doc/group-names.txt | 1 + tests/xfs/847 | 38 ++++++++++++++++++++++++++++++++++++++ tests/xfs/847.out | 2 ++ tests/xfs/848 | 38 ++++++++++++++++++++++++++++++++++++++ tests/xfs/848.out | 2 ++ 6 files changed, 117 insertions(+), 3 deletions(-) create mode 100755 tests/xfs/847 create mode 100644 tests/xfs/847.out create mode 100755 tests/xfs/848 create mode 100644 tests/xfs/848.out diff --git a/common/fuzzy b/common/fuzzy index 7994665ef7..a764de461e 100644 --- a/common/fuzzy +++ b/common/fuzzy @@ -417,6 +417,30 @@ __stress_scrub_clean_scratch() { return 0 } +# Run fsx while we're testing online fsck. +__stress_scrub_fsx_loop() { + local end="$1" + local runningfile="$2" + local focus=(-q -X) # quiet, validate file contents + + # As of November 2022, 2 million fsx ops should be enough to keep + # any filesystem busy for a couple of hours. + focus+=(-N 2000000) + focus+=(-o $((128000 * LOAD_FACTOR)) ) + focus+=(-l $((600000 * LOAD_FACTOR)) ) + + local args="$FSX_AVOID ${focus[@]} ${SCRATCH_MNT}/fsx.$seq" + echo "Running $here/ltp/fsx $args" >> $seqres.full + + while __stress_scrub_running "$end" "$runningfile"; do + # Need to recheck running conditions if we cleared anything + __stress_scrub_clean_scratch && continue + $here/ltp/fsx $args >> $seqres.full + echo "fsx exits with $? 
at $(date)" >> $seqres.full + done + rm -f "$runningfile" +} + # Run fsstress while we're testing online fsck. __stress_scrub_fsstress_loop() { local end="$1" @@ -463,7 +487,7 @@ _scratch_xfs_stress_scrub_cleanup() { # Send SIGINT so that bash won't print a 'Terminated' message that # distorts the golden output. echo "Killing stressor processes at $(date)" >> $seqres.full - $KILLALL_PROG -INT xfs_io fsstress >> $seqres.full 2>&1 + $KILLALL_PROG -INT xfs_io fsstress fsx >> $seqres.full 2>&1 # Tests are not allowed to exit with the scratch fs frozen. If we # started a fs freeze/thaw background loop, wait for that loop to exit @@ -531,30 +555,39 @@ __stress_scrub_check_commands() { # -w Delay the start of the scrub/repair loop by this number of seconds. # Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value # will be clamped to ten seconds before the end time. +# -X Run this program to exercise the filesystem. Currently supported +# options are 'fsx' and 'fsstress'. The default is 'fsstress'. _scratch_xfs_stress_scrub() { local one_scrub_args=() local scrub_tgt="$SCRATCH_MNT" local runningfile="$tmp.fsstress" local freeze="${XFS_SCRUB_STRESS_FREEZE}" local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}" + local exerciser="fsstress" __SCRUB_STRESS_FREEZE_PID="" rm -f "$runningfile" touch "$runningfile" OPTIND=1 - while getopts "fs:t:w:" c; do + while getopts "fs:t:w:X:" c; do case "$c" in f) freeze=yes;; s) one_scrub_args+=("$OPTARG");; t) scrub_tgt="$OPTARG";; w) scrub_delay="$OPTARG";; + X) exerciser="$OPTARG";; *) return 1; ;; esac done __stress_scrub_check_commands "$scrub_tgt" "${one_scrub_args[@]}" + if ! command -v "__stress_scrub_${exerciser}_loop" &>/dev/null; then + echo "${exerciser}: Unknown fs exercise program." 
+		return 1
+	fi
+
 	local start="$(date +%s)"
 	local end="$((start + (30 * TIME_FACTOR) ))"
 	local scrub_startat="$((start + scrub_delay))"
@@ -564,7 +597,7 @@ _scratch_xfs_stress_scrub() {
 	echo "Loop started at $(date --date="@${start}")," \
 		"ending at $(date --date="@${end}")" >> $seqres.full

-	__stress_scrub_fsstress_loop "$end" "$runningfile" &
+	"__stress_scrub_${exerciser}_loop" "$end" "$runningfile" &

 	if [ -n "$freeze" ]; then
 		__stress_scrub_freeze_loop "$end" "$runningfile" &
diff --git a/doc/group-names.txt b/doc/group-names.txt
index ac219e05b3..771ce937ae 100644
--- a/doc/group-names.txt
+++ b/doc/group-names.txt
@@ -35,6 +35,7 @@ dangerous_fuzzers	fuzzers that can crash your computer
 dangerous_norepair	fuzzers to evaluate kernel metadata verifiers
 dangerous_online_repair	fuzzers to evaluate xfs_scrub online repair
 dangerous_fsstress_repair	race fsstress and xfs_scrub online repair
+dangerous_fsstress_scrub	race fsstress and xfs_scrub checking
 dangerous_repair	fuzzers to evaluate xfs_repair offline repair
 dangerous_scrub	fuzzers to evaluate xfs_scrub checking
 data	data loss checkers
diff --git a/tests/xfs/847 b/tests/xfs/847
new file mode 100755
index 0000000000..856e9a6c26
--- /dev/null
+++ b/tests/xfs/847
@@ -0,0 +1,38 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
+#
+# FS QA Test No. 847
+#
+# Race fsx and xfs_scrub in read-only mode for a while to see if we crash
+# or livelock.
+#
+. ./common/preamble
+_begin_fstest scrub dangerous_fsstress_scrub
+
+_cleanup() {
+	cd /
+	_scratch_xfs_stress_scrub_cleanup &> /dev/null
+	rm -r -f $tmp.*
+}
+_register_cleanup "_cleanup" BUS
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/inject
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_scrub
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_scrub -S '-n' -X 'fsx'
+
+# success, all done
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/847.out b/tests/xfs/847.out
new file mode 100644
index 0000000000..b7041db159
--- /dev/null
+++ b/tests/xfs/847.out
@@ -0,0 +1,2 @@
+QA output created by 847
+Silence is golden
diff --git a/tests/xfs/848 b/tests/xfs/848
new file mode 100755
index 0000000000..ab32020624
--- /dev/null
+++ b/tests/xfs/848
@@ -0,0 +1,38 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 Oracle, Inc. All Rights Reserved.
+#
+# FS QA Test No. 848
+#
+# Race fsx and xfs_scrub in force-repair mode for a while to see if we
+# crash or livelock.
+#
+. ./common/preamble
+_begin_fstest online_repair dangerous_fsstress_repair
+
+_cleanup() {
+	cd /
+	_scratch_xfs_stress_scrub_cleanup &> /dev/null
+	rm -r -f $tmp.*
+}
+_register_cleanup "_cleanup" BUS
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/inject
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_online_repair
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_online_repair -S '-k' -X 'fsx'
+
+# success, all done
+echo Silence is golden
+status=0
+exit
diff --git a/tests/xfs/848.out b/tests/xfs/848.out
new file mode 100644
index 0000000000..23f674045c
--- /dev/null
+++ b/tests/xfs/848.out
@@ -0,0 +1,2 @@
+QA output created by 848
+Silence is golden
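The fsx loop added in this patch uses the same control pattern as the existing fsstress loop. It reruns the exerciser for as long as `__stress_scrub_running` allows, then deletes the shared running file so that the other background loops notice and stop. The body of `__stress_scrub_running` does not appear in this thread, so the sketch below assumes that it checks both a deadline and whether the file still exists. `stress_running` and `stress_loop` are hypothetical names, and the `__stress_scrub_clean_scratch && continue` step is omitted.

```bash
#!/bin/bash
# Hypothetical stand-in for __stress_scrub_running: keep going while the
# sentinel file exists and the deadline (seconds since the epoch) has
# not yet passed.
stress_running() {
	local end="$1"
	local runningfile="$2"
	[ -e "$runningfile" ] && [ "$(date +%s)" -lt "$end" ]
}

# Rerun a workload until time runs out or a sibling loop deletes the
# sentinel, then delete the sentinel so that every other loop stops too.
stress_loop() {
	local end="$1"
	local runningfile="$2"
	shift 2
	while stress_running "$end" "$runningfile"; do
		"$@"
	done
	rm -f "$runningfile"
}
```

Because removing the file is what ends the run, whichever loop finishes first, whether through an error exit or the deadline, stops all of its siblings. No PID bookkeeping is required, unlike the kill_loops approach that patch 2/3 removes from xfs/517.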
* [PATCH 2/3] fuzzy: refactor fsmap stress test to use our helper functions 2022-12-30 22:12 ` [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests Darrick J. Wong 2022-12-30 22:12 ` [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx Darrick J. Wong @ 2022-12-30 22:12 ` Darrick J. Wong 2022-12-30 22:12 ` [PATCH 3/3] xfs: race fsmap with readonly remounts to detect crash or livelock Darrick J. Wong 2 siblings, 0 replies; 32+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Refactor xfs/517 (which races fsstress with fsmap) to use our new control loop functions instead of open-coding everything. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/fuzzy | 30 +++++++++++++++++ tests/xfs/517 | 91 ++--------------------------------------------------- tests/xfs/517.out | 4 +- 3 files changed, 34 insertions(+), 91 deletions(-) diff --git a/common/fuzzy b/common/fuzzy index 3512e95e02..58e299d34b 100644 --- a/common/fuzzy +++ b/common/fuzzy @@ -362,6 +362,23 @@ __stress_scrub_freeze_loop() { done } +# Run individual xfs_io commands in a tight loop. +__stress_xfs_io_loop() { + local end="$1" + local runningfile="$2" + shift; shift + + local xfs_io_args=() + for arg in "$@"; do + xfs_io_args+=('-c' "$arg") + done + + while __stress_scrub_running "$end" "$runningfile"; do + $XFS_IO_PROG -x "${xfs_io_args[@]}" "$SCRATCH_MNT" \ + > /dev/null 2>> $seqres.full + done +} + # Run individual XFS online fsck commands in a tight loop with xfs_io. __stress_one_scrub_loop() { local end="$1" @@ -540,6 +557,10 @@ __stress_scrub_check_commands() { # # -f Run a freeze/thaw loop while we're doing other things. Defaults to # disabled, unless XFS_SCRUB_STRESS_FREEZE is set. +# -i Pass this command to xfs_io to exercise something that is not scrub +# in a separate loop. If zero -i options are specified, do not run. 
+# Callers must check each of these commands (via _require_xfs_io_command) +# before calling here. # -s Pass this command to xfs_io to test scrub. If zero -s options are # specified, xfs_io will not be run. # -t Run online scrub against this file; $SCRATCH_MNT is the default. @@ -555,15 +576,17 @@ _scratch_xfs_stress_scrub() { local freeze="${XFS_SCRUB_STRESS_FREEZE}" local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}" local exerciser="fsstress" + local io_args=() __SCRUB_STRESS_FREEZE_PID="" rm -f "$runningfile" touch "$runningfile" OPTIND=1 - while getopts "fs:t:w:X:" c; do + while getopts "fi:s:t:w:X:" c; do case "$c" in f) freeze=yes;; + i) io_args+=("$OPTARG");; s) one_scrub_args+=("$OPTARG");; t) scrub_tgt="$OPTARG";; w) scrub_delay="$OPTARG";; @@ -595,6 +618,11 @@ _scratch_xfs_stress_scrub() { __SCRUB_STRESS_FREEZE_PID="$!" fi + if [ "${#io_args[@]}" -gt 0 ]; then + __stress_xfs_io_loop "$end" "$runningfile" \ + "${io_args[@]}" & + fi + if [ "${#one_scrub_args[@]}" -gt 0 ]; then __stress_one_scrub_loop "$end" "$runningfile" "$scrub_tgt" \ "$scrub_startat" "${one_scrub_args[@]}" & diff --git a/tests/xfs/517 b/tests/xfs/517 index 99fc89b05f..4481ba41da 100755 --- a/tests/xfs/517 +++ b/tests/xfs/517 @@ -11,29 +11,11 @@ _begin_fstest auto quick fsmap freeze _register_cleanup "_cleanup" BUS -# First kill and wait the freeze loop so it won't try to freeze fs again -# Then make sure fs is not frozen -# Then kill and wait for the rest of the workers -# Because if fs is frozen a killed writer will never exit -kill_loops() { - local sig=$1 - - [ -n "$freeze_pid" ] && kill $sig $freeze_pid - wait $freeze_pid - unset freeze_pid - $XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT - [ -n "$stress_pid" ] && kill $sig $stress_pid - [ -n "$fsmap_pid" ] && kill $sig $fsmap_pid - wait - unset stress_pid - unset fsmap_pid -} - # Override the default cleanup function. 
 _cleanup()
 {
-	kill_loops -9 > /dev/null 2>&1
 	cd /
+	_scratch_xfs_stress_scrub_cleanup
 	rm -rf $tmp.*
 }

@@ -46,78 +28,13 @@ _cleanup()
 _supported_fs xfs
 _require_xfs_scratch_rmapbt
 _require_xfs_io_command "fsmap"
-_require_command "$KILLALL_PROG" killall
-_require_freeze
+_require_xfs_stress_scrub

-echo "Format and populate"
 _scratch_mkfs > "$seqres.full" 2>&1
 _scratch_mount
-
-STRESS_DIR="$SCRATCH_MNT/testdir"
-mkdir -p $STRESS_DIR
-
-for i in $(seq 0 9); do
-	mkdir -p $STRESS_DIR/$i
-	for j in $(seq 0 9); do
-		mkdir -p $STRESS_DIR/$i/$j
-		for k in $(seq 0 9); do
-			echo x > $STRESS_DIR/$i/$j/$k
-		done
-	done
-done
-
-cpus=$(( $(src/feature -o) * 4 * LOAD_FACTOR))
-
-echo "Concurrent fsmap and freeze"
-filter_output() {
-	grep -E -v '(Device or resource busy|Invalid argument)'
-}
-freeze_loop() {
-	end="$1"
-
-	while [ "$(date +%s)" -lt $end ]; do
-		$XFS_IO_PROG -x -c 'freeze' $SCRATCH_MNT 2>&1 | filter_output
-		$XFS_IO_PROG -x -c 'thaw' $SCRATCH_MNT 2>&1 | filter_output
-	done
-}
-fsmap_loop() {
-	end="$1"
-
-	while [ "$(date +%s)" -lt $end ]; do
-		$XFS_IO_PROG -c 'fsmap -v' $SCRATCH_MNT > /dev/null
-	done
-}
-stress_loop() {
-	end="$1"
-
-	FSSTRESS_ARGS=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000 $FSSTRESS_AVOID)
-	while [ "$(date +%s)" -lt $end ]; do
-		$FSSTRESS_PROG $FSSTRESS_ARGS >> $seqres.full
-	done
-}
-
-start=$(date +%s)
-end=$((start + (30 * TIME_FACTOR) ))
-
-echo "Loop started at $(date --date="@${start}"), ending at $(date --date="@${end}")" >> $seqres.full
-stress_loop $end &
-stress_pid=$!
-freeze_loop $end &
-freeze_pid=$!
-fsmap_loop $end &
-fsmap_pid=$!
-
-# Wait until 2 seconds after the loops should have finished...
-while [ "$(date +%s)" -lt $((end + 2)) ]; do
-	sleep 1
-done
-
-# ...and clean up after the loops in case they didn't do it themselves.
-kill_loops >> $seqres.full 2>&1
-
-echo "Loop finished at $(date)" >> $seqres.full
-echo "Test done"
+_scratch_xfs_stress_scrub -i 'fsmap -v'

 # success, all done
+echo "Silence is golden"
 status=0
 exit
diff --git a/tests/xfs/517.out b/tests/xfs/517.out
index da6366e52b..49c53bcaa9 100644
--- a/tests/xfs/517.out
+++ b/tests/xfs/517.out
@@ -1,4 +1,2 @@
 QA output created by 517
-Format and populate
-Concurrent fsmap and freeze
-Test done
+Silence is golden
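Patch 2/3's `__stress_xfs_io_loop` turns each `-i` argument into a `-c` option for xfs_io. The sketch below isolates that array construction to show why the quoting matters: every command, including a multi-word one such as 'fsmap -v', remains a single argument to `-c`. `build_io_args`, the global `io_cmd` array, and the /mnt/scratch mount point are hypothetical names chosen for illustration.

```bash
#!/bin/bash
# Build "-c <command>" pairs in the global array io_cmd, mirroring the
# loop in __stress_xfs_io_loop. Quoting "$arg" keeps a multi-word command
# such as 'fsmap -v' together as one argument to -c.
build_io_args() {
	local arg
	io_cmd=()
	for arg in "$@"; do
		io_cmd+=('-c' "$arg")
	done
}

build_io_args 'fsmap -v' 'fsmap -d'
# Show the argument vector xfs_io would receive, one element per line.
printf '[%s]\n' xfs_io -x "${io_cmd[@]}" /mnt/scratch
```

This prints `[xfs_io]`, `[-x]`, `[-c]`, `[fsmap -v]`, `[-c]`, `[fsmap -d]`, `[/mnt/scratch]`, one per line. Packing the commands into a single string and relying on word splitting would split `fsmap -v` into two separate arguments.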
* [PATCH 3/3] xfs: race fsmap with readonly remounts to detect crash or livelock 2022-12-30 22:12 ` [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests Darrick J. Wong 2022-12-30 22:12 ` [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx Darrick J. Wong 2022-12-30 22:12 ` [PATCH 2/3] fuzzy: refactor fsmap stress test to use our helper functions Darrick J. Wong @ 2022-12-30 22:12 ` Darrick J. Wong 2 siblings, 0 replies; 32+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:12 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Add a new test that races the GETFSMAP ioctl with ro/rw remounting to make sure we don't livelock on the empty transaction that fsmap uses to avoid deadlocking on rmap btree cycles. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/fuzzy | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++- ltp/fsstress.c | 18 +++++++++- tests/xfs/732 | 38 +++++++++++++++++++++ tests/xfs/732.out | 2 + 4 files changed, 153 insertions(+), 3 deletions(-) create mode 100755 tests/xfs/732 create mode 100644 tests/xfs/732.out diff --git a/common/fuzzy b/common/fuzzy index 58e299d34b..ee97aa4298 100644 --- a/common/fuzzy +++ b/common/fuzzy @@ -429,6 +429,7 @@ __stress_scrub_clean_scratch() { __stress_scrub_fsx_loop() { local end="$1" local runningfile="$2" + local remount_period="$3" local focus=(-q -X) # quiet, validate file contents # As of November 2022, 2 million fsx ops should be enough to keep @@ -440,6 +441,43 @@ __stress_scrub_fsx_loop() { local args="$FSX_AVOID ${focus[@]} ${SCRATCH_MNT}/fsx.$seq" echo "Running $here/ltp/fsx $args" >> $seqres.full + if [ -n "$remount_period" ]; then + local mode="rw" + local rw_arg="" + while __stress_scrub_running "$end" "$runningfile"; do + # Need to recheck running conditions if we cleared + # anything. 
+ test "$mode" = "rw" && __stress_scrub_clean_scratch && continue + + timeout -s TERM "$remount_period" $here/ltp/fsx \ + $args $rw_arg >> $seqres.full + res=$? + echo "$mode fsx exits with $res at $(date)" >> $seqres.full + if [ "$res" -ne 0 ] && [ "$res" -ne 124 ]; then + # Stop if fsstress returns error. Mask off + # the magic code 124 because that is how the + # timeout(1) program communicates that we ran + # out of time. + break; + fi + if [ "$mode" = "rw" ]; then + mode="ro" + rw_arg="-t 0 -w 0 -FHzCIJBE0" + else + mode="rw" + rw_arg="" + fi + + # Try remounting until we get the result we wanted + while ! _scratch_remount "$mode" &>/dev/null && \ + __stress_scrub_running "$end" "$runningfile"; do + sleep 0.2 + done + done + rm -f "$runningfile" + return 0 + fi + while __stress_scrub_running "$end" "$runningfile"; do # Need to recheck running conditions if we cleared anything __stress_scrub_clean_scratch && continue @@ -453,12 +491,50 @@ __stress_scrub_fsx_loop() { __stress_scrub_fsstress_loop() { local end="$1" local runningfile="$2" + local remount_period="$3" # As of March 2022, 2 million fsstress ops should be enough to keep # any filesystem busy for a couple of hours. local args=$(_scale_fsstress_args -p 4 -d $SCRATCH_MNT -n 2000000 $FSSTRESS_AVOID) echo "Running $FSSTRESS_PROG $args" >> $seqres.full + if [ -n "$remount_period" ]; then + local mode="rw" + local rw_arg="" + while __stress_scrub_running "$end" "$runningfile"; do + # Need to recheck running conditions if we cleared + # anything. + test "$mode" = "rw" && __stress_scrub_clean_scratch && continue + + timeout -s TERM "$remount_period" $FSSTRESS_PROG \ + $args $rw_arg >> $seqres.full + res=$? + echo "$mode fsstress exits with $res at $(date)" >> $seqres.full + if [ "$res" -ne 0 ] && [ "$res" -ne 124 ]; then + # Stop if fsstress returns error. Mask off + # the magic code 124 because that is how the + # timeout(1) program communicates that we ran + # out of time. 
+				break;
+			fi
+			if [ "$mode" = "rw" ]; then
+				mode="ro"
+				rw_arg="-R"
+			else
+				mode="rw"
+				rw_arg=""
+			fi
+
+			# Try remounting until we get the result we wanted
+			while ! _scratch_remount "$mode" &>/dev/null && \
+			      __stress_scrub_running "$end" "$runningfile"; do
+				sleep 0.2
+			done
+		done
+		rm -f "$runningfile"
+		return 0
+	fi
+
 	while __stress_scrub_running "$end" "$runningfile"; do
 		# Need to recheck running conditions if we cleared anything
 		__stress_scrub_clean_scratch && continue
@@ -526,6 +602,13 @@ _scratch_xfs_stress_scrub_cleanup() {
 	echo "Waiting for children to exit at $(date)" >> $seqres.full
 	wait
 
+	# Ensure the scratch fs is also writable before we exit.
+	if [ -n "$__SCRUB_STRESS_REMOUNT_LOOP" ]; then
+		echo "Remounting rw at $(date)" >> $seqres.full
+		_scratch_remount rw >> $seqres.full 2>&1
+		__SCRUB_STRESS_REMOUNT_LOOP=""
+	fi
+
 	echo "Cleanup finished at $(date)" >> $seqres.full
 }
 
@@ -561,6 +644,9 @@ __stress_scrub_check_commands() {
 # in a separate loop. If zero -i options are specified, do not run.
 # Callers must check each of these commands (via _require_xfs_io_command)
 # before calling here.
+# -r Run fsstress for this amount of time, then remount the fs ro or rw.
+# The default is to run fsstress continuously with no remount, unless
+# XFS_SCRUB_STRESS_REMOUNT_PERIOD is set.
 # -s Pass this command to xfs_io to test scrub. If zero -s options are
 # specified, xfs_io will not be run.
 # -t Run online scrub against this file; $SCRATCH_MNT is the default.
@@ -577,16 +663,19 @@ _scratch_xfs_stress_scrub() {
 	local scrub_delay="${XFS_SCRUB_STRESS_DELAY:--1}"
 	local exerciser="fsstress"
 	local io_args=()
+	local remount_period="${XFS_SCRUB_STRESS_REMOUNT_PERIOD}"
 
 	__SCRUB_STRESS_FREEZE_PID=""
+	__SCRUB_STRESS_REMOUNT_LOOP=""
 	rm -f "$runningfile"
 	touch "$runningfile"
 
 	OPTIND=1
-	while getopts "fi:s:t:w:X:" c; do
+	while getopts "fi:r:s:t:w:X:" c; do
 		case "$c" in
 		f) freeze=yes;;
 		i) io_args+=("$OPTARG");;
+		r) remount_period="$OPTARG";;
 		s) one_scrub_args+=("$OPTARG");;
 		t) scrub_tgt="$OPTARG";;
 		w) scrub_delay="$OPTARG";;
@@ -611,7 +700,12 @@ _scratch_xfs_stress_scrub() {
 	echo "Loop started at $(date --date="@${start}")," \
 		"ending at $(date --date="@${end}")" >> $seqres.full
 
-	"__stress_scrub_${exerciser}_loop" "$end" "$runningfile" &
+	if [ -n "$remount_period" ]; then
+		__SCRUB_STRESS_REMOUNT_LOOP="1"
+	fi
+
+	"__stress_scrub_${exerciser}_loop" "$end" "$runningfile" \
+			"$remount_period" &
 
 	if [ -n "$freeze" ]; then
 		__stress_scrub_freeze_loop "$end" "$runningfile" &
diff --git a/ltp/fsstress.c b/ltp/fsstress.c
index b395bc4da2..10608fb554 100644
--- a/ltp/fsstress.c
+++ b/ltp/fsstress.c
@@ -426,6 +426,7 @@ int symlink_path(const char *, pathname_t *);
 int truncate64_path(pathname_t *, off64_t);
 int unlink_path(pathname_t *);
 void usage(void);
+void read_freq(void);
 void write_freq(void);
 void zero_freq(void);
 void non_btrfs_freq(const char *);
@@ -472,7 +473,7 @@ int main(int argc, char **argv)
 	xfs_error_injection_t err_inj;
 	struct sigaction action;
 	int loops = 1;
-	const char *allopts = "cd:e:f:i:l:m:M:n:o:p:rs:S:vVwx:X:zH";
+	const char *allopts = "cd:e:f:i:l:m:M:n:o:p:rRs:S:vVwx:X:zH";
 
 	errrange = errtag = 0;
 	umask(0);
@@ -538,6 +539,9 @@ int main(int argc, char **argv)
 		case 'r':
 			namerand = 1;
 			break;
+		case 'R':
+			read_freq();
+			break;
 		case 's':
 			seed = strtoul(optarg, NULL, 0);
 			break;
@@ -1917,6 +1921,7 @@ usage(void)
 	printf(" -o logfile specifies logfile name\n");
 	printf(" -p nproc specifies the no. of processes (default 1)\n");
 	printf(" -r specifies random name padding\n");
+	printf(" -R zeros frequencies of write operations\n");
 	printf(" -s seed specifies the seed for the random generator (default random)\n");
 	printf(" -v specifies verbose mode\n");
 	printf(" -w zeros frequencies of non-write operations\n");
@@ -1928,6 +1933,17 @@ usage(void)
 	printf(" -H prints usage and exits\n");
 }
 
+void
+read_freq(void)
+{
+	opdesc_t *p;
+
+	for (p = ops; p < ops_end; p++) {
+		if (p->iswrite)
+			p->freq = 0;
+	}
+}
+
 void
 write_freq(void)
 {
diff --git a/tests/xfs/732 b/tests/xfs/732
new file mode 100755
index 0000000000..ed6fb3c977
--- /dev/null
+++ b/tests/xfs/732
@@ -0,0 +1,38 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 732
+#
+# Race GETFSMAP and ro remount for a while to see if we crash or livelock.
+#
+. ./common/preamble
+_begin_fstest auto quick fsmap remount
+
+# Override the default cleanup function.
+_cleanup()
+{
+	cd /
+	_scratch_xfs_stress_scrub_cleanup
+	rm -rf $tmp.*
+}
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_xfs_scratch_rmapbt
+_require_xfs_io_command "fsmap"
+_require_xfs_stress_scrub
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_scrub -r 5 -i 'fsmap -v'
+
+# success, all done
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/xfs/732.out b/tests/xfs/732.out
new file mode 100644
index 0000000000..451f82ce2d
--- /dev/null
+++ b/tests/xfs/732.out
@@ -0,0 +1,2 @@
+QA output created by 732
+Silence is golden
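The remount loop added to `__stress_scrub_fsx_loop` and `__stress_scrub_fsstress_loop` makes two decisions after every timed run: whether the exerciser's exit status should end the loop, and which mode and extra arguments the next round uses. As a minimal sketch, the shell functions below isolate those decisions so they can be checked without a scratch filesystem. The function names (`stress_exit_is_fatal`, `next_remount_mode`, `remount_exerciser_args`) are hypothetical and are not part of common/fuzzy; the read-only argument strings are copied from the patch.

```bash
#!/bin/bash
# Hypothetical helpers (not in common/fuzzy) that mirror the decisions the
# remount loop in the patch makes after each timed exerciser run.

# Succeed if the loop should stop. Status 0 means the exerciser exited on
# its own; 124 is how timeout(1) reports that it killed the command because
# the remount period expired, which is the normal end of a round.
stress_exit_is_fatal() {
	local res="$1"
	[ "$res" -ne 0 ] && [ "$res" -ne 124 ]
}

# Print the mode for the next round; the loop alternates rw -> ro -> rw.
next_remount_mode() {
	if [ "$1" = "rw" ]; then
		echo "ro"
	else
		echo "rw"
	fi
}

# Print the extra exerciser arguments for a round run in the given mode.
# Read-write rounds get none. The read-only strings are copied from the
# patch; fsstress -R zeroes the frequencies of write operations.
remount_exerciser_args() {
	local exerciser="$1"
	local mode="$2"

	[ "$mode" = "ro" ] || return 0
	case "$exerciser" in
	fsx)		printf '%s\n' "-t 0 -w 0 -FHzCIJBE0";;
	fsstress)	printf '%s\n' "-R";;
	esac
}
```

For example, `timeout -s TERM 1 sleep 5` exits with status 124, so `stress_exit_is_fatal 124` fails and the loop moves on to the next remount, whereas a status such as 1 stops it. In the patch these checks are written inline in both exerciser loops, and `_scratch_remount` is retried every 0.2 seconds until it succeeds or the run's end time passes.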
* [PATCHSET v24.0 0/2] fstests: race online scrub with mount state changes
2022-12-30 21:13 [NYE DELUGE 1/4] xfs: all pending online scrub improvements Darrick J. Wong
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests Darrick J. Wong
@ 2022-12-30 22:13 ` Darrick J. Wong
2022-12-30 22:13 ` [PATCH 2/2] xfs: stress test xfs_scrub(8) with freeze and ro-remount loops Darrick J. Wong
2022-12-30 22:13 ` [PATCH 1/2] xfs: stress test xfs_scrub(8) with fsstress Darrick J. Wong
2023-01-13 20:10 ` [NYE DELUGE 1/4] xfs: all pending online scrub improvements Zorro Lang
3 siblings, 2 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:13 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan

Hi all,

Introduce the ability to run xfs_scrub(8) itself from our online fsck
stress test harness. Create two new tests to race scrub and repair
against fsstress, and four more tests to do the same but racing against
fs freeze and ro remounts.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.
--D

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes
---
 common/fuzzy      | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 tests/xfs/285     | 44 ++++++++++--------------------------
 tests/xfs/285.out |  4 +--
 tests/xfs/286     | 46 ++++++++++----------------------------
 tests/xfs/286.out |  4 +--
 tests/xfs/733     | 39 +++++++++++++++++++++++++++++++++
 tests/xfs/733.out |  2 ++
 tests/xfs/771     | 39 +++++++++++++++++++++++++++++++++
 tests/xfs/771.out |  2 ++
 tests/xfs/824     | 40 ++++++++++++++++++++++++++++++++++
 tests/xfs/824.out |  2 ++
 tests/xfs/825     | 40 ++++++++++++++++++++++++++++++++++
 tests/xfs/825.out |  2 ++
 13 files changed, 252 insertions(+), 75 deletions(-)
 create mode 100755 tests/xfs/733
 create mode 100644 tests/xfs/733.out
 create mode 100755 tests/xfs/771
 create mode 100644 tests/xfs/771.out
 create mode 100755 tests/xfs/824
 create mode 100644 tests/xfs/824.out
 create mode 100755 tests/xfs/825
 create mode 100644 tests/xfs/825.out
* [PATCH 2/2] xfs: stress test xfs_scrub(8) with freeze and ro-remount loops
2022-12-30 22:13 ` [PATCHSET v24.0 0/2] fstests: race online scrub with mount state changes Darrick J. Wong
@ 2022-12-30 22:13 ` Darrick J. Wong
2022-12-30 22:13 ` [PATCH 1/2] xfs: stress test xfs_scrub(8) with fsstress Darrick J. Wong
1 sibling, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:13 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Make sure we don't trip over any asserts or livelock when scrub races
with filesystem freezing and readonly remounts.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/733     | 39 +++++++++++++++++++++++++++++++++++++++
 tests/xfs/733.out |  2 ++
 tests/xfs/771     | 39 +++++++++++++++++++++++++++++++++++++++
 tests/xfs/771.out |  2 ++
 tests/xfs/824     | 40 ++++++++++++++++++++++++++++++++++++++++
 tests/xfs/824.out |  2 ++
 tests/xfs/825     | 40 ++++++++++++++++++++++++++++++++++++++++
 tests/xfs/825.out |  2 ++
 8 files changed, 166 insertions(+)
 create mode 100755 tests/xfs/733
 create mode 100644 tests/xfs/733.out
 create mode 100755 tests/xfs/771
 create mode 100644 tests/xfs/771.out
 create mode 100755 tests/xfs/824
 create mode 100644 tests/xfs/824.out
 create mode 100755 tests/xfs/825
 create mode 100644 tests/xfs/825.out

diff --git a/tests/xfs/733 b/tests/xfs/733
new file mode 100755
index 0000000000..ee9a0a26ee
--- /dev/null
+++ b/tests/xfs/733
@@ -0,0 +1,39 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 733
+#
+# Race xfs_scrub in check-only mode and ro remount for a while to see if we
+# crash or livelock.
+#
+. ./common/preamble
+_begin_fstest scrub dangerous_fsstress_scrub
+
+# Override the default cleanup function.
+_cleanup()
+{
+	cd /
+	_scratch_xfs_stress_scrub_cleanup &> /dev/null
+	_scratch_remount rw
+	rm -rf $tmp.*
+}
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_scrub
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_scrub -r 5 -S '-n'
+
+# success, all done
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/xfs/733.out b/tests/xfs/733.out
new file mode 100644
index 0000000000..7118d5ddf0
--- /dev/null
+++ b/tests/xfs/733.out
@@ -0,0 +1,2 @@
+QA output created by 733
+Silence is golden
diff --git a/tests/xfs/771 b/tests/xfs/771
new file mode 100755
index 0000000000..8c8d124f12
--- /dev/null
+++ b/tests/xfs/771
@@ -0,0 +1,39 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 771
+#
+# Race xfs_scrub in check-only mode and freeze for a while to see if we crash
+# or livelock.
+#
+. ./common/preamble
+_begin_fstest scrub dangerous_fsstress_scrub
+
+# Override the default cleanup function.
+_cleanup()
+{
+	cd /
+	_scratch_xfs_stress_scrub_cleanup &> /dev/null
+	_scratch_remount rw
+	rm -rf $tmp.*
+}
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/xfs
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_scrub
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_scrub -f -S '-n'
+
+# success, all done
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/xfs/771.out b/tests/xfs/771.out
new file mode 100644
index 0000000000..c2345c7be3
--- /dev/null
+++ b/tests/xfs/771.out
@@ -0,0 +1,2 @@
+QA output created by 771
+Silence is golden
diff --git a/tests/xfs/824 b/tests/xfs/824
new file mode 100755
index 0000000000..65eeb3a6c9
--- /dev/null
+++ b/tests/xfs/824
@@ -0,0 +1,40 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 824
+#
+# Race xfs_scrub in force-repair mode and freeze for a while to see if we crash
+# or livelock.
+#
+. ./common/preamble
+_begin_fstest online_repair dangerous_fsstress_repair
+
+# Override the default cleanup function.
+_cleanup()
+{
+	cd /
+	_scratch_xfs_stress_scrub_cleanup &> /dev/null
+	_scratch_remount rw
+	rm -rf $tmp.*
+}
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/xfs
+. ./common/inject
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_online_repair
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_online_repair -f -S '-k'
+
+# success, all done
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/xfs/824.out b/tests/xfs/824.out
new file mode 100644
index 0000000000..6cf432abbd
--- /dev/null
+++ b/tests/xfs/824.out
@@ -0,0 +1,2 @@
+QA output created by 824
+Silence is golden
diff --git a/tests/xfs/825 b/tests/xfs/825
new file mode 100755
index 0000000000..80ce06932d
--- /dev/null
+++ b/tests/xfs/825
@@ -0,0 +1,40 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+# Copyright (c) 2022 Oracle. All Rights Reserved.
+#
+# FS QA Test No. 825
+#
+# Race xfs_scrub in force-repair mode and ro remount for a while to see if we
+# crash or livelock.
+#
+. ./common/preamble
+_begin_fstest online_repair dangerous_fsstress_repair
+
+# Override the default cleanup function.
+_cleanup()
+{
+	cd /
+	_scratch_xfs_stress_scrub_cleanup &> /dev/null
+	_scratch_remount rw
+	rm -rf $tmp.*
+}
+
+# Import common functions.
+. ./common/filter
+. ./common/fuzzy
+. ./common/xfs
+. ./common/inject
+
+# real QA test starts here
+_supported_fs xfs
+_require_scratch
+_require_xfs_stress_online_repair
+
+_scratch_mkfs > "$seqres.full" 2>&1
+_scratch_mount
+_scratch_xfs_stress_online_repair -r 5 -S '-k'
+
+# success, all done
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/xfs/825.out b/tests/xfs/825.out
new file mode 100644
index 0000000000..d0e970dfd6
--- /dev/null
+++ b/tests/xfs/825.out
@@ -0,0 +1,2 @@
+QA output created by 825
+Silence is golden
* [PATCH 1/2] xfs: stress test xfs_scrub(8) with fsstress
2022-12-30 22:13 ` [PATCHSET v24.0 0/2] fstests: race online scrub with mount state changes Darrick J. Wong
2022-12-30 22:13 ` [PATCH 2/2] xfs: stress test xfs_scrub(8) with freeze and ro-remount loops Darrick J. Wong
@ 2022-12-30 22:13 ` Darrick J. Wong
1 sibling, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:13 UTC (permalink / raw)
To: zlang, djwong; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Port the two existing tests that check that xfs_scrub(8) (aka the main
userspace driver program) doesn't clash with fsstress to use our new
framework.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/fuzzy      | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 tests/xfs/285     | 44 ++++++++++--------------------------
 tests/xfs/285.out |  4 +--
 tests/xfs/286     | 46 ++++++++++----------------------------
 tests/xfs/286.out |  4 +--
 5 files changed, 86 insertions(+), 75 deletions(-)

diff --git a/common/fuzzy b/common/fuzzy
index ee97aa4298..e39f787e78 100644
--- a/common/fuzzy
+++ b/common/fuzzy
@@ -411,6 +411,42 @@ __stress_one_scrub_loop() {
 	done
 }
 
+# Run xfs_scrub online fsck in a tight loop.
+__stress_xfs_scrub_loop() {
+	local end="$1"
+	local runningfile="$2"
+	local scrub_startat="$3"
+	shift; shift; shift
+	local sigint_ret="$(( $(kill -l SIGINT) + 128 ))"
+	local scrublog="$tmp.scrub"
+
+	while __stress_scrub_running "$scrub_startat" "$runningfile"; do
+		sleep 1
+	done
+
+	while __stress_scrub_running "$end" "$runningfile"; do
+		_scratch_scrub "$@" &> $scrublog
+		res=$?
+		if [ "$res" -eq "$sigint_ret" ]; then
+			# Ignore SIGINT because the cleanup function sends
+			# that to terminate xfs_scrub
+			res=0
+		fi
+		echo "xfs_scrub exits with $res at $(date)" >> $seqres.full
+		if [ "$res" -ge 128 ]; then
+			# Report scrub death due to fatal signals
+			echo "xfs_scrub died with SIG$(kill -l $res)"
+			cat $scrublog >> $seqres.full 2>/dev/null
+		elif [ "$((res & 0x1))" -gt 0 ]; then
+			# Report uncorrected filesystem errors
+			echo "xfs_scrub reports uncorrected errors:"
+			grep -E '(Repair unsuccessful;|Corruption:)' $scrublog
+			cat $scrublog >> $seqres.full 2>/dev/null
+		fi
+		rm -f $scrublog
+	done
+}
+
 # Clean the scratch filesystem between rounds of fsstress if there is 2%
 # available space or less because that isn't an interesting stress test.
 #
@@ -571,7 +607,7 @@ _scratch_xfs_stress_scrub_cleanup() {
 	# Send SIGINT so that bash won't print a 'Terminated' message that
 	# distorts the golden output.
 	echo "Killing stressor processes at $(date)" >> $seqres.full
-	$KILLALL_PROG -INT xfs_io fsstress fsx >> $seqres.full 2>&1
+	$KILLALL_PROG -INT xfs_io fsstress fsx xfs_scrub >> $seqres.full 2>&1
 
 	# Tests are not allowed to exit with the scratch fs frozen. If we
 	# started a fs freeze/thaw background loop, wait for that loop to exit
@@ -649,6 +685,8 @@ __stress_scrub_check_commands() {
 # XFS_SCRUB_STRESS_REMOUNT_PERIOD is set.
 # -s Pass this command to xfs_io to test scrub. If zero -s options are
 # specified, xfs_io will not be run.
+# -S Pass this option to xfs_scrub. If zero -S options are specified,
+# xfs_scrub will not be run. To select repair mode, pass '-k' or '-v'.
 # -t Run online scrub against this file; $SCRATCH_MNT is the default.
 # -w Delay the start of the scrub/repair loop by this number of seconds.
 # Defaults to no delay unless XFS_SCRUB_STRESS_DELAY is set. This value
@@ -657,6 +695,7 @@ __stress_scrub_check_commands() {
 # options are 'fsx' and 'fsstress'. The default is 'fsstress'.
 _scratch_xfs_stress_scrub() {
 	local one_scrub_args=()
+	local xfs_scrub_args=()
 	local scrub_tgt="$SCRATCH_MNT"
 	local runningfile="$tmp.fsstress"
 	local freeze="${XFS_SCRUB_STRESS_FREEZE}"
@@ -671,12 +710,13 @@ _scratch_xfs_stress_scrub() {
 	touch "$runningfile"
 
 	OPTIND=1
-	while getopts "fi:r:s:t:w:X:" c; do
+	while getopts "fi:r:s:S:t:w:X:" c; do
 		case "$c" in
 		f) freeze=yes;;
 		i) io_args+=("$OPTARG");;
 		r) remount_period="$OPTARG";;
 		s) one_scrub_args+=("$OPTARG");;
+		S) xfs_scrub_args+=("$OPTARG");;
 		t) scrub_tgt="$OPTARG";;
 		w) scrub_delay="$OPTARG";;
 		X) exerciser="$OPTARG";;
@@ -691,6 +731,18 @@ _scratch_xfs_stress_scrub() {
 		return 1
 	fi
 
+	if [ "${#xfs_scrub_args[@]}" -gt 0 ]; then
+		_scratch_scrub "${xfs_scrub_args[@]}" &> "$tmp.scrub"
+		res=$?
+		if [ $res -ne 0 ]; then
+			echo "xfs_scrub ${xfs_scrub_args[@]} failed, err $res" >> $seqres.full
+			cat "$tmp.scrub" >> $seqres.full
+			rm -f "$tmp.scrub"
+			_notrun 'scrub not supported on scratch filesystem'
+		fi
+		rm -f "$tmp.scrub"
+	fi
+
 	local start="$(date +%s)"
 	local end="$((start + (30 * TIME_FACTOR) ))"
 	local scrub_startat="$((start + scrub_delay))"
@@ -722,6 +774,11 @@ _scratch_xfs_stress_scrub() {
 			"$scrub_startat" "${one_scrub_args[@]}" &
 	fi
 
+	if [ "${#xfs_scrub_args[@]}" -gt 0 ]; then
+		__stress_xfs_scrub_loop "$end" "$runningfile" "$scrub_startat" \
+				"${xfs_scrub_args[@]}" &
+	fi
+
 	# Wait until the designated end time or fsstress dies, then kill all of
 	# our background processes.
 	while __stress_scrub_running "$end" "$runningfile"; do
@@ -741,5 +798,5 @@ _scratch_xfs_stress_scrub() {
 # Same requirements and arguments as _scratch_xfs_stress_scrub.
 _scratch_xfs_stress_online_repair() {
 	$XFS_IO_PROG -x -c 'inject force_repair' $SCRATCH_MNT
-	_scratch_xfs_stress_scrub "$@"
+	XFS_SCRUB_FORCE_REPAIR=1 _scratch_xfs_stress_scrub "$@"
 }
diff --git a/tests/xfs/285 b/tests/xfs/285
index 711211d412..0056baeb1c 100755
--- a/tests/xfs/285
+++ b/tests/xfs/285
@@ -4,55 +4,35 @@
 #
 # FS QA Test No. 285
 #
-# Race fio and xfs_scrub for a while to see if we crash or livelock.
+# Race fsstress and xfs_scrub in read-only mode for a while to see if we crash
+# or livelock.
 #
 . ./common/preamble
-_begin_fstest dangerous_fuzzers dangerous_scrub
+_begin_fstest scrub dangerous_fsstress_scrub
 
+_cleanup() {
+	cd /
+	_scratch_xfs_stress_scrub_cleanup &> /dev/null
+	rm -r -f $tmp.*
+}
 _register_cleanup "_cleanup" BUS
 
 # Import common functions.
 . ./common/filter
 . ./common/fuzzy
 . ./common/inject
+. ./common/xfs
 
 # real QA test starts here
 _supported_fs xfs
-_require_test_program "feature"
-_require_command "$KILLALL_PROG" killall
-_require_command "$TIMEOUT_PROG" timeout
-_require_scrub
 _require_scratch
+_require_xfs_stress_scrub
 
-echo "Format and populate"
 _scratch_mkfs > "$seqres.full" 2>&1
 _scratch_mount
-
-STRESS_DIR="$SCRATCH_MNT/testdir"
-mkdir -p $STRESS_DIR
-
-cpus=$(( $($here/src/feature -o) * 4 * LOAD_FACTOR))
-$FSSTRESS_PROG -d $STRESS_DIR -p $cpus -n $((cpus * 100000)) $FSSTRESS_AVOID >/dev/null 2>&1 &
-$XFS_SCRUB_PROG -d -T -v -n $SCRATCH_MNT >> $seqres.full
-
-killstress() {
-	sleep $(( 60 * TIME_FACTOR ))
-	$KILLALL_PROG -q $FSSTRESS_PROG
-}
-
-echo "Concurrent scrub"
-start=$(date +%s)
-end=$((start + (60 * TIME_FACTOR) ))
-killstress &
-echo "Scrub started at $(date --date="@${start}"), ending at $(date --date="@${end}")" >> $seqres.full
-while [ "$(date +%s)" -lt "$end" ]; do
-	$TIMEOUT_PROG -s TERM $(( end - $(date +%s) + 2 )) $XFS_SCRUB_PROG -d -T -v -n $SCRATCH_MNT >> $seqres.full 2>&1
-done
-
-echo "Test done"
-echo "Scrub finished at $(date)" >> $seqres.full
-$KILLALL_PROG -q $FSSTRESS_PROG
+_scratch_xfs_stress_scrub -S '-n'
 
 # success, all done
+echo Silence is golden
 status=0
 exit
diff --git a/tests/xfs/285.out b/tests/xfs/285.out
index be6b49a9fb..ab12da9ae7 100644
--- a/tests/xfs/285.out
+++ b/tests/xfs/285.out
@@ -1,4 +1,2 @@
 QA output created by 285
-Format and populate
-Concurrent scrub
-Test done
+Silence is golden
diff --git a/tests/xfs/286 b/tests/xfs/286
index 7edc9c427b..0f61a924db 100755
--- a/tests/xfs/286
+++ b/tests/xfs/286
@@ -4,57 +4,35 @@
 #
 # FS QA Test No. 286
 #
-# Race fio and xfs_scrub for a while to see if we crash or livelock.
+# Race fsstress and xfs_scrub in force-repair mode for a while to see if we
+# crash or livelock.
 #
 . ./common/preamble
-_begin_fstest dangerous_fuzzers dangerous_scrub dangerous_online_repair
+_begin_fstest online_repair dangerous_fsstress_repair
 
+_cleanup() {
+	cd /
+	_scratch_xfs_stress_scrub_cleanup &> /dev/null
+	rm -r -f $tmp.*
+}
 _register_cleanup "_cleanup" BUS
 
 # Import common functions.
 . ./common/filter
 . ./common/fuzzy
 . ./common/inject
+. ./common/xfs
 
 # real QA test starts here
 _supported_fs xfs
-_require_test_program "feature"
-_require_command "$KILLALL_PROG" killall
-_require_command "$TIMEOUT_PROG" timeout
-_require_scrub
 _require_scratch
-# xfs_scrub will turn on error injection itself
-_require_xfs_io_error_injection "force_repair"
+_require_xfs_stress_online_repair
 
-echo "Format and populate"
 _scratch_mkfs > "$seqres.full" 2>&1
 _scratch_mount
-
-STRESS_DIR="$SCRATCH_MNT/testdir"
-mkdir -p $STRESS_DIR
-
-cpus=$(( $($here/src/feature -o) * 4 * LOAD_FACTOR))
-$FSSTRESS_PROG -d $STRESS_DIR -p $cpus -n $((cpus * 100000)) $FSSTRESS_AVOID >/dev/null 2>&1 &
-$XFS_SCRUB_PROG -d -T -v -n $SCRATCH_MNT >> $seqres.full
-
-killstress() {
-	sleep $(( 60 * TIME_FACTOR ))
-	$KILLALL_PROG -q $FSSTRESS_PROG
-}
-
-echo "Concurrent repair"
-start=$(date +%s)
-end=$((start + (60 * TIME_FACTOR) ))
-killstress &
-echo "Repair started at $(date --date="@${start}"), ending at $(date --date="@${end}")" >> $seqres.full
-while [ "$(date +%s)" -lt "$end" ]; do
-	XFS_SCRUB_FORCE_REPAIR=1 $TIMEOUT_PROG -s TERM $(( end - $(date +%s) + 2 )) $XFS_SCRUB_PROG -d -T -v $SCRATCH_MNT >> $seqres.full
-done
-
-echo "Test done"
-echo "Repair finished at $(date)" >> $seqres.full
-$KILLALL_PROG -q $FSSTRESS_PROG
+_scratch_xfs_stress_online_repair -S '-k'
 
 # success, all done
+echo Silence is golden
 status=0
 exit
diff --git a/tests/xfs/286.out b/tests/xfs/286.out
index 80e12b5495..35c4800694 100644
--- a/tests/xfs/286.out
+++ b/tests/xfs/286.out
@@ -1,4 +1,2 @@
 QA output created by 286
-Format and populate
-Concurrent repair
-Test done
+Silence is golden
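`__stress_xfs_scrub_loop` sorts every xfs_scrub exit status into one of three outcomes before deciding what to print. As a minimal sketch, the function below applies the same rules in isolation: an exit caused by SIGINT (which the cleanup function uses to stop xfs_scrub) counts as clean, any other status of 128 or more is reported as death by a signal, and a set bit 0 is read as uncorrected errors, following the comments in the patch. `classify_scrub_exit` and its output strings are hypothetical; the sketch also hardcodes 130 (128 plus SIGINT's signal number, 2) where the patch computes the value with `kill -l SIGINT`.

```bash
#!/bin/bash
# Hypothetical helper (not in common/fuzzy) that applies the exit-status
# rules of __stress_xfs_scrub_loop and prints a one-line verdict.
classify_scrub_exit() {
	local res="$1"
	# A process killed by signal N exits with status 128 + N. SIGINT is
	# signal 2; the patch computes this as $(( $(kill -l SIGINT) + 128 )).
	local sigint_ret=130

	if [ "$res" -eq "$sigint_ret" ]; then
		# The cleanup function stops xfs_scrub with SIGINT, so this
		# is not treated as a failure.
		res=0
	fi

	if [ "$res" -ge 128 ]; then
		# Given an exit status, kill -l prints the signal name
		# without the SIG prefix.
		echo "died with SIG$(kill -l "$res")"
	elif [ "$((res & 0x1))" -gt 0 ]; then
		# Bit 0 set: the patch reports this as uncorrected errors.
		echo "uncorrected errors"
	else
		echo "nothing to report"
	fi
}
```

With this sketch, `classify_scrub_exit 137` prints `died with SIGKILL` (137 is 128 + 9), `classify_scrub_exit 130` prints `nothing to report`, and any odd status below 128 prints `uncorrected errors`. The real loop additionally greps the scrub log for `Repair unsuccessful;` and `Corruption:` lines and copies the log into `$seqres.full`; that part is left out here.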
* Re: [NYE DELUGE 1/4] xfs: all pending online scrub improvements
2022-12-30 21:13 [NYE DELUGE 1/4] xfs: all pending online scrub improvements Darrick J. Wong
` (2 preceding siblings ...)
2022-12-30 22:13 ` [PATCHSET v24.0 0/2] fstests: race online scrub with mount state changes Darrick J. Wong
@ 2023-01-13 20:10 ` Zorro Lang
2023-01-13 21:28 ` Darrick J. Wong
3 siblings, 1 reply; 32+ messages in thread
From: Zorro Lang @ 2023-01-13 20:10 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: xfs, fstests

On Fri, Dec 30, 2022 at 01:13:21PM -0800, Darrick J. Wong wrote:
> Hi everyone,
>
> As I've mentioned several times throughout 2022, I would like to merge
> the online fsck feature in time for the 2023 LTS kernel. The first big
> step in this process is to merge all the pending bug fixes, validation
> improvements, and general reorganization of the existing metadata
> scrubbing functionality.
>
> This first deluge starts with the design document for the entirety of
> the online fsck feature. The design doc should be familiar to most of
> you, as it's been on the list for review for months already. It
> outlines in brief the problems we're trying to solve, the use cases and
> testing plan, and the fundamental data structures and algorithms
> underlying the entire feature.
>
> After that come all the code changes to wrap up the metadata checking
> part of the feature. The biggest piece here is the scrub drains that
> allow scrub to quiesce deferred ops targeting AGs so that it can
> cross-reference recordsets. Most of the rest is tweaking the btree code
> so that we can do keyspace scans to look for conflicting records.
>
> For this review, I would like people to focus the following:
>
> - Are the major subsystems sufficiently documented that you could figure
> out what the code does?
>
> - Do you see any problems that are severe enough to cause long term
> support hassles? (e.g. bad API design, writing weird metadata to disk)
>
> - Can you spot mis-interactions between the subsystems?
>
> - What were my blind spots in devising this feature?
>
> - Are there missing pieces that you'd like to help build?
>
> - Can I just merge all of this?
>
> The one thing that is /not/ in scope for this review are requests for
> more refactoring of existing subsystems. While there are usually valid
> arguments for performing such cleanups, those are separate tasks to be
> prioritized separately. I will get to them after merging online fsck.
>
> I've been running daily online scrubs of every computer I own for the
> last five years, which has helped me iron out real problems in (limited
> scope) production. All issues observed in that time have been corrected
> in this submission.

The 3 fstests patchsets of the [NYE DELUGE 1/4] look good to me. And I didn't
find more critical issues after Darrick fixed that "group name missing" problem.
By testing it a whole week, I decide to merge this 3 patchsets this weekend,
then we can shift to later patchsets are waiting for review and merge.

Reviewed-by: Zorro Lang <zlang@redhat.com>

Thanks,
Zorro

>
> As a warning, the patches will likely take several days to trickle in.
> All four patch deluges are based off kernel 6.2-rc1, xfsprogs 6.1, and
> fstests 2022-12-25.
>
> Thank you all for your participation in the XFS community. Have a safe
> New Years, and I'll see you all next year!
>
> --D
>
* Re: [NYE DELUGE 1/4] xfs: all pending online scrub improvements
2023-01-13 20:10 ` [NYE DELUGE 1/4] xfs: all pending online scrub improvements Zorro Lang
@ 2023-01-13 21:28 ` Darrick J. Wong
0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2023-01-13 21:28 UTC (permalink / raw)
To: Zorro Lang; +Cc: xfs, fstests

On Sat, Jan 14, 2023 at 04:10:33AM +0800, Zorro Lang wrote:
> On Fri, Dec 30, 2022 at 01:13:21PM -0800, Darrick J. Wong wrote:
> > Hi everyone,
> >
> > As I've mentioned several times throughout 2022, I would like to merge
> > the online fsck feature in time for the 2023 LTS kernel. The first big
> > step in this process is to merge all the pending bug fixes, validation
> > improvements, and general reorganization of the existing metadata
> > scrubbing functionality.
> >
> > This first deluge starts with the design document for the entirety of
> > the online fsck feature. The design doc should be familiar to most of
> > you, as it's been on the list for review for months already. It
> > outlines in brief the problems we're trying to solve, the use cases and
> > testing plan, and the fundamental data structures and algorithms
> > underlying the entire feature.
> >
> > After that come all the code changes to wrap up the metadata checking
> > part of the feature. The biggest piece here is the scrub drains that
> > allow scrub to quiesce deferred ops targeting AGs so that it can
> > cross-reference recordsets. Most of the rest is tweaking the btree code
> > so that we can do keyspace scans to look for conflicting records.
> >
> > For this review, I would like people to focus the following:
> >
> > - Are the major subsystems sufficiently documented that you could figure
> > out what the code does?
> >
> > - Do you see any problems that are severe enough to cause long term
> > support hassles? (e.g. bad API design, writing weird metadata to disk)
> >
> > - Can you spot mis-interactions between the subsystems?
> >
> > - What were my blind spots in devising this feature?
> >
> > - Are there missing pieces that you'd like to help build?
> >
> > - Can I just merge all of this?
> >
> > The one thing that is /not/ in scope for this review are requests for
> > more refactoring of existing subsystems. While there are usually valid
> > arguments for performing such cleanups, those are separate tasks to be
> > prioritized separately. I will get to them after merging online fsck.
> >
> > I've been running daily online scrubs of every computer I own for the
> > last five years, which has helped me iron out real problems in (limited
> > scope) production. All issues observed in that time have been corrected
> > in this submission.
>
> The 3 fstests patchsets of the [NYE DELUGE 1/4] look good to me. And I didn't
> find more critical issues after Darrick fixed that "group name missing" problem.
> By testing it a whole week, I decide to merge this 3 patchsets this weekend,
> then we can shift to later patchsets are waiting for review and merge.
>
> Reviewed-by: Zorro Lang <zlang@redhat.com>

Ok, thanks!

--D

> Thanks,
> Zorro
>
> >
> > As a warning, the patches will likely take several days to trickle in.
> > All four patch deluges are based off kernel 6.2-rc1, xfsprogs 6.1, and
> > fstests 2022-12-25.
> >
> > Thank you all for your participation in the XFS community. Have a safe
> > New Years, and I'll see you all next year!
> >
> > --D
> >
>
Thread overview: 32+ messages
2022-12-30 21:13 [NYE DELUGE 1/4] xfs: all pending online scrub improvements Darrick J. Wong
2022-12-30 22:12 ` [PATCHSET v24.0 00/16] fstests: refactor online fsck stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCH 04/16] fuzzy: clean up scrub stress programs quietly Darrick J. Wong
2022-12-30 22:12 ` [PATCH 07/16] fuzzy: give each test local control over what scrub stress tests get run Darrick J. Wong
2022-12-30 22:12 ` [PATCH 01/16] xfs/422: create a new test group for fsstress/repair racers Darrick J. Wong
2022-12-30 22:12 ` [PATCH 06/16] fuzzy: explicitly check for common/inject in _require_xfs_stress_online_repair Darrick J. Wong
2022-12-30 22:12 ` [PATCH 05/16] fuzzy: rework scrub stress output filtering Darrick J. Wong
2022-12-30 22:12 ` [PATCH 02/16] xfs/422: move the fsstress/freeze/scrub racing logic to common/fuzzy Darrick J. Wong
2022-12-30 22:12 ` [PATCH 03/16] xfs/422: rework feature detection so we only test-format scratch once Darrick J. Wong
2022-12-30 22:12 ` [PATCH 13/16] fuzzy: clean up frozen fses after scrub stress testing Darrick J. Wong
2022-12-30 22:12 ` [PATCH 12/16] fuzzy: increase operation count for each fsstress invocation Darrick J. Wong
2023-01-13 19:55 ` Zorro Lang
2023-01-13 21:28 ` Darrick J. Wong
2022-12-30 22:12 ` [PATCH 11/16] fuzzy: clear out the scratch filesystem if it's too full Darrick J. Wong
2022-12-30 22:12 ` [PATCH 10/16] fuzzy: abort scrub stress testing if the scratch fs went down Darrick J. Wong
2022-12-30 22:12 ` [PATCH 14/16] fuzzy: make freezing optional for scrub stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCH 08/16] fuzzy: test the scrub stress subcommands before looping Darrick J. Wong
2022-12-30 22:12 ` [PATCH 09/16] fuzzy: make scrub stress loop control more robust Darrick J. Wong
2022-12-30 22:12 ` [PATCH 16/16] fuzzy: delay the start of the scrub loop when stress-testing scrub Darrick J. Wong
2022-12-30 22:12 ` [PATCH 15/16] fuzzy: allow substitution of AG numbers when configuring scrub stress test Darrick J. Wong
2022-12-30 22:12 ` [PATCHSET v24.0 0/3] fstests: refactor GETFSMAP stress tests Darrick J. Wong
2022-12-30 22:12 ` [PATCH 1/3] fuzzy: enhance scrub stress testing to use fsx Darrick J. Wong
2023-01-05 5:49 ` Zorro Lang
2023-01-05 18:28 ` Darrick J. Wong
2023-01-05 18:28 ` [PATCH v24.1 " Darrick J. Wong
2022-12-30 22:12 ` [PATCH 2/3] fuzzy: refactor fsmap stress test to use our helper functions Darrick J. Wong
2022-12-30 22:12 ` [PATCH 3/3] xfs: race fsmap with readonly remounts to detect crash or livelock Darrick J. Wong
2022-12-30 22:13 ` [PATCHSET v24.0 0/2] fstests: race online scrub with mount state changes Darrick J. Wong
2022-12-30 22:13 ` [PATCH 2/2] xfs: stress test xfs_scrub(8) with freeze and ro-remount loops Darrick J. Wong
2022-12-30 22:13 ` [PATCH 1/2] xfs: stress test xfs_scrub(8) with fsstress Darrick J. Wong
2023-01-13 20:10 ` [NYE DELUGE 1/4] xfs: all pending online scrub improvements Zorro Lang
2023-01-13 21:28 ` Darrick J. Wong