* [PATCH 1/2] check: try to fix the test device if it gets corrupted @ 2023-04-14 22:11 Leah Rumancik 2023-04-14 22:11 ` [PATCH 2/2] check: _check_filesystems for errors even if test failed Leah Rumancik 2023-04-15 0:35 ` [PATCH 1/2] check: try to fix the test device if it gets corrupted Darrick J. Wong 0 siblings, 2 replies; 7+ messages in thread From: Leah Rumancik @ 2023-04-14 22:11 UTC (permalink / raw) To: fstests; +Cc: tytso, djwong From: Theodore Ts'o <tytso@mit.edu> If the test device gets corrupted all subsequent tests will fail. To prevent this from causing all subsequent tests to be useless, try repair the file system on TEST_DEV if possible. We don't need to do this with the scratch device since that file system gets recreated each time anyway. Signed-off-by: Theodore Ts'o <tytso@mit.edu> --- check | 7 ++++++- common/rc | 41 +++++++++++++++++++++++++++++++++++++++++ common/xfs | 12 ++++++++++++ 3 files changed, 59 insertions(+), 1 deletion(-) diff --git a/check b/check index 1a58a2b2..befbf465 100755 --- a/check +++ b/check @@ -536,7 +536,12 @@ _check_filesystems() local ret=0 if [ -f ${RESULT_DIR}/require_test ]; then - _check_test_fs || ret=1 + if ! _check_test_fs ; then + ret=1 + echo "Trying to repair broken TEST_DEV file system" + _repair_test_fs + _test_mount + fi rm -f ${RESULT_DIR}/require_test* else _test_unmount 2> /dev/null diff --git a/common/rc b/common/rc index 90749343..e8fc7e86 100644 --- a/common/rc +++ b/common/rc @@ -1199,6 +1199,47 @@ _repair_scratch_fs() esac } +_repair_test_fs() +{ + case $FSTYP in + xfs) + _repair_xfs_test_fs "$@" >$tmp.repair 2>&1 + res=$? + if [ "$res" -ne 0 ]; then + echo "xfs_repair returns $res; replay log?" >>$tmp.repair + _test_mount + res=$? + if [ $res -gt 0 ]; then + echo "mount returns $res; zap log?" >>$tmp.repair + _xfs_repair_test_fs -L >>$tmp.repair 2>&1 + echo "log zap returns $?" >> $tmp.repair + else + umount "$TEST_DEV" + fi + _xfs_repair_test_fs "$@" >>$tmp.repair 2>&1 + res=$? + fi + ;; + *) + # Let's hope fsck -y suffices... + fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1 + res=$? + if test "$res" -lt 4 ; then + res=0 + fi + ;; + esac + if [ $res -ne 0 ]; then + _log_err "_repair_test_fs: failed, err=$res" + echo "*** fsck.$FSTYP output ***" >>$seqres.full + cat $tmp.repair >>$seqres.full + echo "*** end fsck.$FSTYP output" >>$seqres.full + + fi + rm -f $tmp.repair + return $res +} + _get_pids_by_name() { if [ $# -ne 1 ] diff --git a/common/xfs b/common/xfs index e8e4832c..4a130493 100644 --- a/common/xfs +++ b/common/xfs @@ -988,6 +988,18 @@ _check_xfs_test_fs() return $? } +# modeled after _scratch_xfs_repair +_repair_xfs_test_fs() +{ + TEST_OPTIONS="" + [ "$USE_EXTERNAL" = yes -a ! -z "$TEST_LOGDEV" ] && \ + TEST_OPTIONS="-l$TEST_LOGDEV" + [ "$USE_EXTERNAL" = yes -a ! -z "$TEST_RTDEV" ] && \ + TEST_OPTIONS=$TEST_OPTIONS" -r$TEST_RTDEV" + [ "$LARGE_TEST_DEV" = yes ] && TEST_OPTIONS=$TEST_OPTIONS" -t" + $XFS_REPAIR_PROG $TEST_OPTIONS $* $TEST_DEV +} + _require_xfs_test_rmapbt() { _require_test -- 2.40.0.634.g4ca3ef3211-goog ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] check: _check_filesystems for errors even if test failed 2023-04-14 22:11 [PATCH 1/2] check: try to fix the test device if it gets corrupted Leah Rumancik @ 2023-04-14 22:11 ` Leah Rumancik 2023-04-15 0:41 ` Darrick J. Wong 2023-04-15 0:35 ` [PATCH 1/2] check: try to fix the test device if it gets corrupted Darrick J. Wong 1 sibling, 1 reply; 7+ messages in thread From: Leah Rumancik @ 2023-04-14 22:11 UTC (permalink / raw) To: fstests; +Cc: tytso, djwong, Leah Rumancik Previously, we would only run _check_filesystems to ensure that a test that appeared to pass did not have any filesystem corruption. However, in _check_filesystems, we also repair any errors found in the filesystem. Let's do this even if we already know the test failed so that subsequent tests aren't affected. Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> --- check | 1 + 1 file changed, 1 insertion(+) diff --git a/check b/check index befbf465..c18f02ca 100755 --- a/check +++ b/check @@ -972,6 +972,7 @@ function run_section() # Even though we failed, there may be something interesting in # dmesg which can help debugging. _check_dmesg + (_adjust_oom_score 250; _check_filesystems) tc_status="fail" else # The test apparently passed, so check for corruption -- 2.40.0.634.g4ca3ef3211-goog ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] check: _check_filesystems for errors even if test failed 2023-04-14 22:11 ` [PATCH 2/2] check: _check_filesystems for errors even if test failed Leah Rumancik @ 2023-04-15 0:41 ` Darrick J. Wong 2023-04-17 17:19 ` Leah Rumancik 0 siblings, 1 reply; 7+ messages in thread From: Darrick J. Wong @ 2023-04-15 0:41 UTC (permalink / raw) To: Leah Rumancik; +Cc: fstests, tytso On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote: > Previously, we would only run _check_filesystems to ensure that a test > that appeared to pass did not have any filesystem corruption. However, > in _check_filesystems, we also repair any errors found in the filesystem. > Let's do this even if we already know the test failed so that subsequent > tests aren't affected. > > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> > --- > check | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/check b/check > index befbf465..c18f02ca 100755 > --- a/check > +++ b/check > @@ -972,6 +972,7 @@ function run_section() > # Even though we failed, there may be something interesting in > # dmesg which can help debugging. > _check_dmesg > + (_adjust_oom_score 250; _check_filesystems) Seeing as the test failed, do we care about the state of the scratch fs? Would it be sufficient only to clean up the test fs to avoid cascading damage? (Asking as someone who knows how impactful slow filesystem checking can be on fstests runtimes... ;)) --D > tc_status="fail" > else > # The test apparently passed, so check for corruption > -- > 2.40.0.634.g4ca3ef3211-goog > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] check: _check_filesystems for errors even if test failed 2023-04-15 0:41 ` Darrick J. Wong @ 2023-04-17 17:19 ` Leah Rumancik 2023-04-17 20:08 ` Leah Rumancik 0 siblings, 1 reply; 7+ messages in thread From: Leah Rumancik @ 2023-04-17 17:19 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Leah Rumancik, fstests, tytso Sure, will send out another set. Thanks! On Fri, Apr 14, 2023 at 5:41 PM Darrick J. Wong <djwong@kernel.org> wrote: > > On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote: > > Previously, we would only run _check_filesystems to ensure that a test > > that appeared to pass did not have any filesystem corruption. However, > > in _check_filesystems, we also repair any errors found in the filesystem. > > Let's do this even if we already know the test failed so that subsequent > > tests aren't affected. > > > > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> > > --- > > check | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/check b/check > > index befbf465..c18f02ca 100755 > > --- a/check > > +++ b/check > > @@ -972,6 +972,7 @@ function run_section() > > # Even though we failed, there may be something interesting in > > # dmesg which can help debugging. > > _check_dmesg > > + (_adjust_oom_score 250; _check_filesystems) > > Seeing as the test failed, do we care about the state of the scratch fs? > Would it be sufficient only to clean up the test fs to avoid cascading > damage? > > (Asking as someone who knows how impactful slow filesystem checking can > be on fstests runtimes... ;)) > > --D > > > tc_status="fail" > > else > > # The test apparently passed, so check for corruption > > -- > > 2.40.0.634.g4ca3ef3211-goog > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] check: _check_filesystems for errors even if test failed 2023-04-17 17:19 ` Leah Rumancik @ 2023-04-17 20:08 ` Leah Rumancik 2023-04-18 4:49 ` Darrick J. Wong 0 siblings, 1 reply; 7+ messages in thread From: Leah Rumancik @ 2023-04-17 20:08 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Leah Rumancik, fstests, tytso Well actually, if you are using DUMP_CORRUPT_FS, you'd probably want _check_filesystems to run on the scratch fs as well. This shouldn't add that much time if most of your tests are passing :) but I could make this only run for scratch fs on failure only if DUMP_CORRUPT_FS is set? On Mon, Apr 17, 2023 at 10:19 AM Leah Rumancik <lrumancik@google.com> wrote: > > Sure, will send out another set. Thanks! > > On Fri, Apr 14, 2023 at 5:41 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote: > > > Previously, we would only run _check_filesystems to ensure that a test > > > that appeared to pass did not have any filesystem corruption. However, > > > in _check_filesystems, we also repair any errors found in the filesystem. > > > Let's do this even if we already know the test failed so that subsequent > > > tests aren't affected. > > > > > > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> > > > --- > > > check | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/check b/check > > > index befbf465..c18f02ca 100755 > > > --- a/check > > > +++ b/check > > > @@ -972,6 +972,7 @@ function run_section() > > > # Even though we failed, there may be something interesting in > > > # dmesg which can help debugging. > > > _check_dmesg > > > + (_adjust_oom_score 250; _check_filesystems) > > > > Seeing as the test failed, do we care about the state of the scratch fs? > > Would it be sufficient only to clean up the test fs to avoid cascading > > damage? > > > > (Asking as someone who knows how impactful slow filesystem checking can > > be on fstests runtimes... ;)) > > > > --D > > > > > tc_status="fail" > > > else > > > # The test apparently passed, so check for corruption > > > -- > > > 2.40.0.634.g4ca3ef3211-goog > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] check: _check_filesystems for errors even if test failed 2023-04-17 20:08 ` Leah Rumancik @ 2023-04-18 4:49 ` Darrick J. Wong 0 siblings, 0 replies; 7+ messages in thread From: Darrick J. Wong @ 2023-04-18 4:49 UTC (permalink / raw) To: Leah Rumancik; +Cc: Leah Rumancik, fstests, tytso On Mon, Apr 17, 2023 at 01:08:20PM -0700, Leah Rumancik wrote: > Well actually, if you are using DUMP_CORRUPT_FS, you'd probably want > _check_filesystems to run on the scratch fs as well. ooh, good point > This shouldn't > add that much time if most of your tests are passing :) but I could > make this only run for scratch fs on failure only if DUMP_CORRUPT_FS > is set? Hmm. On second thought, you've convinced me to go along with this. The test failed, let's see if corruption happened --> HAPPY DANCE. Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > > On Mon, Apr 17, 2023 at 10:19 AM Leah Rumancik <lrumancik@google.com> wrote: > > > > Sure, will send out another set. Thanks! > > > > On Fri, Apr 14, 2023 at 5:41 PM Darrick J. Wong <djwong@kernel.org> wrote: > > > > > > On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote: > > > > Previously, we would only run _check_filesystems to ensure that a test > > > > that appeared to pass did not have any filesystem corruption. However, > > > > in _check_filesystems, we also repair any errors found in the filesystem. > > > > Let's do this even if we already know the test failed so that subsequent > > > > tests aren't affected. > > > > > > > > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> > > > > --- > > > > check | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/check b/check > > > > index befbf465..c18f02ca 100755 > > > > --- a/check > > > > +++ b/check > > > > @@ -972,6 +972,7 @@ function run_section() > > > > # Even though we failed, there may be something interesting in > > > > # dmesg which can help debugging. > > > > _check_dmesg > > > > + (_adjust_oom_score 250; _check_filesystems) > > > > > > Seeing as the test failed, do we care about the state of the scratch fs? > > > Would it be sufficient only to clean up the test fs to avoid cascading > > > damage? > > > > > > (Asking as someone who knows how impactful slow filesystem checking can > > > be on fstests runtimes... ;)) > > > > > > --D > > > > > > > tc_status="fail" > > > > else > > > > # The test apparently passed, so check for corruption > > > > -- > > > > 2.40.0.634.g4ca3ef3211-goog > > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] check: try to fix the test device if it gets corrupted 2023-04-14 22:11 [PATCH 1/2] check: try to fix the test device if it gets corrupted Leah Rumancik 2023-04-14 22:11 ` [PATCH 2/2] check: _check_filesystems for errors even if test failed Leah Rumancik @ 2023-04-15 0:35 ` Darrick J. Wong 1 sibling, 0 replies; 7+ messages in thread From: Darrick J. Wong @ 2023-04-15 0:35 UTC (permalink / raw) To: Leah Rumancik; +Cc: fstests, tytso On Fri, Apr 14, 2023 at 03:11:32PM -0700, Leah Rumancik wrote: > From: Theodore Ts'o <tytso@mit.edu> > > If the test device gets corrupted all subsequent tests will fail. To > prevent this from causing all subsequent tests to be useless, try > repair the file system on TEST_DEV if possible. We don't need to do > this with the scratch device since that file system gets recreated > each time anyway. > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > --- > check | 7 ++++++- > common/rc | 41 +++++++++++++++++++++++++++++++++++++++++ > common/xfs | 12 ++++++++++++ > 3 files changed, 59 insertions(+), 1 deletion(-) > > diff --git a/check b/check > index 1a58a2b2..befbf465 100755 > --- a/check > +++ b/check > @@ -536,7 +536,12 @@ _check_filesystems() > local ret=0 > > if [ -f ${RESULT_DIR}/require_test ]; then > - _check_test_fs || ret=1 > + if ! _check_test_fs ; then > + ret=1 > + echo "Trying to repair broken TEST_DEV file system" > + _repair_test_fs > + _test_mount > + fi > rm -f ${RESULT_DIR}/require_test* > else > _test_unmount 2> /dev/null > diff --git a/common/rc b/common/rc > index 90749343..e8fc7e86 100644 > --- a/common/rc > +++ b/common/rc > @@ -1199,6 +1199,47 @@ _repair_scratch_fs() > esac > } > > +_repair_test_fs() > +{ > + case $FSTYP in > + xfs) > + _repair_xfs_test_fs "$@" >$tmp.repair 2>&1 > + res=$? > + if [ "$res" -ne 0 ]; then > + echo "xfs_repair returns $res; replay log?" >>$tmp.repair > + _test_mount > + res=$? > + if [ $res -gt 0 ]; then > + echo "mount returns $res; zap log?" >>$tmp.repair > + _xfs_repair_test_fs -L >>$tmp.repair 2>&1 > + echo "log zap returns $?" >> $tmp.repair > + else > + umount "$TEST_DEV" > + fi > + _xfs_repair_test_fs "$@" >>$tmp.repair 2>&1 > + res=$? > + fi > + ;; > + *) > + # Let's hope fsck -y suffices... > + fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1 > + res=$? > + if test "$res" -lt 4 ; then > + res=0 > + fi > + ;; > + esac > + if [ $res -ne 0 ]; then > + _log_err "_repair_test_fs: failed, err=$res" > + echo "*** fsck.$FSTYP output ***" >>$seqres.full > + cat $tmp.repair >>$seqres.full > + echo "*** end fsck.$FSTYP output" >>$seqres.full > + > + fi > + rm -f $tmp.repair > + return $res > +} > + > _get_pids_by_name() > { > if [ $# -ne 1 ] > diff --git a/common/xfs b/common/xfs > index e8e4832c..4a130493 100644 > --- a/common/xfs > +++ b/common/xfs > @@ -988,6 +988,18 @@ _check_xfs_test_fs() > return $? > } > > +# modeled after _scratch_xfs_repair > +_repair_xfs_test_fs() Dumb bikeshed: Can this be named _test_xfs_repair? :D With that changed, Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > +{ > + TEST_OPTIONS="" > + [ "$USE_EXTERNAL" = yes -a ! -z "$TEST_LOGDEV" ] && \ > + TEST_OPTIONS="-l$TEST_LOGDEV" > + [ "$USE_EXTERNAL" = yes -a ! -z "$TEST_RTDEV" ] && \ > + TEST_OPTIONS=$TEST_OPTIONS" -r$TEST_RTDEV" > + [ "$LARGE_TEST_DEV" = yes ] && TEST_OPTIONS=$TEST_OPTIONS" -t" > + $XFS_REPAIR_PROG $TEST_OPTIONS $* $TEST_DEV > +} > + > _require_xfs_test_rmapbt() > { > _require_test > -- > 2.40.0.634.g4ca3ef3211-goog > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-04-18 4:49 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-04-14 22:11 [PATCH 1/2] check: try to fix the test device if it gets corrupted Leah Rumancik 2023-04-14 22:11 ` [PATCH 2/2] check: _check_filesystems for errors even if test failed Leah Rumancik 2023-04-15 0:41 ` Darrick J. Wong 2023-04-17 17:19 ` Leah Rumancik 2023-04-17 20:08 ` Leah Rumancik 2023-04-18 4:49 ` Darrick J. Wong 2023-04-15 0:35 ` [PATCH 1/2] check: try to fix the test device if it gets corrupted Darrick J. Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox