* [RFC PATCH] check: try to fix the test device if it gets corrupted @ 2017-03-02 23:20 Theodore Ts'o 2017-03-03 9:03 ` Eryu Guan 0 siblings, 1 reply; 14+ messages in thread From: Theodore Ts'o @ 2017-03-02 23:20 UTC (permalink / raw) To: fstests; +Cc: Theodore Ts'o If the test device gets corrupted all subsequent tests will fail. To prevent this from causing all subsequent tests to be useless, try repair the file system on TEST_DEV if possible. We don't need to do this with the scratch device since that file system gets recreated each time anyway. Signed-off-by: Theodore Ts'o <tytso@mit.edu> --- This is a quick hack that I needed while debubgging some research code[1]. It turns out that when the grad student is up against a paper deadline is an, this becomes amazing evolutionary process which creates file system modifications which are optimized for running postmark and file bench --- and falls over very easily otherwise. So when TEST_DEV is getting corrupted very frequently, it's nice to be able to continue running other tests in the quick or auto group. So please consider this a proof-concept-patch; would people consider it worthwhile to have this in xfstests upstream? check | 6 +++++- common/rc | 22 ++++++++++++++++++++++ 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/check b/check index 5d7f75c4..62f8ff41 100755 --- a/check +++ b/check @@ -455,7 +455,11 @@ _summary() _check_filesystems() { if [ -f ${RESULT_DIR}/require_test ]; then - _check_test_fs || err=true + if ! _check_test_fs ; then + err=true + echo "Trying to repair broken TEST_DEV file system" + _repair_test_fs + fi rm -f ${RESULT_DIR}/require_test* fi if [ -f ${RESULT_DIR}/require_scratch ]; then diff --git a/common/rc b/common/rc index 2be55e6f..60af86fc 100644 --- a/common/rc +++ b/common/rc @@ -1098,6 +1098,28 @@ _repair_scratch_fs() esac } +_repair_test_fs() +{ + case $FSTYP in + ext2|ext3|ext4) + fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1 + if test $? -ge 4 ; then + echo "_repair_test_fs: couldn't repair filesystem on $device (see $seqres.full)" + + echo "_repair_test_fs: couldn't repair filesystem on $device" >>$seqres.full + echo "*** fsck.$FSTYP output ***" >>$seqres.full + cat $tmp.repair >>$seqres.full + echo "*** end fsck.$FSTYP output" >>$seqres.full + return 1 + fi + return 0 + ;; + *) + return 1 + ;; + esac +} + _get_pids_by_name() { if [ $# -ne 1 ] -- 2.11.0.rc0.7.gbe5a750 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-03-02 23:20 [RFC PATCH] check: try to fix the test device if it gets corrupted Theodore Ts'o @ 2017-03-03 9:03 ` Eryu Guan 2017-03-03 17:21 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Eryu Guan @ 2017-03-03 9:03 UTC (permalink / raw) To: Theodore Ts'o; +Cc: fstests On Thu, Mar 02, 2017 at 06:20:50PM -0500, Theodore Ts'o wrote: > If the test device gets corrupted all subsequent tests will fail. To > prevent this from causing all subsequent tests to be useless, try > repair the file system on TEST_DEV if possible. We don't need to do > this with the scratch device since that file system gets recreated > each time anyway. > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > --- > > This is a quick hack that I needed while debubgging some research > code[1]. It turns out that when the grad student is up against a > paper deadline is an, this becomes amazing evolutionary process which > creates file system modifications which are optimized for running > postmark and file bench --- and falls over very easily otherwise. So > when TEST_DEV is getting corrupted very frequently, it's nice to be > able to continue running other tests in the quick or auto group. > > So please consider this a proof-concept-patch; would people consider > it worthwhile to have this in xfstests upstream? This idea looks reasonable to me, and TEST_DEV is supposed to be aging, perhaps being currupted & repaired is kind of aging too :) Thanks, Eryu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-03-03 9:03 ` Eryu Guan @ 2017-03-03 17:21 ` Darrick J. Wong 2017-03-03 23:01 ` Theodore Ts'o 0 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2017-03-03 17:21 UTC (permalink / raw) To: Eryu Guan; +Cc: Theodore Ts'o, fstests On Fri, Mar 03, 2017 at 05:03:32PM +0800, Eryu Guan wrote: > On Thu, Mar 02, 2017 at 06:20:50PM -0500, Theodore Ts'o wrote: > > If the test device gets corrupted all subsequent tests will fail. To > > prevent this from causing all subsequent tests to be useless, try > > repair the file system on TEST_DEV if possible. We don't need to do > > this with the scratch device since that file system gets recreated > > each time anyway. > > > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > > --- > > > > This is a quick hack that I needed while debubgging some research > > code[1]. It turns out that when the grad student is up against a > > paper deadline is an, this becomes amazing evolutionary process which > > creates file system modifications which are optimized for running > > postmark and file bench --- and falls over very easily otherwise. So > > when TEST_DEV is getting corrupted very frequently, it's nice to be > > able to continue running other tests in the quick or auto group. > > > > So please consider this a proof-concept-patch; would people consider > > it worthwhile to have this in xfstests upstream? > > This idea looks reasonable to me, and TEST_DEV is supposed to be aging, > perhaps being currupted & repaired is kind of aging too :) <shrug> The test device isn't supposed to get corrupted, since it (at least in theory) should be an old filesystem. That said, I suppose there's little point in banging around with a corrupt test fs. Maybe we could go further and stop running if there's unfixable corruption? --D > > Thanks, > Eryu > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-03-03 17:21 ` Darrick J. Wong @ 2017-03-03 23:01 ` Theodore Ts'o 2017-03-27 1:48 ` Theodore Ts'o 0 siblings, 1 reply; 14+ messages in thread From: Theodore Ts'o @ 2017-03-03 23:01 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Eryu Guan, fstests On Fri, Mar 03, 2017 at 09:21:57AM -0800, Darrick J. Wong wrote: > > <shrug> The test device isn't supposed to get corrupted, since it (at > least in theory) should be an old filesystem. That said, I suppose > there's little point in banging around with a corrupt test fs. Maybe we > could go further and stop running if there's unfixable corruption? Yes, that was the other alternative I considered. In my case, though, since I'm trying to get a sense of how many failures I have to deal with, I really wanted a "make -k" behavior that would continue after the first failure. After all, all I was going to do was manually run fsck, and then continue the run --- so we might as well have the check script do it automatically and then allow things to continue. We could make it be configurable, via a command-line option. The -k option isn't taken so we could have check -k that works like make -k if you think that's better. OTOH, perhaps making -k the default behaviour is actually the better way to go, and in that case, maybe it's not worth having the command-line flag? - Ted ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-03-03 23:01 ` Theodore Ts'o @ 2017-03-27 1:48 ` Theodore Ts'o 2017-03-27 8:51 ` Eryu Guan 0 siblings, 1 reply; 14+ messages in thread From: Theodore Ts'o @ 2017-03-27 1:48 UTC (permalink / raw) To: Darrick J. Wong; +Cc: Eryu Guan, fstests On Fri, Mar 03, 2017 at 06:01:29PM -0500, Theodore Ts'o wrote: > On Fri, Mar 03, 2017 at 09:21:57AM -0800, Darrick J. Wong wrote: > > > > <shrug> The test device isn't supposed to get corrupted, since it (at > > least in theory) should be an old filesystem. That said, I suppose > > there's little point in banging around with a corrupt test fs. Maybe we > > could go further and stop running if there's unfixable corruption? > > Yes, that was the other alternative I considered. In my case, though, > since I'm trying to get a sense of how many failures I have to deal > with, I really wanted a "make -k" behavior that would continue after > the first failure. After all, all I was going to do was manually run > fsck, and then continue the run --- so we might as well have the check > script do it automatically and then allow things to continue. > > We could make it be configurable, via a command-line option. The -k > option isn't taken so we could have check -k that works like make -k > if you think that's better. OTOH, perhaps making -k the default > behaviour is actually the better way to go, and in that case, maybe > it's not worth having the command-line flag? Eryu, do you have any preferences or comments about how you'd like me to modify this patch for upstreaming? (Attached is my current version of the patch). Thanks!! - Ted commit 727c737d1f0a40288fc897c0263fbf8e7a5db8b3 Author: Theodore Ts'o <tytso@mit.edu> Date: Wed Mar 1 19:54:08 2017 -0500 check: try to fix the test device if it gets corrupted If the test device gets corrupted all subsequent tests will fail. To prevent this from causing all subsequent tests to be useless, try repair the file system on TEST_DEV if possible. We don't need to do this with the scratch device since that file system gets recreated each time anyway. Signed-off-by: Theodore Ts'o <tytso@mit.edu> diff --git a/check b/check index 2fcf385f..d253f744 100755 --- a/check +++ b/check @@ -472,7 +472,11 @@ _summary() _check_filesystems() { if [ -f ${RESULT_DIR}/require_test ]; then - _check_test_fs || err=true + if ! _check_test_fs ; then + err=true + echo "Trying to repair broken TEST_DEV file system" + _repair_test_fs + fi rm -f ${RESULT_DIR}/require_test* fi if [ -f ${RESULT_DIR}/require_scratch ]; then diff --git a/common/rc b/common/rc index 052f67aa..ce491f3f 100644 --- a/common/rc +++ b/common/rc @@ -1172,6 +1172,28 @@ _repair_scratch_fs() esac } +_repair_test_fs() +{ + case $FSTYP in + ext2|ext3|ext4) + fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1 + if test $? -ge 4 ; then + echo "_repair_test_fs: couldn't repair filesystem on $device (see $seqres.full)" + + echo "_repair_test_fs: couldn't repair filesystem on $device" >>$seqres.full + echo "*** fsck.$FSTYP output ***" >>$seqres.full + cat $tmp.repair >>$seqres.full + echo "*** end fsck.$FSTYP output" >>$seqres.full + return 1 + fi + return 0 + ;; + *) + return 1 + ;; + esac +} + _get_pids_by_name() { if [ $# -ne 1 ] -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at https://urldefense.proofpoint.com/v2/url?u=http-3A__vger.kernel.org_majordomo-2Dinfo.html&d=DwIBAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=-yMrTV4jriXR7ieyzzjV-QgHBD0UDw8ixoR77aMeAHE&m=TzkdhjUlBq8jw7ezkl6BuvazC1ulmLiwN0_zxLN0WsE&s=BrM0YkzpJep1zV6T-XPTWCIjCTjMYv_CK_yzXPCXfKY&e= ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-03-27 1:48 ` Theodore Ts'o @ 2017-03-27 8:51 ` Eryu Guan 2017-07-16 1:30 ` Theodore Ts'o 0 siblings, 1 reply; 14+ messages in thread From: Eryu Guan @ 2017-03-27 8:51 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Darrick J. Wong, fstests Hi Ted, On Sun, Mar 26, 2017 at 09:48:02PM -0400, Theodore Ts'o wrote: > On Fri, Mar 03, 2017 at 06:01:29PM -0500, Theodore Ts'o wrote: > > On Fri, Mar 03, 2017 at 09:21:57AM -0800, Darrick J. Wong wrote: > > > > > > <shrug> The test device isn't supposed to get corrupted, since it (at > > > least in theory) should be an old filesystem. That said, I suppose > > > there's little point in banging around with a corrupt test fs. Maybe we > > > could go further and stop running if there's unfixable corruption? > > > > Yes, that was the other alternative I considered. In my case, though, > > since I'm trying to get a sense of how many failures I have to deal > > with, I really wanted a "make -k" behavior that would continue after > > the first failure. After all, all I was going to do was manually run > > fsck, and then continue the run --- so we might as well have the check > > script do it automatically and then allow things to continue. > > > > We could make it be configurable, via a command-line option. The -k > > option isn't taken so we could have check -k that works like make -k > > if you think that's better. OTOH, perhaps making -k the default > > behaviour is actually the better way to go, and in that case, maybe > > it's not worth having the command-line flag? > > Eryu, do you have any preferences or comments about how you'd like me > to modify this patch for upstreaming? (Attached is my current version > of the patch). > > Thanks!! Sorry I lost this thread, I thought I've replied but apparently I didn't.. I agreed with both of you and Darrick, I think we can try to repair the corrupted test fs, and if repair succeeds we can continue the test, and stop running the whole test if repair fails. > > - Ted > > commit 727c737d1f0a40288fc897c0263fbf8e7a5db8b3 > Author: Theodore Ts'o <tytso@mit.edu> > Date: Wed Mar 1 19:54:08 2017 -0500 > > check: try to fix the test device if it gets corrupted > > If the test device gets corrupted all subsequent tests will fail. To > prevent this from causing all subsequent tests to be useless, try > repair the file system on TEST_DEV if possible. We don't need to do > this with the scratch device since that file system gets recreated > each time anyway. > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > > diff --git a/check b/check > index 2fcf385f..d253f744 100755 > --- a/check > +++ b/check > @@ -472,7 +472,11 @@ _summary() > _check_filesystems() > { > if [ -f ${RESULT_DIR}/require_test ]; then > - _check_test_fs || err=true > + if ! _check_test_fs ; then > + err=true > + echo "Trying to repair broken TEST_DEV file system" > + _repair_test_fs Minor nit, need tab for indention in the if block. > + fi > rm -f ${RESULT_DIR}/require_test* > fi > if [ -f ${RESULT_DIR}/require_scratch ]; then > diff --git a/common/rc b/common/rc > index 052f67aa..ce491f3f 100644 > --- a/common/rc > +++ b/common/rc > @@ -1172,6 +1172,28 @@ _repair_scratch_fs() > esac > } > > +_repair_test_fs() > +{ Minor nit, this function has mixed tab & space for indention too, use tab for new functions. > + case $FSTYP in > + ext2|ext3|ext4) > + fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1 > + if test $? -ge 4 ; then > + echo "_repair_test_fs: couldn't repair filesystem on $device (see $seqres.full)" > + > + echo "_repair_test_fs: couldn't repair filesystem on $device" >>$seqres.full We have _log_err() to do these to echos now. > + echo "*** fsck.$FSTYP output ***" >>$seqres.full > + cat $tmp.repair >>$seqres.full > + echo "*** end fsck.$FSTYP output" >>$seqres.full > + return 1 > + fi > + return 0 > + ;; > + *) > + return 1 I think we should try to fix other filesystems too? > + ;; > + esac > +} > + And I'm wondering if a new helper function can be factored out and used in both _repair_scratch_fs and _repair_scratch_fs? One of the problems I see in doing this is that we don't have a _test_xfs_repair() counterpart right now. Thanks, Eryu > _get_pids_by_name() > { > if [ $# -ne 1 ] -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at https://urldefense.proofpoint.com/v2/url?u=http-3A__vger.kernel.org_majordomo-2Dinfo.html&d=DwIBAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=-yMrTV4jriXR7ieyzzjV-QgHBD0UDw8ixoR77aMeAHE&m=0soB5gI1r50psvTdjA1KUFEWFkn9WxV3nD1owmmUN_w&s=hxbjQYhJ222LOmb3FrPhlXqW0DiWVx4j_ZNiQGQF1ok&e= ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-03-27 8:51 ` Eryu Guan @ 2017-07-16 1:30 ` Theodore Ts'o 2017-07-17 23:45 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Theodore Ts'o @ 2017-07-16 1:30 UTC (permalink / raw) To: Eryu Guan; +Cc: Darrick J. Wong, fstests On Mon, Mar 27, 2017 at 04:51:03PM +0800, Eryu Guan wrote: > > Sorry I lost this thread, I thought I've replied but apparently I didn't.. > > I agreed with both of you and Darrick, I think we can try to repair the > corrupted test fs, and if repair succeeds we can continue the test, and > stop running the whole test if repair fails. Sorry for the delay in getting back to this. Things got busy and this got dropped on my end. I've fixed the whitespace nits that you pointed out and am using _log_err. > I think we should try to fix other filesystems too? Hmm... yeah. The main reason why I hadn't was because xfs has _scratch_xfs_repair and _scratch_xfs_check, which are very similar. But _check_xfs_test_fs looks *very* different from _scratch_xfs_check, and I'm not sure why. So I've created a _repair_xfs_test_fs which is modelled after the simpler _scratch_xfs_repair function, but I'm not 100% sure that is correct. Anyways, WDYT? - Ted >From 96a13cc22878ee5c016a606d76f8e9a6bd84eb20 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o <tytso@mit.edu> Date: Wed, 1 Mar 2017 19:54:08 -0500 Subject: [PATCH] check: try to fix the test device if it gets corrupted If the test device gets corrupted all subsequent tests will fail. To prevent this from causing all subsequent tests to be useless, try repair the file system on TEST_DEV if possible. We don't need to do this with the scratch device since that file system gets recreated each time anyway. Signed-off-by: Theodore Ts'o <tytso@mit.edu> --- check | 7 ++++++- common/rc | 41 +++++++++++++++++++++++++++++++++++++++++ common/xfs | 12 ++++++++++++ 3 files changed, 59 insertions(+), 1 deletion(-) diff --git a/check b/check index f8db3cd6..d89d2e91 100755 --- a/check +++ b/check @@ -476,7 +476,12 @@ _summary() _check_filesystems() { if [ -f ${RESULT_DIR}/require_test ]; then - _check_test_fs || err=true + if ! _check_test_fs ; then + err=true + echo "Trying to repair broken TEST_DEV file system" + _repair_test_fs + _test_mount + fi rm -f ${RESULT_DIR}/require_test* fi if [ -f ${RESULT_DIR}/require_scratch ]; then diff --git a/common/rc b/common/rc index 328b6b07..d37a1611 100644 --- a/common/rc +++ b/common/rc @@ -1201,6 +1201,47 @@ _repair_scratch_fs() esac } +_repair_test_fs() +{ + case $FSTYP in + xfs) + _repair_xfs_test_fs "$@" >$tmp.repair 2>&1 + res=$? + if [ "$res" -ne 0 ]; then + echo "xfs_repair returns $res; replay log?" >>$tmp.repair + _test_mount + res=$? + if [ $res -gt 0 ]; then + echo "mount returns $res; zap log?" >>$tmp.repair + _xfs_repair_test_fs -L >>$tmp.repair 2>&1 + echo "log zap returns $?" >> $tmp.repair + else + umount "$TEST_DEV" + fi + _xfs_repair_test_fs "$@" >>$tmp.repair 2>&1 + res=$? + fi + ;; + *) + # Let's hope fsck -y suffices... + fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1 + res=$? + if test "$res" -lt 4 ; then + res=0 + fi + ;; + esac + if [ $res -ne 0 ]; then + _log_err "_repair_test_fs: failed, err=$res" + echo "*** fsck.$FSTYP output ***" >>$seqres.full + cat $tmp.repair >>$seqres.full + echo "*** end fsck.$FSTYP output" >>$seqres.full + + fi + rm -f $tmp.repair + return $res +} + _get_pids_by_name() { if [ $# -ne 1 ] diff --git a/common/xfs b/common/xfs index a1ee3847..c8f4e46b 100644 --- a/common/xfs +++ b/common/xfs @@ -443,6 +443,18 @@ _check_xfs_test_fs() fi } +# modeled after _scratch_xfs_repair +_repair_xfs_test_fs() +{ + TEST_OPTIONS="" + [ "$USE_EXTERNAL" = yes -a ! -z "$TEST_LOGDEV" ] && \ + TEST_OPTIONS="-l$TEST_LOGDEV" + [ "$USE_EXTERNAL" = yes -a ! -z "$TEST_RTDEV" ] && \ + TEST_OPTIONS=$TEST_OPTIONS" -r$TEST_RTDEV" + [ "$LARGE_TEST_DEV" = yes ] && TEST_OPTIONS=$TEST_OPTIONS" -t" + $XFS_REPAIR_PROG $TEST_OPTIONS $* $TEST_DEV +} + _require_xfs_test_rmapbt() { _require_test -- 2.11.0.rc0.7.gbe5a750 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-07-16 1:30 ` Theodore Ts'o @ 2017-07-17 23:45 ` Darrick J. Wong 2017-07-18 5:05 ` Eryu Guan 0 siblings, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2017-07-17 23:45 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Eryu Guan, fstests On Sat, Jul 15, 2017 at 09:30:58PM -0400, Theodore Ts'o wrote: > On Mon, Mar 27, 2017 at 04:51:03PM +0800, Eryu Guan wrote: > > > > Sorry I lost this thread, I thought I've replied but apparently I didn't.. > > > > I agreed with both of you and Darrick, I think we can try to repair the > > corrupted test fs, and if repair succeeds we can continue the test, and > > stop running the whole test if repair fails. > > Sorry for the delay in getting back to this. Things got busy and this > got dropped on my end. > > I've fixed the whitespace nits that you pointed out and am using _log_err. > > > I think we should try to fix other filesystems too? > > Hmm... yeah. The main reason why I hadn't was because xfs has > _scratch_xfs_repair and _scratch_xfs_check, which are very similar. > But _check_xfs_test_fs looks *very* different from _scratch_xfs_check, > and I'm not sure why. _check_xfs_filesystem is a helper that does all the checking that we can do to an xfs filesystem (online fsck, the old xfs_check, and the newer xfs_repair). AFAICT that's what all new xfstest code should be calling. xfs_check is obsolete. _check_xfs_test_fs calls _check_xfs_filesystem and then, weirdly, calls a nonexistent tool calld xfs_repair_ipaths to .... I assume fix all the problems that can crop with the Irix XFS parent pointer implementation. None of that exists on Linux and Irix is long dead, so I assume the "check for ipath consistency" can go away entirely. _scratch_xfs_check calls xfs_check directly, so I think it should get replaced with _check_scratch_fs, which calls _check_xfs_filesystem. <keep scrolling> > > So I've created a _repair_xfs_test_fs which is modelled after the > simpler _scratch_xfs_repair function, but I'm not 100% sure that is > correct. > > Anyways, WDYT? > > - Ted > > From 96a13cc22878ee5c016a606d76f8e9a6bd84eb20 Mon Sep 17 00:00:00 2001 > From: Theodore Ts'o <tytso@mit.edu> > Date: Wed, 1 Mar 2017 19:54:08 -0500 > Subject: [PATCH] check: try to fix the test device if it gets corrupted > > If the test device gets corrupted all subsequent tests will fail. To > prevent this from causing all subsequent tests to be useless, try > repair the file system on TEST_DEV if possible. We don't need to do > this with the scratch device since that file system gets recreated > each time anyway. > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > --- > check | 7 ++++++- > common/rc | 41 +++++++++++++++++++++++++++++++++++++++++ > common/xfs | 12 ++++++++++++ > 3 files changed, 59 insertions(+), 1 deletion(-) > > diff --git a/check b/check > index f8db3cd6..d89d2e91 100755 > --- a/check > +++ b/check > @@ -476,7 +476,12 @@ _summary() > _check_filesystems() > { > if [ -f ${RESULT_DIR}/require_test ]; then > - _check_test_fs || err=true > + if ! _check_test_fs ; then > + err=true > + echo "Trying to repair broken TEST_DEV file system" > + _repair_test_fs > + _test_mount > + fi > rm -f ${RESULT_DIR}/require_test* > fi > if [ -f ${RESULT_DIR}/require_scratch ]; then > diff --git a/common/rc b/common/rc > index 328b6b07..d37a1611 100644 > --- a/common/rc > +++ b/common/rc > @@ -1201,6 +1201,47 @@ _repair_scratch_fs() > esac > } > > +_repair_test_fs() > +{ > + case $FSTYP in > + xfs) > + _repair_xfs_test_fs "$@" >$tmp.repair 2>&1 > + res=$? > + if [ "$res" -ne 0 ]; then > + echo "xfs_repair returns $res; replay log?" >>$tmp.repair > + _test_mount > + res=$? > + if [ $res -gt 0 ]; then > + echo "mount returns $res; zap log?" >>$tmp.repair > + _xfs_repair_test_fs -L >>$tmp.repair 2>&1 > + echo "log zap returns $?" >> $tmp.repair > + else > + umount "$TEST_DEV" > + fi > + _xfs_repair_test_fs "$@" >>$tmp.repair 2>&1 > + res=$? > + fi Structurally this all looks ok, but it's a little weird that we have _scratch_xfs_repair for the scratch device (object-verb) but _repair_test_fs (verb-object) for the test device. --D > + ;; > + *) > + # Let's hope fsck -y suffices... > + fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1 > + res=$? > + if test "$res" -lt 4 ; then > + res=0 > + fi > + ;; > + esac > + if [ $res -ne 0 ]; then > + _log_err "_repair_test_fs: failed, err=$res" > + echo "*** fsck.$FSTYP output ***" >>$seqres.full > + cat $tmp.repair >>$seqres.full > + echo "*** end fsck.$FSTYP output" >>$seqres.full > + > + fi > + rm -f $tmp.repair > + return $res > +} > + > _get_pids_by_name() > { > if [ $# -ne 1 ] > diff --git a/common/xfs b/common/xfs > index a1ee3847..c8f4e46b 100644 > --- a/common/xfs > +++ b/common/xfs > @@ -443,6 +443,18 @@ _check_xfs_test_fs() > fi > } > > +# modeled after _scratch_xfs_repair > +_repair_xfs_test_fs() > +{ > + TEST_OPTIONS="" > + [ "$USE_EXTERNAL" = yes -a ! -z "$TEST_LOGDEV" ] && \ > + TEST_OPTIONS="-l$TEST_LOGDEV" > + [ "$USE_EXTERNAL" = yes -a ! -z "$TEST_RTDEV" ] && \ > + TEST_OPTIONS=$TEST_OPTIONS" -r$TEST_RTDEV" > + [ "$LARGE_TEST_DEV" = yes ] && TEST_OPTIONS=$TEST_OPTIONS" -t" > + $XFS_REPAIR_PROG $TEST_OPTIONS $* $TEST_DEV > +} > + > _require_xfs_test_rmapbt() > { > _require_test > -- > 2.11.0.rc0.7.gbe5a750 > > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-07-17 23:45 ` Darrick J. Wong @ 2017-07-18 5:05 ` Eryu Guan 2017-07-18 17:30 ` Theodore Ts'o 0 siblings, 1 reply; 14+ messages in thread From: Eryu Guan @ 2017-07-18 5:05 UTC (permalink / raw) To: Darrick J. Wong, Theodore Ts'o; +Cc: fstests On Mon, Jul 17, 2017 at 04:45:08PM -0700, Darrick J. Wong wrote: > On Sat, Jul 15, 2017 at 09:30:58PM -0400, Theodore Ts'o wrote: > > On Mon, Mar 27, 2017 at 04:51:03PM +0800, Eryu Guan wrote: > > > > > > Sorry I lost this thread, I thought I've replied but apparently I didn't.. > > > > > > I agreed with both of you and Darrick, I think we can try to repair the > > > corrupted test fs, and if repair succeeds we can continue the test, and > > > stop running the whole test if repair fails. > > > > Sorry for the delay in getting back to this. Things got busy and this > > got dropped on my end. > > > > I've fixed the whitespace nits that you pointed out and am using _log_err. > > > > > I think we should try to fix other filesystems too? > > > > Hmm... yeah. The main reason why I hadn't was because xfs has > > _scratch_xfs_repair and _scratch_xfs_check, which are very similar. > > But _check_xfs_test_fs looks *very* different from _scratch_xfs_check, > > and I'm not sure why. > > _check_xfs_filesystem is a helper that does all the checking that we can > do to an xfs filesystem (online fsck, the old xfs_check, and the newer > xfs_repair). AFAICT that's what all new xfstest code should be calling. > xfs_check is obsolete. Agreed. > > _check_xfs_test_fs calls _check_xfs_filesystem and then, weirdly, calls > a nonexistent tool calld xfs_repair_ipaths to .... I assume fix all the > problems that can crop with the Irix XFS parent pointer implementation. > None of that exists on Linux and Irix is long dead, so I assume the > "check for ipath consistency" can go away entirely. Agreed. > > _scratch_xfs_check calls xfs_check directly, so I think it should get > replaced with _check_scratch_fs, which calls _check_xfs_filesystem. Agreed again :) > > <keep scrolling> > > > > > So I've created a _repair_xfs_test_fs which is modelled after the > > simpler _scratch_xfs_repair function, but I'm not 100% sure that is > > correct. > > > > Anyways, WDYT? > > > > - Ted > > > > From 96a13cc22878ee5c016a606d76f8e9a6bd84eb20 Mon Sep 17 00:00:00 2001 > > From: Theodore Ts'o <tytso@mit.edu> > > Date: Wed, 1 Mar 2017 19:54:08 -0500 > > Subject: [PATCH] check: try to fix the test device if it gets corrupted > > > > If the test device gets corrupted all subsequent tests will fail. To > > prevent this from causing all subsequent tests to be useless, try > > repair the file system on TEST_DEV if possible. We don't need to do > > this with the scratch device since that file system gets recreated > > each time anyway. > > > > Signed-off-by: Theodore Ts'o <tytso@mit.edu> > > --- > > check | 7 ++++++- > > common/rc | 41 +++++++++++++++++++++++++++++++++++++++++ > > common/xfs | 12 ++++++++++++ > > 3 files changed, 59 insertions(+), 1 deletion(-) > > > > diff --git a/check b/check > > index f8db3cd6..d89d2e91 100755 > > --- a/check > > +++ b/check > > @@ -476,7 +476,12 @@ _summary() > > _check_filesystems() > > { > > if [ -f ${RESULT_DIR}/require_test ]; then > > - _check_test_fs || err=true > > + if ! _check_test_fs ; then > > + err=true > > + echo "Trying to repair broken TEST_DEV file system" > > + _repair_test_fs Should we stop running if _repair_test_fs failed to fix TEST_DEV? I think that was what Darrick first suggested. And I think unfixable errors will cause subsequent tests to fail too. > > + _test_mount > > + fi > > rm -f ${RESULT_DIR}/require_test* > > fi > > if [ -f ${RESULT_DIR}/require_scratch ]; then > > diff --git a/common/rc b/common/rc > > index 328b6b07..d37a1611 100644 > > --- a/common/rc > > +++ b/common/rc > > @@ -1201,6 +1201,47 @@ _repair_scratch_fs() > > esac > > } > > > > +_repair_test_fs() > > +{ > > + case $FSTYP in > > + xfs) > > + _repair_xfs_test_fs "$@" >$tmp.repair 2>&1 > > + res=$? Better to declare res as a local variable. I know this was copied from _repair_scratch_fs, but better to get it improved in new code :) > > + if [ "$res" -ne 0 ]; then > > + echo "xfs_repair returns $res; replay log?" >>$tmp.repair > > + _test_mount > > + res=$? > > + if [ $res -gt 0 ]; then > > + echo "mount returns $res; zap log?" >>$tmp.repair > > + _xfs_repair_test_fs -L >>$tmp.repair 2>&1 meant _repair_xfs_test_fs here? > > + echo "log zap returns $?" >> $tmp.repair > > + else > > + umount "$TEST_DEV" > > + fi > > + _xfs_repair_test_fs "$@" >>$tmp.repair 2>&1 same here, _repair_xfs_test_fs is what really being added in common/xfs. (Though I'd like to rename it, see below.) > > + res=$? > > + fi > > Structurally this all looks ok, but it's a little weird that we have > _scratch_xfs_repair for the scratch device (object-verb) but > _repair_test_fs (verb-object) for the test device. My best guess is that _repair_scratch_fs and _repair_test_fs are helpers in a higher level, and are defined in common/rc, they deal with all filesystem types. _scratch_xfs_repair is xfs specific, and defined in common/xfs, it only deals with xfs. So the naming schema is different but they are also in different "namespace"s, which looks fine to me :) > > --D > > > + ;; > > + *) > > + # Let's hope fsck -y suffices... > > + fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1 > > + res=$? > > + if test "$res" -lt 4 ; then > > + res=0 > > + fi > > + ;; > > + esac > > + if [ $res -ne 0 ]; then > > + _log_err "_repair_test_fs: failed, err=$res" > > + echo "*** fsck.$FSTYP output ***" >>$seqres.full > > + cat $tmp.repair >>$seqres.full > > + echo "*** end fsck.$FSTYP output" >>$seqres.full > > + > > + fi > > + rm -f $tmp.repair > > + return $res > > +} > > + > > _get_pids_by_name() > > { > > if [ $# -ne 1 ] > > diff --git a/common/xfs b/common/xfs > > index a1ee3847..c8f4e46b 100644 > > --- a/common/xfs > > +++ b/common/xfs > > @@ -443,6 +443,18 @@ _check_xfs_test_fs() > > fi > > } > > > > +# modeled after _scratch_xfs_repair > > +_repair_xfs_test_fs() Like I mentioned above, this helper is meant to pair with _scratch_xfs_repair, so following the same naming schema, rename it to _test_xfs_repair? Like the _test_xfs_logprint and _scratch_xfs_logprint pair. Thanks, Eryu > > +{ > > + TEST_OPTIONS="" > > + [ "$USE_EXTERNAL" = yes -a ! -z "$TEST_LOGDEV" ] && \ > > + TEST_OPTIONS="-l$TEST_LOGDEV" > > + [ "$USE_EXTERNAL" = yes -a ! -z "$TEST_RTDEV" ] && \ > > + TEST_OPTIONS=$TEST_OPTIONS" -r$TEST_RTDEV" > > + [ "$LARGE_TEST_DEV" = yes ] && TEST_OPTIONS=$TEST_OPTIONS" -t" > > + $XFS_REPAIR_PROG $TEST_OPTIONS $* $TEST_DEV > > +} > > + > > _require_xfs_test_rmapbt() > > { > > _require_test > > -- > > 2.11.0.rc0.7.gbe5a750 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe fstests" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-07-18 5:05 ` Eryu Guan @ 2017-07-18 17:30 ` Theodore Ts'o 2017-07-19 9:40 ` Eryu Guan 0 siblings, 1 reply; 14+ messages in thread From: Theodore Ts'o @ 2017-07-18 17:30 UTC (permalink / raw) To: Eryu Guan; +Cc: Darrick J. Wong, fstests On Tue, Jul 18, 2017 at 01:05:07PM +0800, Eryu Guan wrote: > > > if [ -f ${RESULT_DIR}/require_test ]; then > > > - _check_test_fs || err=true > > > + if ! _check_test_fs ; then > > > + err=true > > > + echo "Trying to repair broken TEST_DEV file system" > > > + _repair_test_fs > > Should we stop running if _repair_test_fs failed to fix TEST_DEV? I > think that was what Darrick first suggested. And I think unfixable > errors will cause subsequent tests to fail too. Seems reasonable to me. I'll take a look at it. > > > +_repair_test_fs() > > > +{ > > > + case $FSTYP in > > > + xfs) > > > + _repair_xfs_test_fs "$@" >$tmp.repair 2>&1 > > > + res=$? > > Better to declare res as a local variable. I know this was copied from > _repair_scratch_fs, but better to get it improved in new code :) Ack. > > Structurally this all looks ok, but it's a little weird that we have > > _scratch_xfs_repair for the scratch device (object-verb) but > > _repair_test_fs (verb-object) for the test device. > > My best guess is that _repair_scratch_fs and _repair_test_fs are helpers > in a higher level, and are defined in common/rc, they deal with all > filesystem types. > > _scratch_xfs_repair is xfs specific, and defined in common/xfs, it only > deals with xfs. So the naming schema is different but they are also in > different "namespace"s, which looks fine to me :) I also didn't like _test_repair_fs because it's not clear whether "test" is a verb or a noun. So one thing we could do is _xfs_scratch_repair and _xfs_test_repair? The other thing is how much of the cleanup in common/xfs should be segregated into a separate commit (and I'm not sure how competent I'm going to be ate doing that cleanup, but I'm willing to give it a go). - Ted ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-07-18 17:30 ` Theodore Ts'o @ 2017-07-19 9:40 ` Eryu Guan 2017-07-19 14:53 ` Theodore Ts'o 0 siblings, 1 reply; 14+ messages in thread From: Eryu Guan @ 2017-07-19 9:40 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Darrick J. Wong, fstests On Tue, Jul 18, 2017 at 01:30:09PM -0400, Theodore Ts'o wrote: > On Tue, Jul 18, 2017 at 01:05:07PM +0800, Eryu Guan wrote: > > > > if [ -f ${RESULT_DIR}/require_test ]; then > > > > - _check_test_fs || err=true > > > > + if ! _check_test_fs ; then > > > > + err=true > > > > + echo "Trying to repair broken TEST_DEV file system" > > > > + _repair_test_fs > > > > Should we stop running if _repair_test_fs failed to fix TEST_DEV? I > > think that was what Darrick first suggested. And I think unfixable > > errors will cause subsequent tests to fail too. > > Seems reasonable to me. I'll take a look at it. > > > > > +_repair_test_fs() > > > > +{ > > > > + case $FSTYP in > > > > + xfs) > > > > + _repair_xfs_test_fs "$@" >$tmp.repair 2>&1 > > > > + res=$? > > > > Better to declare res as a local variable. I know this was copied from > > _repair_scratch_fs, but better to get it improved in new code :) > > Ack. > > > > Structurally this all looks ok, but it's a little weird that we have > > > _scratch_xfs_repair for the scratch device (object-verb) but > > > _repair_test_fs (verb-object) for the test device. > > > > My best guess is that _repair_scratch_fs and _repair_test_fs are helpers > > in a higher level, and are defined in common/rc, they deal with all > > filesystem types. > > > > _scratch_xfs_repair is xfs specific, and defined in common/xfs, it only > > deals with xfs. So the naming schema is different but they are also in > > different "namespace"s, which looks fine to me :) > > I also didn't like _test_repair_fs because it's not clear whether > "test" is a verb or a noun. > > So one thing we could do is _xfs_scratch_repair and _xfs_test_repair? Then all existing _scratch_xfs_repair calls need a rename, but I'm fine with the rename. > > The other thing is how much of the cleanup in common/xfs should be > segregated into a separate commit (and I'm not sure how competent I'm > going to be ate doing that cleanup, but I'm willing to give it a go). I think we can do minimal cleanup in this patch (like the rename of _scratch_xfs_repair to _xfs_scratch_repair) and do other cleanups in separate commits (if there's still any). Thanks! Eryu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-07-19 9:40 ` Eryu Guan @ 2017-07-19 14:53 ` Theodore Ts'o 2017-07-19 15:02 ` Eryu Guan 0 siblings, 1 reply; 14+ messages in thread From: Theodore Ts'o @ 2017-07-19 14:53 UTC (permalink / raw) To: Eryu Guan; +Cc: Darrick J. Wong, fstests On Wed, Jul 19, 2017 at 05:40:47PM +0800, Eryu Guan wrote: > > > > So one thing we could do is _xfs_scratch_repair and _xfs_test_repair? Hmm, or maybe _xfs_repair_scratch and _xfs_repair_test; since at least for English speakers verb-object is probably more natural. But if you or Darrick have a strong preference I'm don't care that much. > Then all existing _scratch_xfs_repair calls need a rename, but I'm fine > with the rename. > > > > > The other thing is how much of the cleanup in common/xfs should be > > segregated into a separate commit (and I'm not sure how competent I'm > > going to be ate doing that cleanup, but I'm willing to give it a go). > > I think we can do minimal cleanup in this patch (like the rename of > _scratch_xfs_repair to _xfs_scratch_repair) and do other cleanups in > separate commits (if there's still any). If you don't mind I'll do the renames as a separate commit since that's much easier to verify/review. Cheers, - Ted ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-07-19 14:53 ` Theodore Ts'o @ 2017-07-19 15:02 ` Eryu Guan 2017-07-19 16:13 ` Darrick J. Wong 0 siblings, 1 reply; 14+ messages in thread From: Eryu Guan @ 2017-07-19 15:02 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Darrick J. Wong, fstests On Wed, Jul 19, 2017 at 10:53:01AM -0400, Theodore Ts'o wrote: > On Wed, Jul 19, 2017 at 05:40:47PM +0800, Eryu Guan wrote: > > > > > > So one thing we could do is _xfs_scratch_repair and _xfs_test_repair? > > Hmm, or maybe _xfs_repair_scratch and _xfs_repair_test; since at least > for English speakers verb-object is probably more natural. But if you > or Darrick have a strong preference I'm don't care that much. I'm not native English speaker, you (and other native English speakers) decide :) > > > Then all existing _scratch_xfs_repair calls need a rename, but I'm fine > > with the rename. > > > > > > > > The other thing is how much of the cleanup in common/xfs should be > > > segregated into a separate commit (and I'm not sure how competent I'm > > > going to be ate doing that cleanup, but I'm willing to give it a go). > > > > I think we can do minimal cleanup in this patch (like the rename of > > _scratch_xfs_repair to _xfs_scratch_repair) and do other cleanups in > > separate commits (if there's still any). > > If you don't mind I'll do the renames as a separate commit since > that's much easier to verify/review. Sure, make sense to me. Thanks, Eryu ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] check: try to fix the test device if it gets corrupted 2017-07-19 15:02 ` Eryu Guan @ 2017-07-19 16:13 ` Darrick J. Wong 0 siblings, 0 replies; 14+ messages in thread From: Darrick J. Wong @ 2017-07-19 16:13 UTC (permalink / raw) To: Eryu Guan; +Cc: Theodore Ts'o, fstests On Wed, Jul 19, 2017 at 11:02:29PM +0800, Eryu Guan wrote: > On Wed, Jul 19, 2017 at 10:53:01AM -0400, Theodore Ts'o wrote: > > On Wed, Jul 19, 2017 at 05:40:47PM +0800, Eryu Guan wrote: > > > > > > > > So one thing we could do is _xfs_scratch_repair and _xfs_test_repair? > > > > Hmm, or maybe _xfs_repair_scratch and _xfs_repair_test; since at least > > for English speakers verb-object is probably more natural. But if you > > or Darrick have a strong preference I'm don't care that much. > > I'm not native English speaker, you (and other native English speakers) > decide :) TBH I wish we'd call them scratchdev and testdev, e.g. _xfs_repair_scratchdev _xfs_repair_testdev So that it's plainly obvious that we're talking about a device and not the verbs scratch or test. I guess you could also argue that we're really repairing a filesystem on the device and that it should be _xfs_repair_testfs... --D > > > > > > Then all existing _scratch_xfs_repair calls need a rename, but I'm fine > > > with the rename. > > > > > > > > > > > The other thing is how much of the cleanup in common/xfs should be > > > > segregated into a separate commit (and I'm not sure how competent I'm > > > > going to be ate doing that cleanup, but I'm willing to give it a go). > > > > > > I think we can do minimal cleanup in this patch (like the rename of > > > _scratch_xfs_repair to _xfs_scratch_repair) and do other cleanups in > > > separate commits (if there's still any). > > > > If you don't mind I'll do the renames as a separate commit since > > that's much easier to verify/review. > > Sure, make sense to me. > > Thanks, > Eryu > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2017-07-19 16:13 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-03-02 23:20 [RFC PATCH] check: try to fix the test device if it gets corrupted Theodore Ts'o 2017-03-03 9:03 ` Eryu Guan 2017-03-03 17:21 ` Darrick J. Wong 2017-03-03 23:01 ` Theodore Ts'o 2017-03-27 1:48 ` Theodore Ts'o 2017-03-27 8:51 ` Eryu Guan 2017-07-16 1:30 ` Theodore Ts'o 2017-07-17 23:45 ` Darrick J. Wong 2017-07-18 5:05 ` Eryu Guan 2017-07-18 17:30 ` Theodore Ts'o 2017-07-19 9:40 ` Eryu Guan 2017-07-19 14:53 ` Theodore Ts'o 2017-07-19 15:02 ` Eryu Guan 2017-07-19 16:13 ` Darrick J. Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox