From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mx1.redhat.com ([209.132.183.28]:33650 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752457AbdC0Ivf (ORCPT ); Mon, 27 Mar 2017 04:51:35 -0400 Date: Mon, 27 Mar 2017 16:51:03 +0800 From: Eryu Guan Subject: Re: [RFC PATCH] check: try to fix the test device if it gets corrupted Message-ID: <20170327085103.GQ14226@eguan.usersys.redhat.com> References: <20170302232050.31125-1-tytso@mit.edu> <20170303090332.GP14226@eguan.usersys.redhat.com> <20170303172157.GA5070@birch.djwong.org> <20170303230129.apzqe77r4d5jtf63@thunk.org> <20170327014802.wc5as2tdgecy3rzu@thunk.org> Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170327014802.wc5as2tdgecy3rzu@thunk.org> Sender: fstests-owner@vger.kernel.org Content-Transfer-Encoding: Quoted-printable MIME-Version: 1.0 To: Theodore Ts'o Cc: "Darrick J. Wong" , fstests@vger.kernel.org List-ID: Hi Ted, On Sun, Mar 26, 2017 at 09:48:02PM -0400, Theodore Ts'o wrote: > On Fri, Mar 03, 2017 at 06:01:29PM -0500, Theodore Ts'o wrote: > > On Fri, Mar 03, 2017 at 09:21:57AM -0800, Darrick J. Wong wrote: > > >=20 > > > The test device isn't supposed to get corrupted, since it (at > > > least in theory) should be an old filesystem. That said, I suppose > > > there's little point in banging around with a corrupt test fs. Maybe= we > > > could go further and stop running if there's unfixable corruption? > >=20 > > Yes, that was the other alternative I considered. In my case, though, > > since I'm trying to get a sense of how many failures I have to deal > > with, I really wanted a "make -k" behavior that would continue after > > the first failure. After all, all I was going to do was manually run > > fsck, and then continue the run --- so we might as well have the check > > script do it automatically and then allow things to continue. > >=20 > > We could make it be configurable, via a command-line option. The -k > > option isn't taken so we could have check -k that works like make -k > > if you think that's better. OTOH, perhaps making -k the default > > behaviour is actually the better way to go, and in that case, maybe > > it's not worth having the command-line flag? >=20 > Eryu, do you have any preferences or comments about how you'd like me > to modify this patch for upstreaming? (Attached is my current version > of the patch). >=20 > Thanks!! Sorry I lost this thread, I thought I've replied but apparently I didn't.. I agreed with both of you and Darrick, I think we can try to repair the corrupted test fs, and if repair succeeds we can continue the test, and stop running the whole test if repair fails. >=20 > - Ted >=20 > commit 727c737d1f0a40288fc897c0263fbf8e7a5db8b3 > Author: Theodore Ts'o > Date: Wed Mar 1 19:54:08 2017 -0500 >=20 > check: try to fix the test device if it gets corrupted >=20=20=20=20=20 > If the test device gets corrupted all subsequent tests will fail. To > prevent this from causing all subsequent tests to be useless, try > repair the file system on TEST_DEV if possible. We don't need to do > this with the scratch device since that file system gets recreated > each time anyway. >=20=20=20=20=20 > Signed-off-by: Theodore Ts'o >=20 > diff --git a/check b/check > index 2fcf385f..d253f744 100755 > --- a/check > +++ b/check > @@ -472,7 +472,11 @@ _summary() > _check_filesystems() > { > if [ -f ${RESULT_DIR}/require_test ]; then > - _check_test_fs || err=3Dtrue > + if ! _check_test_fs ; then > + err=3Dtrue > + echo "Trying to repair broken TEST_DEV file system" > + _repair_test_fs Minor nit, need tab for indention in the if block. > + fi > rm -f ${RESULT_DIR}/require_test* > fi > if [ -f ${RESULT_DIR}/require_scratch ]; then > diff --git a/common/rc b/common/rc > index 052f67aa..ce491f3f 100644 > --- a/common/rc > +++ b/common/rc > @@ -1172,6 +1172,28 @@ _repair_scratch_fs() > esac > } >=20=20 > +_repair_test_fs() > +{ Minor nit, this function has mixed tab & space for indention too, use tab for new functions. > + case $FSTYP in > + ext2|ext3|ext4) > + fsck -t $FSTYP -fy $TEST_DEV >$tmp.repair 2>&1 > + if test $? -ge 4 ; then > + echo "_repair_test_fs: couldn't repair filesystem on $device (see $= seqres.full)" > + > + echo "_repair_test_fs: couldn't repair filesystem on $device" >>$se= qres.full We have _log_err() to do these to echos now. > + echo "*** fsck.$FSTYP output ***" >>$seqres.full > + cat $tmp.repair >>$seqres.full > + echo "*** end fsck.$FSTYP output" >>$seqres.full > + return 1 > + fi > + return 0 > + ;; > + *) > + return 1 I think we should try to fix other filesystems too? > + ;; > + esac > +} > + And I'm wondering if a new helper function can be factored out and used in both _repair_scratch_fs and _repair_scratch_fs? One of the problems I see in doing this is that we don't have a _test_xfs_repair() counterpart right now. Thanks, Eryu > _get_pids_by_name() > { > if [ $# -ne 1 ] -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at https://urldefense.proofpoint.com/v2/url?u=3Dhttp-3= A__vger.kernel.org_majordomo-2Dinfo.html&d=3DDwIBAg&c=3DRoP1YumCXCgaWHvlZYR= 8PQcxBKCX5YTpkKY057SbK10&r=3D-yMrTV4jriXR7ieyzzjV-QgHBD0UDw8ixoR77aMeAHE&m= =3D0soB5gI1r50psvTdjA1KUFEWFkn9WxV3nD1owmmUN_w&s=3DhxbjQYhJ222LOmb3FrPhlXqW= 0DiWVx4j_ZNiQGQF1ok&e=3D=20