public inbox for fstests@vger.kernel.org
 help / color / mirror / Atom feed
From: Eryu Guan <eguan@redhat.com>
To: Dave Chinner <david@fromorbit.com>, Brian Foster <bfoster@redhat.com>
Cc: xiao yang <yangx.jy@cn.fujitsu.com>,
	fstests@vger.kernel.org, darrick.wong@oracle.com
Subject: Re: [PATCH v2 2/2] xfs: Regression test for invalid sb_logsunit
Date: Tue, 16 Jan 2018 12:02:54 +0800	[thread overview]
Message-ID: <20180116040254.GE3102@eguan.usersys.redhat.com> (raw)
In-Reply-To: <20180115210352.GY16421@dastard>

On Tue, Jan 16, 2018 at 08:03:52AM +1100, Dave Chinner wrote:
> On Mon, Jan 15, 2018 at 07:45:23AM -0500, Brian Foster wrote:
> > On Sat, Jan 13, 2018 at 01:23:53PM +1100, Dave Chinner wrote:
> > > On Fri, Jan 12, 2018 at 04:50:05PM +0800, Eryu Guan wrote:
> > > > On Fri, Jan 12, 2018 at 07:36:05PM +1100, Dave Chinner wrote:
> > > > > On Fri, Jan 12, 2018 at 03:49:21PM +0800, Eryu Guan wrote:
> > > > > > On Fri, Jan 12, 2018 at 02:16:23PM +0800, xiao yang wrote:
> > > > > > > If log stripe unit isn't a multiple of the fs blocksize and mounting,
> > > > > > > the invalid sb_logsunit leads to crash as soon as we try to write to
> > > > > > > the log.
> > > > > > > 
> > > > > > > Signed-off-by: xiao yang <yangx.jy@cn.fujitsu.com>
> > > > > > > ---
> > > > > > >  tests/xfs/437     | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > >  tests/xfs/437.out |  2 ++
> > > > > > >  tests/xfs/group   |  1 +
> > > > > > >  3 files changed, 76 insertions(+)
> > > > > > >  create mode 100755 tests/xfs/437
> > > > > > >  create mode 100644 tests/xfs/437.out
> > > > > > > 
> > > > > > > diff --git a/tests/xfs/437 b/tests/xfs/437
> > > > > > > new file mode 100755
> > > > > > > index 0000000..f2b84ad
> > > > > > > --- /dev/null
> > > > > > > +++ b/tests/xfs/437
> > > > > > > @@ -0,0 +1,73 @@
> > > > > > > +#! /bin/bash
> > > > > > > +# FS QA Test No. 437
> > > > > > > +#
> > > > > > > +# Regression test for commit:
> > > > > > > +# 9c92ee2 ("xfs: validate sb_logsunit is a multiple of the fs blocksize")
> > > > > > > +#
> > > > > > > +# If log stripe unit isn't a multiple of the fs blocksize and mounting,
> > > > > > > +# the invalid sb_logsunit leads to crash as soon as we try to write to
> > > > > > > +# the log.
> > > > > > > +#
> > > > > > > +#-----------------------------------------------------------------------
> > > > > > > +# Copyright (c) 2018 FUJITSU.  All Rights Reserved.
> > > > > > > +# Author: Xiao Yang <yangx.jy@cn.fujitsu.com>
> > > > > > > +#
> > > > > > > +# This program is free software; you can redistribute it and/or
> > > > > > > +# modify it under the terms of the GNU General Public License as
> > > > > > > +# published by the Free Software Foundation.
> > > > > > > +#
> > > > > > > +# This program is distributed in the hope that it would be useful,
> > > > > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > > > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > > > > > +# GNU General Public License for more details.
> > > > > > > +#
> > > > > > > +# You should have received a copy of the GNU General Public License
> > > > > > > +# along with this program; if not, write the Free Software Foundation,
> > > > > > > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > > > > > > +#-----------------------------------------------------------------------
> > > > > > > +
> > > > > > > +seq=`basename "$0"`
> > > > > > > +seqres="$RESULT_DIR/$seq"
> > > > > > > +echo "QA output created by $seq"
> > > > > > > +
> > > > > > > +here=`pwd`
> > > > > > > +tmp=/tmp/$$
> > > > > > > +status=1    # failure is the default!
> > > > > > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > > > > > +
> > > > > > > +_cleanup()
> > > > > > > +{
> > > > > > > +	rm -rf $tmp.*
> > > > > > > +}
> > > > > > > +
> > > > > > > +# get standard environment and checks
> > > > > > > +. ./common/rc
> > > > > > > +
> > > > > > > +# real QA test starts here
> > > > > > > +_supported_os Linux
> > > > > > > +_supported_fs xfs
> > > > > > > +_require_scratch
> > > > > > 
> > > > > > This test triggers ASSERT failure and warning on debug build, thus
> > > > > > failed _dmesg_check, I think we need _disable_dmesg_check (and some
> > > > > > comments) too.
> > > > > > 
> > > > > > [1809960.157615] XFS (sda6): log stripe unit 4095 bytes must be a multiple of block size
> > > > > > [1809960.159228] XFS (sda6): AAIEEE! Log failed size checks. Abort!
> > > > > > [1809960.160534] XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 679
> > > > > > [1809960.162298] WARNING: CPU: 1 PID: 2186 at fs/xfs/xfs_message.c:105 asswarn+0x1e/0x30 [xfs]
> > > > > 
> > > > > If it triggers debug asserts which have been put there to catch bugs
> > > > > in other utilities (like mkfs) on a current upstream debug kernel,
> > > > > then the test should not be part of the auto and quick groups.
> > > > 
> > > > This test corrupts the filesystem on purpose (I didn't make this clear
> > > > in my last reply, sorry about that), and xfs detects the corruption &
> > > > refuses the mount instead of crashing. So I think the ASSERT failure is
> > > > expected on debug build and we just need to ignore it.
> > > 
> > > Except that after that assert, the block device is now busy and can't
> > > be released, so we can't use the scratch device anymore. All tests
> > > after this assert has been triggered are going to fail because
> > > the block device is busy....
> > > 
> > > That's the whole point of adding debug asserts in cases like this -
> > > they are supposed to stop test execution in it's tracks and leave a
> > > corpse to analyse. The auto group regression tests are not supposed
> > > to take the machine down on normal test configs (i.e.
> > > CONFIG_XFS_DEBUG=y).
> > > 
> > 
> > FWIW, my understanding of the dangerous group has always been that it's
> > for tests that when they trigger a regression, forcibly affect the
> > entire system as such (lockup, hang, crash, etc.). IMO, a test that

I had the same understanding of dangerous group. And I recommended the
usage of "-g auto -x dangerous" before[1], and Dave acknowledged this
dangerous group usage[2] :)

[1] https://www.spinics.net/lists/linux-btrfs/msg57312.html
[2] https://www.spinics.net/lists/linux-btrfs/msg57330.html

> > intentionally generates an XFS assert doesn't really follow the spirit
> > of that categorization, even though technically an assert can cause a
> > BUG. This is why we have CONFIG_XFS_ASSERT_FATAL now, because unless
> > that is disabled any auto test that reproduces a regression (assuming we
> > have broad/robust assert coverage) can very easily BUG() and have the
> > associated effect on the system.
> 
> And this is exactly why I opposed the CONFIG_XFS_ASSERT_FATAL
> inclusion. It started us down the slippery slope of allowing people
> to ignore asserts that indicate an impossible condition had
> occurred.
> 
> This was a bug found via fuzzing, not user problems in the wild. The
> assert had functioned perfectly well up to this point as a mechanism
> for preventing the XFS tools from creating a corrupt log stripe
> unit, and it still does....
> 
> That's the point here - a corrupt log stripe unit *should never
> happen* because it should trigger a CRC validation failure long
> before we get to parsing it. If we've got a corrupt value with a
> correct CRC, then we damn well need to understand the cause
> *immediately* because something else has gone very, very wrong.
> 
> > The flipside of course is that one should generally be able to run
> > xfstests against a FATAL_ASSERTS=y kernel that otherwise has no
> > regressions and not bork the system because a test generates a BUG() as
> > part of a successful run.
> 
> That is the historical behaviour of CONFIG_XFS_DEBUG=y and the
> historical policy for xfstests. Adding CONFIG_XFS_ASSERT_FATAL does
> not change that because (at least) some of use still treat asserts
> as fatal.
> 
> > Perhaps the root problem here is that the
> > kernel generates an assert from a codepath that "successfully" handles
> > the erroneous situation.
> 
> We do that all over the place - the assert is there because it
> indicates a fatal problem has occurred that should never happen.
> For production systems, we still have to handle that gracefully
> because people do fuzz testing of "should never happen" conditions
> and then get upset we fail to detect this. That does not mean
> the ASSERT is wrong - it makes it even more important that
> developers know when their code hits these cases....
> 
> If we really want to test these "should not ever happen" conditions
> that trigger asserts on debug kernels as "everyone always runs"
> regression testing, then these need to _notrun on CONFIG_XFS_DEBUG=y
> kernels.  fstests already knows that it's running on a debug kernel,
> so adding a _requires_production_kernel check may be the way around
> this being considered a dangerous test.

This reminds that we already have a _require_no_xfs_debug rule, as used
in xfs/115, which is known to trigger ASSERT failure on debug build. So
we can do the same in this new test.

Thanks,
Eryu

> 
> > We already spit out several error messages and
> > fail the mount, so my .02 (fwiw) would be to leave the test as
> > auto+dangerous, delete the useless assert and let people affected by
> > older kernels (w/ FATAL_ASSERT behavior) filter out the test as
> > appropriate in those environments. (Looking back, I see Darrick already
> > suggested to delete the assert as well...).
> 
> ... and this specific assert caught multiple bugs I introduced while
> refactoring mkfs recently. That's it's purpose, and it works for
> that purpose very well.
> 
> > > If you want to test dangerous regression tests, then you need to
> > > *opt-in* by adding "-g dangerous", not force everyone else to
> > > opt-out by having to run "-g auto -x dangerous".
> > > 
> > 
> > I often explicitly filter out the dangerous group as above because we
> > already have more than a handful of tests that are in both groups. I
> > don't think they've historically been mutually exclusive.
> 
> Which means we've already deviated from the historic policy. That
> doesn't mean it the right direction to take.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-01-16  4:02 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-11  8:25 [PATCH] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-11 18:07 ` Darrick J. Wong
2018-01-12  1:58   ` Xiao Yang
2018-01-12  6:14   ` [PATCH v2] syscalls/madvise09.c: Use custom mount point instead of /sys/fs/cgroup/memory xiao yang
2018-01-12  6:14     ` [PATCH v2] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-12  6:19     ` [PATCH v2] syscalls/madvise09.c: Use custom mount point instead of /sys/fs/cgroup/memory Xiao Yang
2018-01-12  6:16   ` [PATCH v2 1/2] common/xfs: Check if write supports [-c|-d] option in xfs_db xiao yang
2018-01-12  6:16     ` [PATCH v2 2/2] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-12  7:49       ` Eryu Guan
2018-01-12  8:36         ` Dave Chinner
2018-01-12  8:50           ` Eryu Guan
2018-01-12 16:41             ` Darrick J. Wong
2018-01-13  2:23             ` Dave Chinner
2018-01-15  6:29               ` Eryu Guan
2018-01-15  7:48                 ` [PATCH v3 1/3] common/xfs: Check if write supports [-c|-d] option in xfs_db xiao yang
2018-01-15  7:48                   ` [PATCH v3 2/3] common/filter: factor out expected XFS warnings for mount xiao yang
2018-01-15  7:48                   ` [PATCH v3 3/3] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-15 12:45               ` [PATCH v2 2/2] " Brian Foster
2018-01-15 21:03                 ` Dave Chinner
2018-01-16  4:02                   ` Eryu Guan [this message]
2018-01-16  6:41                     ` Xiao Yang
2018-01-16  7:26                     ` [PATCH v4 1/3] common/xfs: Check if write supports [-c|-d] option in xfs_db xiao yang
2018-01-16  7:26                       ` [PATCH v4 2/3] common/filter: factor out expected XFS warnings for mount xiao yang
2018-01-18  8:48                         ` Eryu Guan
2018-01-18  8:56                           ` Xiao Yang
2018-01-16  7:26                       ` [PATCH v4 3/3] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-18  8:46                         ` Eryu Guan
2018-01-18 10:49                           ` [PATCH v5 1/3] common/xfs: Check if write supports [-c|-d] option in xfs_db xiao yang
2018-01-18 10:49                             ` [PATCH v5 2/3] common/filter: Factor out expected XFS warnings for assert xiao yang
2018-01-18 18:29                               ` Darrick J. Wong
2018-01-19  2:51                                 ` Eryu Guan
2018-01-19  4:04                                 ` Xiao Yang
2018-01-19  5:38                                 ` [PATCH v6 2/3] common: Add _require_no_xfs_bug_on_assert && Factor out filter_xfs_dmesg xiao yang
2018-01-19  5:38                                   ` [PATCH v6 3/3] xfs: Regression test for invalid sb_logsunit xiao yang
2018-01-18 10:49                             ` [PATCH v5 " xiao yang
2018-01-18 18:19                             ` [PATCH v5 1/3] common/xfs: Check if write supports [-c|-d] option in xfs_db Darrick J. Wong
2018-01-16  8:50                     ` [PATCH v2 2/2] xfs: Regression test for invalid sb_logsunit Dave Chinner
2018-01-16 14:09                       ` Brian Foster
2018-01-18  8:44                         ` Eryu Guan
2018-01-16 13:58                   ` Brian Foster
2018-01-12  7:44     ` [PATCH v2 1/2] common/xfs: Check if write supports [-c|-d] option in xfs_db Eryu Guan
2018-01-12 16:43     ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180116040254.GE3102@eguan.usersys.redhat.com \
    --to=eguan@redhat.com \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=fstests@vger.kernel.org \
    --cc=yangx.jy@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox