linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Zorro Lang <zlang@redhat.com>, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH] fstests: add test case to make sure btrfs can handle one corrupted device
Date: Wed, 27 Jul 2022 13:03:09 +0800	[thread overview]
Message-ID: <8fe19e4b-11d2-d35f-5490-502db3b32da2@gmx.com> (raw)
In-Reply-To: <20220727045736.pa46d55t67g7jwwl@zlang-mailbox>



On 2022/7/27 12:57, Zorro Lang wrote:
> On Tue, Jul 26, 2022 at 02:29:48PM +0800, Qu Wenruo wrote:
>> The new test case will verify that btrfs can handle one corrupted device
>> without affecting the consistency of the filesystem.
>>
>> Unlike a missing device, one corrupted device can return garbage to the fs,
>> thus btrfs has to utilize its data/metadata checksum to verify which
>> data is correct.
>>
>> The test case will:
>>
>> - Create a small fs
>>    Mostly to speedup the test
>>
>> - Fill the fs with a regular file
>>
>> - Use fsstress to create some contents
>>
>> - Save the fssum for later verification
>>
>> - Corrupt one device with garbage but keep the primary superblock
>>    untouched
>>
>> - Run fssum verification
>>
>> - Run scrub to fix the fs
>>
>> - Run scrub again to make sure the fs is fine
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   tests/btrfs/261     | 94 +++++++++++++++++++++++++++++++++++++++++++++
>>   tests/btrfs/261.out |  2 +
>>   2 files changed, 96 insertions(+)
>>   create mode 100755 tests/btrfs/261
>>   create mode 100644 tests/btrfs/261.out
>>
>> diff --git a/tests/btrfs/261 b/tests/btrfs/261
>> new file mode 100755
>> index 00000000..15218e28
>> --- /dev/null
>> +++ b/tests/btrfs/261
>> @@ -0,0 +1,94 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
>> +#
>> +# FS QA Test 261
>> +#
>> +# Make sure btrfs raid profiles can handling one corrupted device
>> +# without affecting the consistency of the fs.
>> +#
>> +. ./common/preamble
>> +_begin_fstest raid
>
> Do you think about add it into auto group, or more groups?

Oh all my bad, was planning to add to auto later but forgot.

>
>> +
>> +. ./common/filter
>> +. ./common/populate
>> +
>> +_supported_fs btrfs
>> +_require_scratch_dev_pool 4
>> +_require_fssum
>> +
>> +prepare_fs()
>> +{
>> +	local profile=$1
>> +
>> +	# We don't want too large fs which can take too long to populate
>> +	# And the extra redirection of stderr is to avoid the RAID56 warning
>> +	# message to polluate the golden output
>> +	_scratch_pool_mkfs -m $profile -d $profile -b 1G >> $seqres.full 2>&1
>> +	if [ $? -ne 0 ]; then
>> +		echo "mkfs $mkfs_opts failed"
>> +		return
>> +	fi
>
> If mkfs fails, below workload() will keep running, I think it's not what you
> want, right?

Right, it should call _fail() instead.

>
>> +
>> +	# Disable compression, as compressed read repair is known to have problems
>> +	_scratch_mount -o compress=no
>> +
>> +	# Fill some part of the fs first
>> +	$XFS_IO_PROG -f -c "pwrite -S 0xfe 0 400M" $SCRATCH_MNT/garbage > /dev/null 2>&1
>> +
>> +	# Then use fsstress to generate some extra contents.
>> +	# Disable setattr related operations, as it may set NODATACOW which will
>> +	# not allow us to use btrfs checksum to verify the content.
>> +	$FSSTRESS_PROG -f setattr=0 -d $SCRATCH_MNT -w -n 3000 > /dev/null 2>&1
>> +	sync
>> +
>> +	# Save the fssum of this fs
>> +	$FSSUM_PROG -A -f -w $tmp.saved_fssum $SCRATCH_MNT
>> +	$BTRFS_UTIL_PROG fi show $SCRATCH_MNT >> $seqres.full
>> +	_scratch_unmount
>> +}
>> +
>> +workload()
>> +{
>> +	local target=$(echo $SCRATCH_DEV_POOL | $AWK_PROG '{print $1}')
>
> In common/config, we always set SCRATCH_DEV to the first of $SCRATCH_DEV_POOL,
> as below:
>
>          # a btrfs tester will set only SCRATCH_DEV_POOL, we will put first of its dev
>          # to SCRATCH_DEV and rest to SCRATCH_DEV_POOL to maintain the backward compatibility
>          if [ ! -z "$SCRATCH_DEV_POOL" ]; then
>                  if [ ! -z "$SCRATCH_DEV" ]; then
>                          echo "common/config: Error: \$SCRATCH_DEV ($SCRATCH_DEV) should be unset when \$SCRATCH_DEV_POOL ($SCRATCH_DEV_POOL) is set"
>                          exit 1
>                  fi
>                  SCRATCH_DEV=`echo $SCRATCH_DEV_POOL | awk '{print $1}'`
>
> is that help?

Thanks, then I can use SCRATCH_DEV directly.

>
>> +	local profile=$1
>> +	local num_devs=$2
>> +
>> +	_scratch_dev_pool_get $num_devs
>> +	echo "=== Testing profile $profile ===" >> $seqres.full
>> +	rm -f -- $tmp.saved_fssum
>> +	prepare_fs $profile
>> +
>> +	# Corrupt the target device, only keep the superblock.
>> +	$XFS_IO_PROG -c "pwrite 1M 1023M" $target > /dev/null 2>&1
>
> Do you need "_require_scratch_nocheck", if you corrupt the fs purposely?

I don't think we need that.

The point here is, we will immediately mount the fs and do a RW scrub,
which should make all devices good.

Thus if fsck after the test case found something wrong, it means the RW
scrub doesn't properly repair the fs.

Thanks for the review,
Qu

>
> But I think it won't check the SCRATCH_DEV currently, due to you even don't use
> _require_scratch, and the _require_scratch_dev_pool might not trigger a fsck at
> the end of the testing.
>
> Hmm... I prefer having a "_require_scratch_nocheck", and you can use $SCRATCH_DEV
> to replace your "$target".
>
>> +
>> +	_scratch_mount
>> +
>> +	# All content should be fine
>> +	$FSSUM_PROG -r $tmp.saved_fssum $SCRATCH_MNT > /dev/null
>> +
>> +	# Scrub to fix the fs, this is known to report various correctable
>> +	# errors
>> +	$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full 2>&1
>> +
>> +	# Make sure above scrub fixed the fs
>> +	$BTRFS_UTIL_PROG scrub start -Br $SCRATCH_MNT >> $seqres.full
>> +	if [ $? -ne 0 ]; then
>> +		echo "scrub failed to fix the fs for profile $profile"
>> +	fi
>> +	_scratch_unmount
>> +	_scratch_dev_pool_put
>> +}
>> +
>> +workload raid1 2
>> +workload raid1c3 3
>> +workload raid1c4 4
>> +workload raid10 4
>> +workload raid5 3
>> +workload raid6 4
>> +
>> +echo "Silence is golden"
>> +
>> +# success, all done
>> +status=0
>> +exit
>> diff --git a/tests/btrfs/261.out b/tests/btrfs/261.out
>> new file mode 100644
>> index 00000000..679ddc0f
>> --- /dev/null
>> +++ b/tests/btrfs/261.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 261
>> +Silence is golden
>> --
>> 2.36.1
>>
>

      reply	other threads:[~2022-07-27  5:03 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-26  6:29 [PATCH] fstests: add test case to make sure btrfs can handle one corrupted device Qu Wenruo
2022-07-26 23:09 ` Boris Burkov
2022-07-27  1:39   ` Qu Wenruo
2022-07-27  2:49     ` Boris Burkov
2022-07-27  4:57 ` Zorro Lang
2022-07-27  5:03   ` Qu Wenruo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8fe19e4b-11d2-d35f-5490-502db3b32da2@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=fstests@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    --cc=zlang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).