* [PATCH] btrfs: add a test case to make sure scrub can repair parity corruption
@ 2023-06-30 1:36 Qu Wenruo
2023-07-22 13:24 ` Anand Jain
0 siblings, 1 reply; 3+ messages in thread
From: Qu Wenruo @ 2023-06-30 1:36 UTC (permalink / raw)
To: linux-btrfs, fstests
There is a kernel regression caused by commit 75b470332965 ("btrfs:
raid56: migrate recovery and scrub recovery path to use error_bitmap"),
which leads to scrub not repairing corrupted parity stripes.
So here we add a test case to verify the P/Q stripe scrub behavior by:
- Create a RAID5 or RAID6 btrfs with the minimal number of devices
This means 2 devices for RAID5, and 3 devices for RAID6.
This makes the parity stripe a mirror of the only data
stripe.
And since we control the content of the data stripes, the content
of the P stripe is also fixed.
- Create a 64K file
The file would cover one data stripe.
- Corrupt the P stripe
- Scrub the fs
If scrub is working, the P stripe would be repaired.
Unfortunately scrub cannot report any P/Q corruption, limited by its
reporting structure.
So we cannot use the return value of scrub to determine whether we
repaired anything.
- Verify the content of the P stripe
- Use "btrfs check --check-data-csum" to double check
With the above steps, we can verify whether the P stripe is properly repaired.
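The mirror property that the minimal-device layout relies on is just the XOR definition of RAID5 parity; a quick shell illustration (not part of the test itself):

```shell
# RAID5 parity is the byte-wise XOR of all data stripes in a horizontal
# stripe.  With only one data stripe D (the 2-device RAID5 layout used
# here), P = D: the parity is a plain mirror of the data, so its
# expected content is fully known.
d=0xaa                          # byte pattern the test writes
p=$(( d ^ 0x00 ))               # XOR over a single stripe is the stripe itself
printf 'data byte:   0x%x\n' "$d"
printf 'parity byte: 0x%x\n' "$p"

# With two data stripes (3-device RAID5) the same identity rebuilds a
# lost stripe: D1 = P ^ D2.
d1=0xaa; d2=0x55
parity=$(( d1 ^ d2 ))
rebuilt=$(( parity ^ d2 ))
printf 'rebuilt D1:  0x%x\n' "$rebuilt"   # matches the original 0xaa
```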
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
tests/btrfs/294 | 86 +++++++++++++++++++++++++++++++++++++++++++++
tests/btrfs/294.out | 2 ++
2 files changed, 88 insertions(+)
create mode 100755 tests/btrfs/294
create mode 100644 tests/btrfs/294.out
diff --git a/tests/btrfs/294 b/tests/btrfs/294
new file mode 100755
index 00000000..97b85988
--- /dev/null
+++ b/tests/btrfs/294
@@ -0,0 +1,86 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2023 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 294
+#
+# Make sure btrfs scrub can fix parity stripe corruption
+#
+. ./common/preamble
+_begin_fstest auto quick raid scrub
+
+. ./common/filter
+
+_supported_fs btrfs
+_require_odirect
+_require_non_zoned_device "${SCRATCH_DEV}"
+_require_scratch_dev_pool 3
+_fixed_by_kernel_commit xxxxxxxxxxxx \
+ "btrfs: raid56: always verify the P/Q contents for scrub"
+
+workload()
+{
+ local profile=$1
+ local nr_devs=$2
+
+ echo "=== Testing $nr_devs devices $profile ===" >> $seqres.full
+ _scratch_dev_pool_get $nr_devs
+
+ _scratch_pool_mkfs -d $profile -m single >> $seqres.full 2>&1
+ # Disable space cache to prevent v1 space cache affecting
+ # the result.
+ _scratch_mount -o nospace_cache
+
+ # Create one 64K extent which would cover one data stripe.
+ $XFS_IO_PROG -f -d -c "pwrite -S 0xaa -b 64K 0 64K" \
+ "$SCRATCH_MNT/foobar" > /dev/null
+ sync
+
+ # Corrupt the P/Q stripe
+ local logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
+
+ # The 2nd copy points directly to the P stripe.
+ physical_p=$(_btrfs_get_physical ${logical} 2)
+ devpath_p=$(_btrfs_get_device_path ${logical} 2)
+
+ _scratch_unmount
+
+ echo "Corrupt stripe P at devpath $devpath_p physical $physical_p" \
+ >> $seqres.full
+ $XFS_IO_PROG -d -c "pwrite -S 0xff -b 64K $physical_p 64K" $devpath_p \
+ > /dev/null
+
+ # Do a scrub to try to repair the P stripe.
+ _scratch_mount -o nospace_cache
+ $BTRFS_UTIL_PROG scrub start -BdR $SCRATCH_MNT >> $seqres.full 2>&1
+ _scratch_unmount
+
+ # Verify the repaired content directly
+ local output=$($XFS_IO_PROG -c "pread -qv $physical_p 16" $devpath_p | _filter_xfs_io_offset)
+ local expect="XXXXXXXX: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................"
+
+ echo "The first 16 bytes of parity stripe after scrub:" >> $seqres.full
+ echo $output >> $seqres.full
+
+ if [ "$output" != "$expect" ]; then
+ echo "Unexpected parity content"
+ echo "has:"
+ echo "$output"
+ echo "expect:"
+ echo "$expect"
+ fi
+
+ # Last safety net: let "btrfs check --check-data-csum" do an offline scrub.
+ $BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
+ if [ $? -ne 0 ]; then
+ echo "Error detected after the scrub"
+ fi
+ _scratch_dev_pool_put
+}
+
+workload raid5 2
+workload raid6 3
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/btrfs/294.out b/tests/btrfs/294.out
new file mode 100644
index 00000000..c09531de
--- /dev/null
+++ b/tests/btrfs/294.out
@@ -0,0 +1,2 @@
+QA output created by 294
+Silence is golden
--
2.39.0
* Re: [PATCH] btrfs: add a test case to make sure scrub can repair parity corruption
2023-06-30 1:36 [PATCH] btrfs: add a test case to make sure scrub can repair parity corruption Qu Wenruo
@ 2023-07-22 13:24 ` Anand Jain
2023-07-23 1:26 ` Qu Wenruo
0 siblings, 1 reply; 3+ messages in thread
From: Anand Jain @ 2023-07-22 13:24 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs, fstests
On 30/06/2023 09:36, Qu Wenruo wrote:
> [...]
> + _scratch_pool_mkfs -d $profile -m single >> $seqres.full 2>&1
> + # Disable space cache to prevent v1 space cache affecting
> + # the result.
> + _scratch_mount -o nospace_cache
[ 1562.331370] BTRFS error (device sdb3): cannot disable free space tree
Mount is failing for the -o nospace_cache option.
> [...]
> + echo "The first 16 bytes of parity stripe after scrub:" >> $seqres.full
> + echo $output >> $seqres.full
> +
White space error here.
Thanks.
> [...]
* Re: [PATCH] btrfs: add a test case to make sure scrub can repair parity corruption
2023-07-22 13:24 ` Anand Jain
@ 2023-07-23 1:26 ` Qu Wenruo
0 siblings, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2023-07-23 1:26 UTC (permalink / raw)
To: Anand Jain, Qu Wenruo, linux-btrfs, fstests
On 2023/7/22 21:24, Anand Jain wrote:
>> [...]
>> + _scratch_mount -o nospace_cache
>
> [ 1562.331370] BTRFS error (device sdb3): cannot disable free space tree
>
>
> Mount is failing for the -o nospace_cache option.
Indeed, I guess this is caused by the block group tree feature.
The original intention was just to prevent the v1 cache. Considering
that v2 is required by several features, I'll switch it to explicitly
require the v2 cache instead.
Thanks,
Qu
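A sketch of what that switch could look like in the test (the mount option is per btrfs(5); whether an additional `_require_*` check is needed for old kernels is an open assumption):

```shell
# Explicitly use the v2 free space cache (free space tree) instead of
# nospace_cache, which cannot be honoured once block-group-tree is on:
_scratch_mount -o space_cache=v2
```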
>> [...]