[PATCH] btrfs: add test case to verify the behavior with large RAID0 data chunks

FS/XFS testing framework
 help / color / mirror / Atom feed

* [PATCH] btrfs: add test case to verify the behavior with large RAID0 data chunks
@ 2023-06-21  1:51 Qu Wenruo
  2023-06-21  5:47 ` Anand Jain
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2023-06-21  1:51 UTC (permalink / raw)
  To: linux-btrfs, fstests

There is a recent regression during v6.4 merge window, that a u32 left
shift overflow can cause problems with large data chunks (over 4G
sized).

This is the regression test case for it.

The test case itself would:

- Create a RAID0 chunk with a single 6G data chunk
  This requires at least 6 devices from SCRATCH_DEV_POOL, and each
  should be larger than 2G.

- Fill the fs with 5G data

- Make sure everything is fine
  Including the data csums.

- Delete half of the data

- Do a fstrim
  This would lead to out-of-boundary trim if the kernel is not patched.

- Make sure everything is fine again
  If not patched, we may have corrupted data due to the bad fstrim
  above.

For now, this test case only checks the behavior for RAID0.
As for RAID10, we need 12 devices, which is out-of-reach for a lot of
test envionrments.

For RAID56, they would have a different test case, as they don't support
discard inside the RAID56 chunks.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 tests/btrfs/292     | 83 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/292.out |  2 ++
 2 files changed, 85 insertions(+)
 create mode 100755 tests/btrfs/292
 create mode 100644 tests/btrfs/292.out

diff --git a/tests/btrfs/292 b/tests/btrfs/292
new file mode 100755
index 00000000..d1e31603
--- /dev/null
+++ b/tests/btrfs/292
@@ -0,0 +1,83 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2023 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 292
+#
+# Test btrfs behavior with large chunks (size beyond 4G) for basic read-write
+# and discard.
+# This test focus on RAID0.
+#
+. ./common/preamble
+_begin_fstest auto raid volume trim
+
+. ./common/filter
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_require_scratch_dev_pool 6
+_fixed_by_kernel_commit xxxxxxxxxxxx \
+	"btrfs: fix u32 overflows when left shifting @stripe_nr"
+
+_scratch_dev_pool_get 6
+
+
+datasize=$((5 * 1024 * 1024 * 1024))
+filesize=$((8 * 1024 * 1024))
+nr_files=$(($datasize / $filesize))
+
+# Make sure each device has at least 2G.
+# Btrfs has a limits on per-device stripe length of 1G.
+# Double that so that we can ensure a data chunk with 6G size.
+
+for i in $SCRATCH_DEV_POOL; do
+	devsize=$(blockdev --getsize64 "$i")
+	if [ $devsize -lt $((2 * 1024 * 1024 * 1024)) ]; then
+		_notrun "device $i is too small, need at least 2G"
+	fi
+done
+
+_scratch_pool_mkfs -m raid1 -d raid0 >> $seqres.full 2>&1
+_scratch_mount
+
+# Fill the data chunk with 5G data.
+for (( i = 0; i < $nr_files; i++ )); do
+	xfs_io -f -c "pwrite -i /dev/urandom 0 $filesize" $SCRATCH_MNT/file_$i > /dev/null
+done
+sync
+echo "=== With initial 5G data written ===" >> $seqres.full
+$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
+
+_scratch_unmount
+
+# Make sure we haven't corrupted anything.
+$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
+if [ $? -ne 0 ]; then
+	_fail "data corruption detected after initial data filling"
+fi
+
+_scratch_mount
+# Delete half of the data, and do discard
+rm -rf - "$SCRATCH_MNT/*[02468]"
+sync
+$FSTRIM_PROG $SCRATCH_MNT
+
+echo "=== With 2.5G data trimmed ===" >> $seqres.full
+$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
+_scratch_unmount
+
+# Make sure fstrim didn't corrupte anything.
+$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
+if [ $? -ne 0 ]; then
+	_fail "data corruption detected after initial data filling"
+fi
+
+_scratch_dev_pool_put
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/292.out b/tests/btrfs/292.out
new file mode 100644
index 00000000..627309d3
--- /dev/null
+++ b/tests/btrfs/292.out
@@ -0,0 +1,2 @@
+QA output created by 292
+Silence is golden
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: add test case to verify the behavior with large RAID0 data chunks
  2023-06-21  1:51 [PATCH] btrfs: add test case to verify the behavior with large RAID0 data chunks Qu Wenruo
@ 2023-06-21  5:47 ` Anand Jain
  2023-06-21  6:00   ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Anand Jain @ 2023-06-21  5:47 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs, fstests




> +for i in $SCRATCH_DEV_POOL; do
> +	devsize=$(blockdev --getsize64 "$i")
> +	if [ $devsize -lt $((2 * 1024 * 1024 * 1024)) ]; then
> +		_notrun "device $i is too small, need at least 2G"

Also, you need to check if those devices support discard.

Uneven device sizes will alter the distribution of chunk allocation.
Since the default chunk allocation is also based on the device sizes
and free spaces.


> +	fi
> +done
> +
> +_scratch_pool_mkfs -m raid1 -d raid0 >> $seqres.full 2>&1
> +_scratch_mount
> +
> +# Fill the data chunk with 5G data.
> +for (( i = 0; i < $nr_files; i++ )); do
> +	xfs_io -f -c "pwrite -i /dev/urandom 0 $filesize" $SCRATCH_MNT/file_$i > /dev/null
> +done
> +sync
> +echo "=== With initial 5G data written ===" >> $seqres.full
> +$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
> +
> +_scratch_unmount
> +
> +# Make sure we haven't corrupted anything.
> +$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
> +if [ $? -ne 0 ]; then
> +	_fail "data corruption detected after initial data filling"
> +fi
> +
> +_scratch_mount
> +# Delete half of the data, and do discard
> +rm -rf - "$SCRATCH_MNT/*[02468]"

Are there any specific chunks that need to be deleted to successfully
reproduce this test case?

> +sync
> +$FSTRIM_PROG $SCRATCH_MNT

Do we need fstrim if we use mount -o discard=sync instead?

Thanks, Anand

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: add test case to verify the behavior with large RAID0 data chunks
  2023-06-21  5:47 ` Anand Jain
@ 2023-06-21  6:00   ` Qu Wenruo
  2023-06-21  6:35     ` Anand Jain
  0 siblings, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2023-06-21  6:00 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo, linux-btrfs, fstests



On 2023/6/21 13:47, Anand Jain wrote:
>
>
>
>> +for i in $SCRATCH_DEV_POOL; do
>> +    devsize=$(blockdev --getsize64 "$i")
>> +    if [ $devsize -lt $((2 * 1024 * 1024 * 1024)) ]; then
>> +        _notrun "device $i is too small, need at least 2G"
>
> Also, you need to check if those devices support discard.
>
> Uneven device sizes will alter the distribution of chunk allocation.
> Since the default chunk allocation is also based on the device sizes
> and free spaces.

That is not a big deal, if we have all 6 devices beyond 2G size, we're
already allocating the device stripe in 1G size anyway, and we're
ensured to have a 6G RAID0 chunk no matter if the sizes are uneven.

It's the next new data chunk going to be affected, but our workload
would only need the initial RAID0 chunk, thus it's totally fine to have
uneven disk sizes.
>
>
>> +    fi
>> +done
>> +
>> +_scratch_pool_mkfs -m raid1 -d raid0 >> $seqres.full 2>&1
>> +_scratch_mount
>> +
>> +# Fill the data chunk with 5G data.
>> +for (( i = 0; i < $nr_files; i++ )); do
>> +    xfs_io -f -c "pwrite -i /dev/urandom 0 $filesize"
>> $SCRATCH_MNT/file_$i > /dev/null
>> +done
>> +sync
>> +echo "=== With initial 5G data written ===" >> $seqres.full
>> +$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
>> +
>> +_scratch_unmount
>> +
>> +# Make sure we haven't corrupted anything.
>> +$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full
>> 2>&1
>> +if [ $? -ne 0 ]; then
>> +    _fail "data corruption detected after initial data filling"
>> +fi
>> +
>> +_scratch_mount
>> +# Delete half of the data, and do discard
>> +rm -rf - "$SCRATCH_MNT/*[02468]"
>
> Are there any specific chunks that need to be deleted to successfully
> reproduce this test case?

No, there is and will only be one data chunk.

We're here only to create holes to cause extra trim workload.

>
>> +sync
>> +$FSTRIM_PROG $SCRATCH_MNT
>
> Do we need fstrim if we use mount -o discard=sync instead?

There is not much difference.

Thanks,
Qu

>
> Thanks, Anand

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: add test case to verify the behavior with large RAID0 data chunks
  2023-06-21  6:00   ` Qu Wenruo
@ 2023-06-21  6:35     ` Anand Jain
  2023-06-21  6:50       ` Qu Wenruo
  0 siblings, 1 reply; 5+ messages in thread
From: Anand Jain @ 2023-06-21  6:35 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs, fstests



On 21/06/2023 14:00, Qu Wenruo wrote:
> 
> 
> On 2023/6/21 13:47, Anand Jain wrote:
>>
>>
>>
>>> +for i in $SCRATCH_DEV_POOL; do
>>> +    devsize=$(blockdev --getsize64 "$i")
>>> +    if [ $devsize -lt $((2 * 1024 * 1024 * 1024)) ]; then
>>> +        _notrun "device $i is too small, need at least 2G"
>>


>> Also, you need to check if those devices support discard.
>>

Howabout this?


btrfs/292       - output mismatch (see 
/xfstests-dev/results//btrfs/292.out.bad)
     --- tests/btrfs/292.out	2023-06-21 13:27:12.764966120 +0800
     +++ /xfstests-dev/results//btrfs/292.out.bad	2023-06-21 
13:54:01.863082692 +0800
     @@ -1,2 +1,3 @@
      QA output created by 292
     +fstrim: /mnt/scratch: the discard operation is not supported
      Silence is golden
     ...
     (Run 'diff -u /xfstests-dev/tests/btrfs/292.out 
/xfstests-dev/results//btrfs/292.out.bad'  to see the entire diff)

HINT: You _MAY_ be missing kernel fix:
       xxxxxxxxxxxx btrfs: fix u32 overflows when left shifting @stripe_nr




>> Uneven device sizes will alter the distribution of chunk allocation.
>> Since the default chunk allocation is also based on the device sizes
>> and free spaces.
> 
> That is not a big deal, if we have all 6 devices beyond 2G size, we're
> already allocating the device stripe in 1G size anyway, and we're
> ensured to have a 6G RAID0 chunk no matter if the sizes are uneven.

Ah. RAID0. Got it.

> It's the next new data chunk going to be affected, but our workload
> would only need the initial RAID0 chunk, thus it's totally fine to have
> uneven disk sizes.

>>
>>
>>> +    fi
>>> +done
>>> +
>>> +_scratch_pool_mkfs -m raid1 -d raid0 >> $seqres.full 2>&1
>>> +_scratch_mount
>>> +
>>> +# Fill the data chunk with 5G data.
>>> +for (( i = 0; i < $nr_files; i++ )); do
>>> +    xfs_io -f -c "pwrite -i /dev/urandom 0 $filesize"
>>> $SCRATCH_MNT/file_$i > /dev/null
>>> +done
>>> +sync
>>> +echo "=== With initial 5G data written ===" >> $seqres.full
>>> +$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
>>> +
>>> +_scratch_unmount
>>> +
>>> +# Make sure we haven't corrupted anything.
>>> +$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full
>>> 2>&1
>>> +if [ $? -ne 0 ]; then
>>> +    _fail "data corruption detected after initial data filling"
>>> +fi
>>> +
>>> +_scratch_mount
>>> +# Delete half of the data, and do discard
>>> +rm -rf - "$SCRATCH_MNT/*[02468]"
>>
>> Are there any specific chunks that need to be deleted to successfully
>> reproduce this test case?
 >
> No, there is and will only be one data chunk.


  Right. I missed the point it is a raid0.


> We're here only to create holes to cause extra trim workload.
> 

Thanks, Anand

>>
>>> +sync
>>> +$FSTRIM_PROG $SCRATCH_MNT
>>
>> Do we need fstrim if we use mount -o discard=sync instead?
> 
> There is not much difference.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] btrfs: add test case to verify the behavior with large RAID0 data chunks
  2023-06-21  6:35     ` Anand Jain
@ 2023-06-21  6:50       ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2023-06-21  6:50 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo, linux-btrfs, fstests



On 2023/6/21 14:35, Anand Jain wrote:
>
>
> On 21/06/2023 14:00, Qu Wenruo wrote:
>>
>>
>> On 2023/6/21 13:47, Anand Jain wrote:
>>>
>>>
>>>
>>>> +for i in $SCRATCH_DEV_POOL; do
>>>> +    devsize=$(blockdev --getsize64 "$i")
>>>> +    if [ $devsize -lt $((2 * 1024 * 1024 * 1024)) ]; then
>>>> +        _notrun "device $i is too small, need at least 2G"
>>>
>
>
>>> Also, you need to check if those devices support discard.
>>>
>
> Howabout this?
>
>
> btrfs/292       - output mismatch (see
> /xfstests-dev/results//btrfs/292.out.bad)
>      --- tests/btrfs/292.out    2023-06-21 13:27:12.764966120 +0800
>      +++ /xfstests-dev/results//btrfs/292.out.bad    2023-06-21
> 13:54:01.863082692 +0800
>      @@ -1,2 +1,3 @@
>       QA output created by 292
>      +fstrim: /mnt/scratch: the discard operation is not supported
>       Silence is golden
>      ...
>      (Run 'diff -u /xfstests-dev/tests/btrfs/292.out
> /xfstests-dev/results//btrfs/292.out.bad'  to see the entire diff)
>
> HINT: You _MAY_ be missing kernel fix:
>        xxxxxxxxxxxx btrfs: fix u32 overflows when left shifting @stripe_nr

That's true, I'll add the needed require line in the next update.

Thanks,
Qu
>
>
>
>
>>> Uneven device sizes will alter the distribution of chunk allocation.
>>> Since the default chunk allocation is also based on the device sizes
>>> and free spaces.
>>
>> That is not a big deal, if we have all 6 devices beyond 2G size, we're
>> already allocating the device stripe in 1G size anyway, and we're
>> ensured to have a 6G RAID0 chunk no matter if the sizes are uneven.
>
> Ah. RAID0. Got it.
>
>> It's the next new data chunk going to be affected, but our workload
>> would only need the initial RAID0 chunk, thus it's totally fine to have
>> uneven disk sizes.
>
>>>
>>>
>>>> +    fi
>>>> +done
>>>> +
>>>> +_scratch_pool_mkfs -m raid1 -d raid0 >> $seqres.full 2>&1
>>>> +_scratch_mount
>>>> +
>>>> +# Fill the data chunk with 5G data.
>>>> +for (( i = 0; i < $nr_files; i++ )); do
>>>> +    xfs_io -f -c "pwrite -i /dev/urandom 0 $filesize"
>>>> $SCRATCH_MNT/file_$i > /dev/null
>>>> +done
>>>> +sync
>>>> +echo "=== With initial 5G data written ===" >> $seqres.full
>>>> +$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
>>>> +
>>>> +_scratch_unmount
>>>> +
>>>> +# Make sure we haven't corrupted anything.
>>>> +$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full
>>>> 2>&1
>>>> +if [ $? -ne 0 ]; then
>>>> +    _fail "data corruption detected after initial data filling"
>>>> +fi
>>>> +
>>>> +_scratch_mount
>>>> +# Delete half of the data, and do discard
>>>> +rm -rf - "$SCRATCH_MNT/*[02468]"
>>>
>>> Are there any specific chunks that need to be deleted to successfully
>>> reproduce this test case?
>  >
>> No, there is and will only be one data chunk.
>
>
>   Right. I missed the point it is a raid0.
>
>
>> We're here only to create holes to cause extra trim workload.
>>
>
> Thanks, Anand
>
>>>
>>>> +sync
>>>> +$FSTRIM_PROG $SCRATCH_MNT
>>>
>>> Do we need fstrim if we use mount -o discard=sync instead?
>>
>> There is not much difference.
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-06-21  6:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-21  1:51 [PATCH] btrfs: add test case to verify the behavior with large RAID0 data chunks Qu Wenruo
2023-06-21  5:47 ` Anand Jain
2023-06-21  6:00   ` Qu Wenruo
2023-06-21  6:35     ` Anand Jain
2023-06-21  6:50       ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox