* [RFC 1/1] btrfs/237: adapt the test to work with the new reclaim algorithm
From: Pankaj Raghav @ 2022-08-19 11:53 UTC
To: fstests
Cc: Johannes.Thumshirn, damien.lemoal, pankydev8, naohiro.aota,
gost.dev, mcgrof, dsterba, Pankaj Raghav
Since commit 3687fcb0752a ("btrfs: zoned: make auto-reclaim less aggressive"),
the reclaim algorithm triggers auto-reclaim only once the used size of the
filesystem exceeds a certain threshold. This change breaks this test.
Adapt the test so that the new auto-reclaim algorithm can be tested along
with relocation.
---
tests/btrfs/237 | 80 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 57 insertions(+), 23 deletions(-)
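Note (not part of the commit): the fill-size arithmetic in the diff below can
be checked in isolation. This is a minimal sketch with made-up sizes;
`compute_fill_size` is a hypothetical helper that only mirrors the expressions
used in the patch, and the device, allocated and zone sizes are assumptions
chosen for illustration.

```bash
#!/bin/bash
# Hypothetical helper mirroring the fill-size computation from the patch.
# Arguments: fs size, allocated size, zone capacity (all in bytes),
# fs-wide reclaim threshold and per-block-group reclaim threshold (percent).
compute_fill_size() {
	local fssize=$1 allocated=$2 zone_cap=$3
	local fs_threshold=$4 bg_threshold=$5

	# Aim 2% above the fs-wide threshold, minus what is already allocated.
	local fill_size=$(( fssize * (fs_threshold + 2) / 100 - allocated ))

	# Bytes needed in one zone to sit 2% above the block group threshold.
	local zone_fill_size=$(( zone_cap * (bg_threshold + 2) / 100 ))

	# If the last, partially filled zone would end up below that mark,
	# top it up so that this block group also qualifies for reclaim.
	local last_zone_offset=$(( fill_size % zone_cap ))
	if [ "$last_zone_offset" -lt "$zone_fill_size" ]; then
		fill_size=$(( fill_size + zone_fill_size - last_zone_offset ))
	fi

	echo "$fill_size"
}

MiB=$((1024 * 1024))
# Assumed example: 5 GiB device, 384 MiB allocated, 128 MiB zone capacity,
# thresholds of 60 (fs) and 75 (block group) as in the patch.
compute_fill_size $((5120 * MiB)) $((384 * MiB)) $((128 * MiB)) 60 75
```

The top-up branch only fires when the remainder in the last zone is smaller
than the per-zone fill mark, so the final block group written by the big file
should also cross its threshold.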
diff --git a/tests/btrfs/237 b/tests/btrfs/237
index f96031d5..18945e78 100755
--- a/tests/btrfs/237
+++ b/tests/btrfs/237
@@ -54,46 +54,80 @@ if [[ "$uuid" == "" ]]; then
exit 1
fi
+fssize=$($BTRFS_UTIL_PROG fi usage -b $SCRATCH_MNT |grep "Device size" |\
+ grep -Eo "[0-9]+")
+
+allocated_fssize=$($BTRFS_UTIL_PROG fi usage -b $SCRATCH_MNT |grep "Device allocated" |\
+ grep -Eo "[0-9]+")
+
+
start_data_bg_phy=$(get_data_bg_physical)
start_data_bg_phy=$((start_data_bg_phy >> 9))
-size=$($BLKZONE_PROG report -o $start_data_bg_phy -l 1 $SCRATCH_DEV |\
+zone_cap=$($BLKZONE_PROG report -o $start_data_bg_phy -l 1 $SCRATCH_DEV |\
_filter_blkzone_report |\
grep -Po "cap 0x[[:xdigit:]]+" | cut -d ' ' -f 2)
-size=$((size << 9))
+zone_cap=$((zone_cap << 9))
-reclaim_threshold=75
-echo $reclaim_threshold > /sys/fs/btrfs/"$uuid"/bg_reclaim_threshold
-fill_percent=$((reclaim_threshold + 2))
-rest_percent=$((90 - fill_percent)) # make sure we're not creating a new BG
-fill_size=$((size * fill_percent / 100))
-rest=$((size * rest_percent / 100))
+fs_reclaim_threshold=60
+bg_reclaim_threshold=75
+echo $fs_reclaim_threshold > /sys/fs/btrfs/"$uuid"/bg_reclaim_threshold
+echo $bg_reclaim_threshold > /sys/fs/btrfs/"$uuid"/allocation/data/bg_reclaim_threshold
-# step 1, fill FS over $fillsize
-$XFS_IO_PROG -fc "pwrite 0 $fill_size" $SCRATCH_MNT/$seq.test1 >> $seqres.full
-$XFS_IO_PROG -fc "pwrite 0 $rest" $SCRATCH_MNT/$seq.test2 >> $seqres.full
+fs_fill_percent=$((fs_reclaim_threshold + 2))
+fill_size=$((fssize * fs_fill_percent / 100))
+
+# Remove the allocated size from the $fill_size
+fill_size=$((fill_size - allocated_fssize))
+
+bg_fill_percent=$((bg_reclaim_threshold + 2))
+zone_fill_size=$((zone_cap * bg_fill_percent / 100))
+
+# $fill_size might not cover the last zone block group with threshold
+# for reclaim. Add the remaining bytes so that it can also be reclaimed
+last_zone_offset=$((fill_size % zone_cap))
+
+if [ $last_zone_offset -lt $zone_fill_size ]; then
+ fill_size=$((fill_size + zone_fill_size - last_zone_offset))
+fi
+
+# This small file will be used to verify the relocation
+relocate_file_size=$((zone_cap * 2 / 100))
+
+# step 1, fill FS over $relocate_file_size and $fill_size
+$XFS_IO_PROG -fc "pwrite 0 $relocate_file_size" $SCRATCH_MNT/$seq.test1 >> $seqres.full
$BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
-zones_before=$($BLKZONE_PROG report $SCRATCH_DEV | grep -v -e em -e nw | wc -l)
-echo "Before reclaim: $zones_before zones open" >> $seqres.full
old_data_zone=$(get_data_bg)
old_data_zone=$((old_data_zone >> 9))
printf "Old data zone 0x%x\n" $old_data_zone >> $seqres.full
-# step 2, delete the 1st $fill_size sized file to trigger reclaim
-rm $SCRATCH_MNT/$seq.test1
+$XFS_IO_PROG -fc "pwrite 0 $fill_size" $SCRATCH_MNT/$seq.test2 >> $seqres.full
$BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
-sleep 2 # 1 transaction commit for 'rm' and 1 for balance
+
+open_zones_before_reclaim=$($BLKZONE_PROG report --offset $start_data_bg_phy $SCRATCH_DEV |\
+ grep -v -e em -e nw | wc -l)
+
+# sanity check
+if [ $open_zones_before_reclaim -eq 0 ]; then
+ echo "Error writing to the device"
+fi
+
+echo "Before reclaim: $open_zones_before_reclaim zones open" >> $seqres.full
+
+# step 2, delete the $fill_size sized file to trigger reclaim
+rm $SCRATCH_MNT/$seq.test2
+$BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
+sleep 5 # sleep for transaction commit for 'rm' and for balance
# check that we don't have more zones open than before
-zones_after=$($BLKZONE_PROG report $SCRATCH_DEV | grep -v -e em -e nw | wc -l)
-echo "After reclaim: $zones_after zones open" >> $seqres.full
+open_zones_after_reclaim=$($BLKZONE_PROG report --offset $start_data_bg_phy $SCRATCH_DEV |\
+ grep -v -e em -e nw | wc -l)
+echo "After reclaim: $open_zones_after_reclaim zones open" >> $seqres.full
-# Check that old data zone was reset
-old_wptr=$($BLKZONE_PROG report -o $old_data_zone -c 1 $SCRATCH_DEV |\
- grep -Eo "wptr 0x[[:xdigit:]]+" | cut -d ' ' -f 2)
-if [ "$old_wptr" != "0x000000" ]; then
- _fail "Old wptr still at $old_wptr"
+# Check that data was really relocated to a different zone
+if [ $open_zones_after_reclaim != 1 ]; then
+ echo "Error relocating the data"
fi
new_data_zone=$(get_data_bg)
--
2.25.1
* Re: [RFC 1/1] btrfs/237: adapt the test to work with the new reclaim algorithm
From: Johannes Thumshirn @ 2022-08-22 9:40 UTC
To: Pankaj Raghav, fstests@vger.kernel.org
Cc: damien.lemoal@opensource.wdc.com, pankydev8@gmail.com,
Naohiro Aota, gost.dev@samsung.com, mcgrof@kernel.org,
dsterba@suse.cz
On 19.08.22 13:53, Pankaj Raghav wrote:
> Since 3687fcb0752a ("btrfs: zoned: make auto-reclaim less aggressive")
> commit, reclaim algorithm has been changed to trigger auto-reclaim once
> the fs used size is more than certain threshold. This change breaks this
> test.
>
> The test has been adapted so that the new auto-reclaim algorithm can be
> tested along with relocation.
S-o-b missing.
Thanks for doing this!
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
> tests/btrfs/237 | 80 +++++++++++++++++++++++++++++++++++--------------
> 1 file changed, 57 insertions(+), 23 deletions(-)
>
> diff --git a/tests/btrfs/237 b/tests/btrfs/237
> index f96031d5..18945e78 100755
> --- a/tests/btrfs/237
> +++ b/tests/btrfs/237
> @@ -54,46 +54,80 @@ if [[ "$uuid" == "" ]]; then
> exit 1
> fi
>
> +fssize=$($BTRFS_UTIL_PROG fi usage -b $SCRATCH_MNT |grep "Device size" |\
> + grep -Eo "[0-9]+")
> +
> +allocated_fssize=$($BTRFS_UTIL_PROG fi usage -b $SCRATCH_MNT |grep "Device allocated" |\
> + grep -Eo "[0-9]+")
> +
> +
> start_data_bg_phy=$(get_data_bg_physical)
> start_data_bg_phy=$((start_data_bg_phy >> 9))
>
> -size=$($BLKZONE_PROG report -o $start_data_bg_phy -l 1 $SCRATCH_DEV |\
> +zone_cap=$($BLKZONE_PROG report -o $start_data_bg_phy -l 1 $SCRATCH_DEV |\
> _filter_blkzone_report |\
> grep -Po "cap 0x[[:xdigit:]]+" | cut -d ' ' -f 2)
> -size=$((size << 9))
> +zone_cap=$((zone_cap << 9))
>
> -reclaim_threshold=75
> -echo $reclaim_threshold > /sys/fs/btrfs/"$uuid"/bg_reclaim_threshold
> -fill_percent=$((reclaim_threshold + 2))
> -rest_percent=$((90 - fill_percent)) # make sure we're not creating a new BG
> -fill_size=$((size * fill_percent / 100))
> -rest=$((size * rest_percent / 100))
> +fs_reclaim_threshold=60
> +bg_reclaim_threshold=75
> +echo $fs_reclaim_threshold > /sys/fs/btrfs/"$uuid"/bg_reclaim_threshold
> +echo $bg_reclaim_threshold > /sys/fs/btrfs/"$uuid"/allocation/data/bg_reclaim_threshold
>
> -# step 1, fill FS over $fillsize
> -$XFS_IO_PROG -fc "pwrite 0 $fill_size" $SCRATCH_MNT/$seq.test1 >> $seqres.full
> -$XFS_IO_PROG -fc "pwrite 0 $rest" $SCRATCH_MNT/$seq.test2 >> $seqres.full
> +fs_fill_percent=$((fs_reclaim_threshold + 2))
> +fill_size=$((fssize * fs_fill_percent / 100))
> +
> +# Remove the allocated size from the $fill_size
> +fill_size=$((fill_size - allocated_fssize))
> +
> +bg_fill_percent=$((bg_reclaim_threshold + 2))
> +zone_fill_size=$((zone_cap * bg_fill_percent / 100))
> +
> +# $fill_size might not cover the last zone block group with threshold
> +# for reclaim. Add the remaining bytes so that it can also be reclaimed
> +last_zone_offset=$((fill_size % zone_cap))
> +
> +if [ $last_zone_offset -lt $zone_fill_size ]; then
> + fill_size=$((fill_size + zone_fill_size - last_zone_offset))
> +fi
> +
> +# This small file will be used to verify the relocation
> +relocate_file_size=$((zone_cap * 2 / 100))
> +
> +# step 1, fill FS over $relocate_file_size and $fill_size
> +$XFS_IO_PROG -fc "pwrite 0 $relocate_file_size" $SCRATCH_MNT/$seq.test1 >> $seqres.full
> $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
>
> -zones_before=$($BLKZONE_PROG report $SCRATCH_DEV | grep -v -e em -e nw | wc -l)
> -echo "Before reclaim: $zones_before zones open" >> $seqres.full
> old_data_zone=$(get_data_bg)
> old_data_zone=$((old_data_zone >> 9))
> printf "Old data zone 0x%x\n" $old_data_zone >> $seqres.full
>
> -# step 2, delete the 1st $fill_size sized file to trigger reclaim
> -rm $SCRATCH_MNT/$seq.test1
> +$XFS_IO_PROG -fc "pwrite 0 $fill_size" $SCRATCH_MNT/$seq.test2 >> $seqres.full
> $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
> -sleep 2 # 1 transaction commit for 'rm' and 1 for balance
> +
> +open_zones_before_reclaim=$($BLKZONE_PROG report --offset $start_data_bg_phy $SCRATCH_DEV |\
> + grep -v -e em -e nw | wc -l)
> +
> +# sanity check
> +if [ $open_zones_before_reclaim -eq 0 ]; then
> + echo "Error writing to the device"
> +fi
> +
> +echo "Before reclaim: $open_zones_before_reclaim zones open" >> $seqres.full
> +
> +# step 2, delete the $fill_size sized file to trigger reclaim
> +rm $SCRATCH_MNT/$seq.test2
> +$BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
> +sleep 5 # sleep for transaction commit for 'rm' and for balance
>
> # check that we don't have more zones open than before
> -zones_after=$($BLKZONE_PROG report $SCRATCH_DEV | grep -v -e em -e nw | wc -l)
> -echo "After reclaim: $zones_after zones open" >> $seqres.full
> +open_zones_after_reclaim=$($BLKZONE_PROG report --offset $start_data_bg_phy $SCRATCH_DEV |\
> + grep -v -e em -e nw | wc -l)
> +echo "After reclaim: $open_zones_after_reclaim zones open" >> $seqres.full
>
> -# Check that old data zone was reset
> -old_wptr=$($BLKZONE_PROG report -o $old_data_zone -c 1 $SCRATCH_DEV |\
> - grep -Eo "wptr 0x[[:xdigit:]]+" | cut -d ' ' -f 2)
> -if [ "$old_wptr" != "0x000000" ]; then
> - _fail "Old wptr still at $old_wptr"
> +# Check that data was really relocated to a different zone
> +if [ $open_zones_after_reclaim != 1 ]; then
> + echo "Error relocating the data"
> fi
>
> new_data_zone=$(get_data_bg)
* Re: [RFC 1/1] btrfs/237: adapt the test to work with the new reclaim algorithm
From: Pankaj Raghav @ 2022-08-22 10:49 UTC
To: Johannes Thumshirn, fstests@vger.kernel.org
Cc: damien.lemoal@opensource.wdc.com, pankydev8@gmail.com,
Naohiro Aota, gost.dev@samsung.com, mcgrof@kernel.org,
dsterba@suse.cz
On 2022-08-22 11:40, Johannes Thumshirn wrote:
> On 19.08.22 13:53, Pankaj Raghav wrote:
>> Since 3687fcb0752a ("btrfs: zoned: make auto-reclaim less aggressive")
>> commit, reclaim algorithm has been changed to trigger auto-reclaim once
>> the fs used size is more than certain threshold. This change breaks this
>> test.
>>
>> The test has been adapted so that the new auto-reclaim algorithm can be
>> tested along with relocation.
>
> S-o-b missing.
Oops. I hope it can be added before committing.
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
>
> Thanks for doing this!
> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
Thanks for testing it, Johannes.
I did encounter an issue when I tried this test on a larger zoned device. I
have explained the issue in my cover letter. As I don't have a physical PO2
device with me, I can't tell whether it is an issue with QEMU. Did you see
any of the issues I mentioned in the cover letter? If not, it might be an
issue with the QEMU ZNS implementation.
>> ---
>> tests/btrfs/237 | 80 +++++++++++++++++++++++++++++++++++--------------
>> 1 file changed, 57 insertions(+), 23 deletions(-)
>>
>> diff --git a/tests/btrfs/237 b/tests/btrfs/237
>> index f96031d5..18945e78 100755
>> --- a/tests/btrfs/237
>> +++ b/tests/btrfs/237
>> @@ -54,46 +54,80 @@ if [[ "$uuid" == "" ]]; then
>> exit 1
>> fi
>>
>> +fssize=$($BTRFS_UTIL_PROG fi usage -b $SCRATCH_MNT |grep "Device size" |\
>> + grep -Eo "[0-9]+")
>> +
>> +allocated_fssize=$($BTRFS_UTIL_PROG fi usage -b $SCRATCH_MNT |grep "Device allocated" |\
>> + grep -Eo "[0-9]+")
>> +
>> +
>> start_data_bg_phy=$(get_data_bg_physical)
>> start_data_bg_phy=$((start_data_bg_phy >> 9))
>>
>> -size=$($BLKZONE_PROG report -o $start_data_bg_phy -l 1 $SCRATCH_DEV |\
>> +zone_cap=$($BLKZONE_PROG report -o $start_data_bg_phy -l 1 $SCRATCH_DEV |\
>> _filter_blkzone_report |\
>> grep -Po "cap 0x[[:xdigit:]]+" | cut -d ' ' -f 2)
>> -size=$((size << 9))
>> +zone_cap=$((zone_cap << 9))
>>
>> -reclaim_threshold=75
>> -echo $reclaim_threshold > /sys/fs/btrfs/"$uuid"/bg_reclaim_threshold
>> -fill_percent=$((reclaim_threshold + 2))
>> -rest_percent=$((90 - fill_percent)) # make sure we're not creating a new BG
>> -fill_size=$((size * fill_percent / 100))
>> -rest=$((size * rest_percent / 100))
>> +fs_reclaim_threshold=60
>> +bg_reclaim_threshold=75
>> +echo $fs_reclaim_threshold > /sys/fs/btrfs/"$uuid"/bg_reclaim_threshold
>> +echo $bg_reclaim_threshold > /sys/fs/btrfs/"$uuid"/allocation/data/bg_reclaim_threshold
>>
>> -# step 1, fill FS over $fillsize
>> -$XFS_IO_PROG -fc "pwrite 0 $fill_size" $SCRATCH_MNT/$seq.test1 >> $seqres.full
>> -$XFS_IO_PROG -fc "pwrite 0 $rest" $SCRATCH_MNT/$seq.test2 >> $seqres.full
>> +fs_fill_percent=$((fs_reclaim_threshold + 2))
>> +fill_size=$((fssize * fs_fill_percent / 100))
>> +
>> +# Remove the allocated size from the $fill_size
>> +fill_size=$((fill_size - allocated_fssize))
>> +
>> +bg_fill_percent=$((bg_reclaim_threshold + 2))
>> +zone_fill_size=$((zone_cap * bg_fill_percent / 100))
>> +
>> +# $fill_size might not cover the last zone block group with threshold
>> +# for reclaim. Add the remaining bytes so that it can also be reclaimed
>> +last_zone_offset=$((fill_size % zone_cap))
>> +
>> +if [ $last_zone_offset -lt $zone_fill_size ]; then
>> + fill_size=$((fill_size + zone_fill_size - last_zone_offset))
>> +fi
>> +
>> +# This small file will be used to verify the relocation
>> +relocate_file_size=$((zone_cap * 2 / 100))
>> +
>> +# step 1, fill FS over $relocate_file_size and $fill_size
>> +$XFS_IO_PROG -fc "pwrite 0 $relocate_file_size" $SCRATCH_MNT/$seq.test1 >> $seqres.full
>> $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
>>
>> -zones_before=$($BLKZONE_PROG report $SCRATCH_DEV | grep -v -e em -e nw | wc -l)
>> -echo "Before reclaim: $zones_before zones open" >> $seqres.full
>> old_data_zone=$(get_data_bg)
>> old_data_zone=$((old_data_zone >> 9))
>> printf "Old data zone 0x%x\n" $old_data_zone >> $seqres.full
>>
>> -# step 2, delete the 1st $fill_size sized file to trigger reclaim
>> -rm $SCRATCH_MNT/$seq.test1
>> +$XFS_IO_PROG -fc "pwrite 0 $fill_size" $SCRATCH_MNT/$seq.test2 >> $seqres.full
>> $BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
>> -sleep 2 # 1 transaction commit for 'rm' and 1 for balance
>> +
>> +open_zones_before_reclaim=$($BLKZONE_PROG report --offset $start_data_bg_phy $SCRATCH_DEV |\
>> + grep -v -e em -e nw | wc -l)
>> +
>> +# sanity check
>> +if [ $open_zones_before_reclaim -eq 0 ]; then
>> + echo "Error writing to the device"
>> +fi
>> +
>> +echo "Before reclaim: $open_zones_before_reclaim zones open" >> $seqres.full
>> +
>> +# step 2, delete the $fill_size sized file to trigger reclaim
>> +rm $SCRATCH_MNT/$seq.test2
>> +$BTRFS_UTIL_PROG filesystem sync $SCRATCH_MNT
>> +sleep 5 # sleep for transaction commit for 'rm' and for balance
>>
>> # check that we don't have more zones open than before
>> -zones_after=$($BLKZONE_PROG report $SCRATCH_DEV | grep -v -e em -e nw | wc -l)
>> -echo "After reclaim: $zones_after zones open" >> $seqres.full
>> +open_zones_after_reclaim=$($BLKZONE_PROG report --offset $start_data_bg_phy $SCRATCH_DEV |\
>> + grep -v -e em -e nw | wc -l)
>> +echo "After reclaim: $open_zones_after_reclaim zones open" >> $seqres.full
>>
>> -# Check that old data zone was reset
>> -old_wptr=$($BLKZONE_PROG report -o $old_data_zone -c 1 $SCRATCH_DEV |\
>> - grep -Eo "wptr 0x[[:xdigit:]]+" | cut -d ' ' -f 2)
>> -if [ "$old_wptr" != "0x000000" ]; then
>> - _fail "Old wptr still at $old_wptr"
>> +# Check that data was really relocated to a different zone
>> +if [ $open_zones_after_reclaim != 1 ]; then
>> + echo "Error relocating the data"
>> fi
>>
>> new_data_zone=$(get_data_bg)
>
* Re: [RFC 1/1] btrfs/237: adapt the test to work with the new reclaim algorithm
From: Johannes Thumshirn @ 2022-08-22 12:22 UTC
To: Pankaj Raghav, fstests@vger.kernel.org
Cc: damien.lemoal@opensource.wdc.com, pankydev8@gmail.com,
Naohiro Aota, gost.dev@samsung.com, mcgrof@kernel.org,
dsterba@suse.cz
On 22.08.22 12:49, Pankaj Raghav wrote:
> On 2022-08-22 11:40, Johannes Thumshirn wrote:
>> On 19.08.22 13:53, Pankaj Raghav wrote:
>>> Since 3687fcb0752a ("btrfs: zoned: make auto-reclaim less aggressive")
>>> commit, reclaim algorithm has been changed to trigger auto-reclaim once
>>> the fs used size is more than certain threshold. This change breaks this
>>> test.
>>>
>>> The test has been adapted so that the new auto-reclaim algorithm can be
>>> tested along with relocation.
>>
>> S-o-b missing.
> Oops. I hope it can be added before committing.
> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
>
>>
>> Thanks for doing this!
>> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
> Thanks for testing it Johannes.
>
> I did encounter an issue when I was trying out this test in a larger zoned
> device. I have explained the issue in my cover letter. I don't know if it
> is an issue with QEMU. As I don't have a physical PO2 device with me, did
> you see any of the issue I mentioned in the cover letter? If not, it might
> be an issue with the QEMU ZNS implementation.
>
I've been testing with my usual smoke test setup on null_blk with 128MiB
zone size and capacity == size. Haven't encountered any issues with this.
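For anyone who wants to try a similar setup, here is a minimal sketch of
creating a zoned null_blk device through configfs. It is not my exact
configuration: the function name, the `CONFIGFS` override, the default device
size and the choice of zero conventional zones are assumptions for
illustration. It needs root and the null_blk module.

```bash
# Sketch: create a zoned null_blk device via configfs.
# CONFIGFS defaults to /sys/kernel/config; overriding it is only meant
# for dry runs against a scratch directory.
setup_zoned_nullb() {
	local name=$1 size_mb=${2:-8192} zone_mb=${3:-128}
	local dir="${CONFIGFS:-/sys/kernel/config}/nullb/$name"

	mkdir -p "$dir" || return 1
	echo "$size_mb" > "$dir/size"           # device size in MiB
	echo 1 > "$dir/memory_backed"           # keep written data in memory
	echo 1 > "$dir/zoned"                   # expose a zoned device
	echo "$zone_mb" > "$dir/zone_size"      # zone size in MiB
	echo "$zone_mb" > "$dir/zone_capacity"  # capacity == size
	echo 0 > "$dir/zone_nr_conv"            # sequential zones only (assumed)
	echo 1 > "$dir/power"                   # bring the device online
}

# Example (as root, after "modprobe null_blk nr_devices=0"):
#   setup_zoned_nullb nullb0 8192 128
```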
* Re: [RFC 0/1] adapting btrfs/237 to work with the new reclaim algorithm
From: Johannes Thumshirn @ 2022-08-22 14:29 UTC
To: Pankaj Raghav, fstests@vger.kernel.org
Cc: damien.lemoal@opensource.wdc.com, pankydev8@gmail.com,
Naohiro Aota, gost.dev@samsung.com, mcgrof@kernel.org,
dsterba@suse.cz
On 19.08.22 13:53, Pankaj Raghav wrote:
> Hi ,
> Since 3687fcb0752a ("btrfs: zoned: make auto-reclaim less aggressive")
> commit, reclaim algorithm has been changed to trigger auto-reclaim once
> the fs used size is more than a certain threshold. This change breaks
> 237 test.
>
> I tried to adapt the test by doing the following:
> - Write a small file first
> - Write a big file that increases the disk usage to be more than the
> reclaim threshold
> - Delete the big file to trigger threshold
> - Ensure the small file is relocated and the space used by the big file
> is reclaimed.
>
> My test case works properly for small ZNS drives but not for bigger
> sized drives in QEMU. When I use a drive with a size of 100G, not all
> zones that were used by the big file are correctly reclaimed.
> Either I am not setting up the test correctly or there is something
> wrong on how reclaim works for zoned devices.
>
> I created a simple script to reproduce the scenario instead of running
> the test. Please adapt the $DEV and $big_file_size based on the drive
> size. As I am setting the bg_reclaim_threshold to be 51, $big_file_size
> should be at least 51% of the drive size.
>
> ```
> DEV=nvme0n3
> DEV_PATH=/dev/$DEV
> big_file_size=2500M
>
> echo "mq-deadline" > /sys/block/$DEV/queue/scheduler
> umount /mnt/scratch
> blkzone reset $DEV_PATH
> mkfs.btrfs -f -d single -m single $DEV_PATH > /dev/null; mount -t btrfs $DEV_PATH \
> /mnt/scratch
> uuid=$(btrfs fi show $DEV_PATH | grep 'uuid' | awk '{print $NF}')
>
> echo "51" > /sys/fs/btrfs/$uuid/bg_reclaim_threshold
>
> fio --filename=/mnt/scratch/test2 --size=1M --rw=write --bs=4k \
> --name=btrfs_zoned > /dev/null
> btrfs fi sync /mnt/scratch
>
> echo "Open zones before big file trasfer:"
> blkzone report $DEV_PATH | grep -v -e em -e nw | wc -l
>
> fio --filename=/mnt/scratch/test1 --size=$big_file_size --rw=write --bs=4k \
> --ioengine=io_uring --name=btrfs_zoned > /dev/null
> btrfs fi sync /mnt/scratch
>
> echo "Open zones before removing the file:"
> blkzone report $DEV_PATH | grep -v -e em -e nw | wc -l
> rm /mnt/scratch/test1
> btrfs fi sync /mnt/scratch
>
> echo "Going to sleep. Removed the file"
> sleep 30
>
> echo "Open zones after reclaim:"
> blkzone report $DEV_PATH | grep -v -e em -e nw | wc -l
> ```
>
> I am getting the following output in QEMU:
>
> - 5GB ZNS drive with 128MB zone size (and cap) and it is working as
> expected:
>
> ```
> Open zones before big file trasfer:
> 4
> Open zones before removing the file:
> 23
> Going to sleep. Removed the file
> Open zones after reclaim:
> 4
> ```
>
> - 100GB ZNS drive with 128MB zone size (and cap) and it is **not
> working** as expected:
>
> ```
> Open zones before big file trasfer:
> 4
> Open zones before removing the file:
> 455
> Going to sleep. Removed the file
> Open zones after reclaim:
> 411
> ```
>
> Only partial reclaim is happening for bigger sized drives. The issue
> with that is, if I do another FIO transfer, the drive spits out ENOSPC
> before its actual capacity is reached as most of the zones have not been
> reclaimed back and are basically in an unusable state.
>
> Is there a limit on how many bgs can be reclaimed?
>
> Let me know if I am doing something wrong in the test or if it is an
> actual issue.
Can you try setting max_active_zones to 0? I have the feeling it's yet
another (or perhaps already known, Naohiro should know that) issue with
MAZ handling.
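The limit the block layer currently reports can be checked from sysfs before
changing the device configuration. A minimal sketch, assuming the
`max_active_zones` queue attribute is exposed by the running kernel;
`get_max_active_zones` and the `SYSFS_ROOT` override are hypothetical names
used only for this example.

```bash
# Print the max_active_zones limit of a block device; 0 means "no limit".
# SYSFS_ROOT defaults to /sys and exists only to allow dry runs.
get_max_active_zones() {
	local dev=$1
	local attr="${SYSFS_ROOT:-/sys}/block/$dev/queue/max_active_zones"

	if [ -r "$attr" ]; then
		cat "$attr"
	else
		echo "max_active_zones not exposed for $dev" >&2
		return 1
	fi
}

# Example: get_max_active_zones nvme0n3
```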
* Re: [RFC 0/1] adapting btrfs/237 to work with the new reclaim algorithm
From: Pankaj Raghav @ 2022-08-23 11:46 UTC
To: Johannes Thumshirn, fstests@vger.kernel.org
Cc: damien.lemoal@opensource.wdc.com, pankydev8@gmail.com,
Naohiro Aota, gost.dev@samsung.com, mcgrof@kernel.org,
dsterba@suse.cz
On 2022-08-22 16:29, Johannes Thumshirn wrote:
>>
>> Only partial reclaim is happening for bigger sized drives. The issue
>> with that is, if I do another FIO transfer, the drive spits out ENOSPC
>> before its actual capacity is reached as most of the zones have not been
>> reclaimed back and are basically in an unusable state.
>>
>> Is there a limit on how many bgs can be reclaimed?
>>
>> Let me know if I am doing something wrong in the test or if it is an
>> actual issue.
>
> Can you try setting max_active_zones to 0? I have the feeling it's yet
> another (or perhaps already known, Naohiro should know that) issue with
> MAZ handling.
Max active zones is already set to 0 (QEMU defaults to 0). I also changed the
backing image format in QEMU from qcow to raw, and I still see the same
partial reclaim with a 100G drive.
I tried the same test on a 100G drive with 1G zone size, and it works as
expected.
root@zns-btrfs-simple-zns:/data# ./reclaim_script.sh
Open zones before big file transfer:
4
Open zones before removing the file:
59
Going to sleep. Removed the file
Open zones after reclaim:
4
I am not 100% sure what is causing this issue of partial reclaim when the
number of zones is higher.
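To narrow this down, it may help to look at which zone conditions remain
after reclaim instead of only a total count. A minimal sketch, assuming the
usual `blkzone report` line format with a `zcond: N(xx)` field;
`zone_cond_histogram` is a hypothetical helper, not part of the script above.

```bash
# Read `blkzone report` output on stdin and print one line per zone
# condition (for example em, oi, cl, fu) with the number of zones in it.
# Assumes each zone line carries a field such as "zcond: 1(em)" or
# "zcond:14(fu)".
zone_cond_histogram() {
	awk 'match($0, /zcond: *[0-9]+\([a-z]+\)/) {
		cond = substr($0, RSTART, RLENGTH)
		sub(/.*\(/, "", cond)   # drop everything up to "("
		sub(/\)/, "", cond)     # drop the trailing ")"
		count[cond]++
	}
	END { for (c in count) print c, count[c] }' | sort
}

# Example: blkzone report /dev/nvme0n3 | zone_cond_histogram
```

Running it before and after the sleep would show whether the zones that are
not reclaimed stay full or end up in some other condition.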
* Re: [RFC 0/1] adapting btrfs/237 to work with the new reclaim algorithm
From: Pankaj Raghav @ 2022-12-05 14:53 UTC
To: Johannes Thumshirn
Cc: damien.lemoal@opensource.wdc.com, pankydev8@gmail.com,
Naohiro Aota, gost.dev@samsung.com, mcgrof@kernel.org,
dsterba@suse.cz, fstests@vger.kernel.org
Hi Johannes,
> Btw, whatever happened to this patch?
As I said before, I had trouble getting reclaim to work with a 100G drive,
and I asked if you could reproduce the same on your end. I did not get any
reply to that.
I wanted to discuss with you what I was seeing during ALPSS, but we never
got around to it!
Regards,
Pankaj
On 2022-08-23 13:46, Pankaj Raghav wrote:
> On 2022-08-22 16:29, Johannes Thumshirn wrote:
>>>
>>> Only partial reclaim is happening for bigger sized drives. The issue
>>> with that is, if I do another FIO transfer, the drive spits out ENOSPC
>>> before its actual capacity is reached as most of the zones have not been
>>> reclaimed back and are basically in an unusable state.
>>>
>>> Is there a limit on how many bgs can be reclaimed?
>>>
>>> Let me know if I am doing something wrong in the test or if it is an
>>> actual issue.
>>
>> Can you try setting max_active_zones to 0? I have the feeling it's yet
>> another (or perhaps already known, Naohiro should know that) issue with
>> MAZ handling.
>
> The Max active zones is set to 0 (QEMU defaults to 0). I also changed the
> backing image format of QEMU from qcow to raw, and still the same issue of
> partial reclaim for a drive size of 100G.
>
> I tried the same test in a 100G drive with 1G zone size, and it is working
> as expected.
>
> root@zns-btrfs-simple-zns:/data# ./reclaim_script.sh
> Open zones before big file transfer:
> 4
> Open zones before removing the file:
> 59
> Going to sleep. Removed the file
> Open zones after reclaim:
> 4
>
> I am not 100% sure what is causing this issue of partial reclaim when the
> number of zones is higher.
* Re: [RFC 0/1] adapting btrfs/237 to work with the new reclaim algorithm
From: Johannes Thumshirn @ 2022-12-05 16:04 UTC
To: Pankaj Raghav
Cc: damien.lemoal@opensource.wdc.com, pankydev8@gmail.com,
Naohiro Aota, gost.dev@samsung.com, mcgrof@kernel.org,
dsterba@suse.cz, fstests@vger.kernel.org
On 05.12.22 15:53, Pankaj Raghav wrote:
> Hi Johannes,
>
>> Btw, whatever happened to this patch?
>
> As I said before, I had trouble reproducing reclaim for 100G drive size,
> and asked if you could reproduce the same on your end. I did not get any
> reply to that.
>
> I wanted to discuss with you what I was seeing during ALPSS, but we never
> got around that!
Ah right! I'll try to reproduce it on my end as well.
But even with that one problem it makes the test pass again on my other setups.
So I think it's still an improvement to the status quo. Can you maybe resend it,
so it's again on Zorro's list?
Thanks,
Johannes
* Re: [RFC 0/1] adapting btrfs/237 to work with the new reclaim algorithm
From: Pankaj Raghav @ 2022-12-07 16:01 UTC
To: Johannes Thumshirn
Cc: Pankaj Raghav, damien.lemoal@opensource.wdc.com, Naohiro Aota,
gost.dev@samsung.com, mcgrof@kernel.org, dsterba@suse.cz,
fstests@vger.kernel.org
On Mon, Dec 05, 2022 at 04:04:08PM +0000, Johannes Thumshirn wrote:
> On 05.12.22 15:53, Pankaj Raghav wrote:
> > Hi Johannes,
> >
> >> Btw, whatever happened to this patch?
> >
> > As I said before, I had trouble reproducing reclaim for 100G drive size,
> > and asked if you could reproduce the same on your end. I did not get any
> > reply to that.
> >
> > I wanted to discuss with you what I was seeing during ALPSS, but we never
> > got around that!
>
> Ah right! I'll try to reproduce it on my end as well.
>
> But even with that one problem it makes the test pass again on my other setups.
>
> So I think it's still an improvement to the status quo. Can you maybe resend it,
> so it's again on Zorro's list?
That sounds good. I will test it again and resend it next week. Thanks!
--
Pankaj Raghav
* Re: [RFC 0/1] adapting btrfs/237 to work with the new reclaim algorithm
From: Pankaj Raghav @ 2022-12-13 13:35 UTC
To: Johannes Thumshirn
Cc: damien.lemoal@opensource.wdc.com, pankydev8@gmail.com,
Naohiro Aota, gost.dev@samsung.com, mcgrof@kernel.org,
dsterba@suse.cz, fstests@vger.kernel.org, boris
Hi Johannes,
I gave my test a retry, and it started failing for all cases. This commit
by Boris changed the behavior of reclaim to be less aggressive:
https://lore.kernel.org/linux-btrfs/977bdffbf57cca3ee6541efa1563167d4d282b08.1665701210.git.boris@bur.io/
It looks like I need to change the test to cater to the current behavior.
The current reclaim algorithm is consistent across all sizes, unlike before.
I will change the test to do the following:
- Write a small file
- Write a big file that crosses the reclaim limit
- Delete the big file
- Check that **only** the block group that contained the small file is
reclaimed, and the small file is relocated to a new block group.
Let me know if the flow of the test case is correct.
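For the final check, polling might be more robust than a fixed sleep. A
minimal sketch of what I have in mind; `wait_for` and `data_bg_moved` are
hypothetical helpers (not existing fstests functions), the 30 second timeout
is an assumption, and `get_data_bg` is the helper the test already uses.

```bash
# Poll until a command succeeds or the timeout (in seconds) expires.
# Returns 0 on success and 1 on timeout.
wait_for() {
	local timeout=$1; shift
	local waited=0

	until "$@"; do
		[ "$waited" -ge "$timeout" ] && return 1
		sleep 1
		waited=$((waited + 1))
	done
	return 0
}

# Succeeds once the data block group differs from the one recorded in
# $old_data_bg before the big file was deleted.
data_bg_moved() {
	[ "$(get_data_bg)" != "$old_data_bg" ]
}

# Usage inside the test, after deleting the big file:
#   wait_for 30 data_bg_moved || _fail "small file was not relocated"
```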
On 2022-12-05 17:04, Johannes Thumshirn wrote:
> On 05.12.22 15:53, Pankaj Raghav wrote:
>> Hi Johannes,
>>
>>> Btw, whatever happened to this patch?
>>
>> As I said before, I had trouble reproducing reclaim for 100G drive size,
>> and asked if you could reproduce the same on your end. I did not get any
>> reply to that.
>>
>> I wanted to discuss with you what I was seeing during ALPSS, but we never
>> got around that!
>
> Ah right! I'll try to reproduce it on my end as well.
>
> But even with that one problem it makes the test pass again on my other setups.
>
> So I think it's still an improvement to the status quo. Can you maybe resend it,
> so it's again on Zorro's list?
>
> Thanks,
> Johannes
>
>
* Re: [RFC 0/1] adapting btrfs/237 to work with the new reclaim algorithm
From: Johannes Thumshirn @ 2022-12-05 7:56 UTC
To: Pankaj Raghav, fstests@vger.kernel.org
Cc: damien.lemoal@opensource.wdc.com, pankydev8@gmail.com,
Naohiro Aota, gost.dev@samsung.com, mcgrof@kernel.org,
dsterba@suse.cz, zorro Lang
On 19.08.22 13:53, Pankaj Raghav wrote:
> Hi ,
> Since 3687fcb0752a ("btrfs: zoned: make auto-reclaim less aggressive")
> commit, reclaim algorithm has been changed to trigger auto-reclaim once
> the fs used size is more than a certain threshold. This change breaks
> 237 test.
>
> I tried to adapt the test by doing the following:
> - Write a small file first
> - Write a big file that increases the disk usage to be more than the
> reclaim threshold
> - Delete the big file to trigger threshold
> - Ensure the small file is relocated and the space used by the big file
> is reclaimed.
>
> My test case works properly for small ZNS drives but not for bigger
> sized drives in QEMU. When I use a drive with a size of 100G, not all
> zones that were used by the big file are correctly reclaimed.
> Either I am not setting up the test correctly or there is something
> wrong on how reclaim works for zoned devices.
>
> I created a simple script to reproduce the scenario instead of running
> the test. Please adapt the $DEV and $big_file_size based on the drive
> size. As I am setting the bg_reclaim_threshold to be 51, $big_file_size
> should be at least 51% of the drive size.
>
> ```
> DEV=nvme0n3
> DEV_PATH=/dev/$DEV
> big_file_size=2500M
>
> echo "mq-deadline" > /sys/block/$DEV/queue/scheduler
> umount /mnt/scratch
> blkzone reset $DEV_PATH
> mkfs.btrfs -f -d single -m single $DEV_PATH > /dev/null; mount -t btrfs $DEV_PATH \
> /mnt/scratch
> uuid=$(btrfs fi show $DEV_PATH | grep 'uuid' | awk '{print $NF}')
>
> echo "51" > /sys/fs/btrfs/$uuid/bg_reclaim_threshold
>
> fio --filename=/mnt/scratch/test2 --size=1M --rw=write --bs=4k \
> --name=btrfs_zoned > /dev/null
> btrfs fi sync /mnt/scratch
>
> echo "Open zones before big file trasfer:"
> blkzone report $DEV_PATH | grep -v -e em -e nw | wc -l
>
> fio --filename=/mnt/scratch/test1 --size=$big_file_size --rw=write --bs=4k \
> --ioengine=io_uring --name=btrfs_zoned > /dev/null
> btrfs fi sync /mnt/scratch
>
> echo "Open zones before removing the file:"
> blkzone report $DEV_PATH | grep -v -e em -e nw | wc -l
> rm /mnt/scratch/test1
> btrfs fi sync /mnt/scratch
>
> echo "Going to sleep. Removed the file"
> sleep 30
>
> echo "Open zones after reclaim:"
> blkzone report $DEV_PATH | grep -v -e em -e nw | wc -l
> ```
>
> I am getting the following output in QEMU:
>
> - 5GB ZNS drive with 128MB zone size (and cap) and it is working as
> expected:
>
> ```
> Open zones before big file trasfer:
> 4
> Open zones before removing the file:
> 23
> Going to sleep. Removed the file
> Open zones after reclaim:
> 4
> ```
>
> - 100GB ZNS drive with 128MB zone size (and cap) and it is **not
> working** as expected:
>
> ```
> Open zones before big file trasfer:
> 4
> Open zones before removing the file:
> 455
> Going to sleep. Removed the file
> Open zones after reclaim:
> 411
> ```
>
> Only partial reclaim is happening for bigger sized drives. The issue
> with that is, if I do another FIO transfer, the drive spits out ENOSPC
> before its actual capacity is reached as most of the zones have not been
> reclaimed back and are basically in an unusable state.
>
> Is there a limit on how many bgs can be reclaimed?
>
> Let me know if I am doing something wrong in the test or if it is an
> actual issue.
>
> Pankaj Raghav (1):
> btrfs/237: adapt the test to work with the new reclaim algorithm
>
> tests/btrfs/237 | 80 +++++++++++++++++++++++++++++++++++--------------
> 1 file changed, 57 insertions(+), 23 deletions(-)
>
Btw, whatever happened to this patch?