* [PATCH 1/5] common/dmdelay: remove DELAY_X enums
2023-12-07 7:20 [PATCH 0/5] btrfs/282: resolve intermittent failures David Disseldorp
@ 2023-12-07 7:20 ` David Disseldorp
2023-12-07 7:20 ` [PATCH 2/5] btrfs/282: append scrub status output to 282.full David Disseldorp
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: David Disseldorp @ 2023-12-07 7:20 UTC (permalink / raw)
To: fstests; +Cc: Qu Wenruo, David Disseldorp
The DELAY_NONE and DELAY_READ enums map directly to DELAY_TABLE globals,
so remove the indirection and use the DELAY_TABLE variables instead.
Rename DELAY_TABLE to DELAY_TABLE_NODELAY to better reflect that the
table doesn't specify any IO delays.
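The reworked calling convention can be sketched as follows (device path, sector count, and the echoed message are hypothetical; the real helper's dmsetup suspend/load/resume plumbing is omitted):

```shell
# Sketch: callers now pass the full dm table string instead of a
# DELAY_NONE/DELAY_READ enum. Only the argument handling is shown.
_load_delay_table()
{
	local table="$1"
	# the real helper would suspend the device, load $table, then resume
	echo "loading table: $table"
}

# hypothetical read-delay table, as _init_delay would define it
DELAY_TABLE_RDELAY="0 1024 delay /dev/sdb 0 10000 /dev/sdb 0 0"
_load_delay_table "$DELAY_TABLE_RDELAY"
```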
Signed-off-by: David Disseldorp <ddiss@suse.de>
---
common/dmdelay | 13 ++++---------
tests/xfs/311 | 2 +-
2 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/common/dmdelay b/common/dmdelay
index 66cac1a7..e85d8d62 100644
--- a/common/dmdelay
+++ b/common/dmdelay
@@ -4,16 +4,13 @@
#
# common functions for setting up and tearing down a dmdelay device
-DELAY_NONE=0
-DELAY_READ=1
-
_init_delay()
{
local BLK_DEV_SIZE=`blockdev --getsz $SCRATCH_DEV`
DELAY_DEV=/dev/mapper/delay-test
- DELAY_TABLE="0 $BLK_DEV_SIZE delay $SCRATCH_DEV 0 0"
+ DELAY_TABLE_NODELAY="0 $BLK_DEV_SIZE delay $SCRATCH_DEV 0 0"
DELAY_TABLE_RDELAY="0 $BLK_DEV_SIZE delay $SCRATCH_DEV 0 10000 $SCRATCH_DEV 0 0"
- _dmsetup_create delay-test --table "$DELAY_TABLE" || \
+ _dmsetup_create delay-test --table "$DELAY_TABLE_NODELAY" || \
_fatal "failed to create delay device"
}
@@ -44,10 +41,8 @@ _cleanup_delay()
# table
_load_delay_table()
{
- table="$DELAY_TABLE"
- [ $1 -eq $DELAY_READ ] && table="$DELAY_TABLE_RDELAY"
-
- suspend_opt="--nolockfs"
+ local table="$1"
+ local suspend_opt="--nolockfs"
[ $# -gt 1 ] && [ $2 -eq 1 ] && suspend_opt=""
# run a suspend/resume cycle to avoid excessive resume delays once a
diff --git a/tests/xfs/311 b/tests/xfs/311
index d5e3afbf..04161750 100755
--- a/tests/xfs/311
+++ b/tests/xfs/311
@@ -50,7 +50,7 @@ _unmount_delay
_mount_delay
# introduce a read I/O delay
-_load_delay_table $DELAY_READ
+_load_delay_table "$DELAY_TABLE_RDELAY"
# Map the directory and immediately unmount. This should invoke an asynchronous
# readahead on the first block of the directory. The readahead is delayed by
--
2.35.3
^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/5] btrfs/282: append scrub status output to 282.full
2023-12-07 7:20 [PATCH 0/5] btrfs/282: resolve intermittent failures David Disseldorp
2023-12-07 7:20 ` [PATCH 1/5] common/dmdelay: remove DELAY_X enums David Disseldorp
@ 2023-12-07 7:20 ` David Disseldorp
2023-12-07 9:46 ` Qu Wenruo
2023-12-07 7:20 ` [PATCH 3/5] btrfs/282: dmdelay scrub I/O to fix intermittent failures David Disseldorp
` (3 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: David Disseldorp @ 2023-12-07 7:20 UTC (permalink / raw)
To: fstests; +Cc: Qu Wenruo, David Disseldorp
Append the scrub status output to 282.full to aid debugging of later
failures. Also, collapse the grep/cut usage into the existing awk
one-liner.
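The collapsed one-liner can be exercised against a synthetic status line (plain `awk` stands in for `$AWK_PROG` here, and the Rate value is made up, not real scrub output):

```shell
# Fabricated "Rate:" line in the `btrfs scrub status --raw` layout
sample="Rate:             187904819/s"
# Match the Rate line and strip the trailing "/s", as the patch does
speed=$(printf '%s\n' "$sample" | awk '$1 == "Rate:" {sub(/\/s/, "", $2); print $2}')
echo "$speed"
```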
Signed-off-by: David Disseldorp <ddiss@suse.de>
---
tests/btrfs/282 | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/tests/btrfs/282 b/tests/btrfs/282
index 395e0626..66d14f82 100755
--- a/tests/btrfs/282
+++ b/tests/btrfs/282
@@ -60,8 +60,9 @@ $BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full
# Error summary: no errors found
#
# What we care is that Rate line.
-init_speed=$($BTRFS_UTIL_PROG scrub status --raw $SCRATCH_MNT | grep "Rate:" |\
- $AWK_PROG '{print $2}' | cut -f1 -d\/)
+init_speed=$($BTRFS_UTIL_PROG scrub status --raw $SCRATCH_MNT |\
+ tee -a $seqres.full |\
+ $AWK_PROG '$1 == "Rate:" {sub(/\/s/, "", $2); print $2}')
# This can happen for older progs
if [ -z "$init_speed" ]; then
@@ -76,8 +77,9 @@ echo "$target_speed" > "${devinfo_dir}/scrub_speed_max"
# The second scrub, to check the throttled speed.
$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full
-speed=$($BTRFS_UTIL_PROG scrub status --raw $SCRATCH_MNT | grep "Rate:" |\
- $AWK_PROG '{print $2}' | cut -f1 -d\/)
+speed=$($BTRFS_UTIL_PROG scrub status --raw $SCRATCH_MNT |\
+ tee -a $seqres.full |\
+ $AWK_PROG '$1 == "Rate:" {sub(/\/s/, "", $2); print $2}')
# We gave a +- 10% tolerance for the throttle
if [ "$speed" -gt "$(( $target_speed * 11 / 10 ))" -o \
--
2.35.3
^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/5] btrfs/282: append scrub status output to 282.full
2023-12-07 7:20 ` [PATCH 2/5] btrfs/282: append scrub status output to 282.full David Disseldorp
@ 2023-12-07 9:46 ` Qu Wenruo
0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2023-12-07 9:46 UTC (permalink / raw)
To: David Disseldorp, fstests; +Cc: Qu Wenruo
On 2023/12/7 17:50, David Disseldorp wrote:
> Also, collapse the grep/cut usage into the existing awk one-liner.
>
> Signed-off-by: David Disseldorp <ddiss@suse.de>
This change itself is totally fine independent of the fix for btrfs/282.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
> ---
> tests/btrfs/282 | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/tests/btrfs/282 b/tests/btrfs/282
> index 395e0626..66d14f82 100755
> --- a/tests/btrfs/282
> +++ b/tests/btrfs/282
> @@ -60,8 +60,9 @@ $BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full
> # Error summary: no errors found
> #
> # What we care is that Rate line.
> -init_speed=$($BTRFS_UTIL_PROG scrub status --raw $SCRATCH_MNT | grep "Rate:" |\
> - $AWK_PROG '{print $2}' | cut -f1 -d\/)
> +init_speed=$($BTRFS_UTIL_PROG scrub status --raw $SCRATCH_MNT |\
> + tee -a $seqres.full |\
> + $AWK_PROG '$1 == "Rate:" {sub(/\/s/, "", $2); print $2}')
>
> # This can happen for older progs
> if [ -z "$init_speed" ]; then
> @@ -76,8 +77,9 @@ echo "$target_speed" > "${devinfo_dir}/scrub_speed_max"
>
> # The second scrub, to check the throttled speed.
> $BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full
> -speed=$($BTRFS_UTIL_PROG scrub status --raw $SCRATCH_MNT | grep "Rate:" |\
> - $AWK_PROG '{print $2}' | cut -f1 -d\/)
> +speed=$($BTRFS_UTIL_PROG scrub status --raw $SCRATCH_MNT |\
> + tee -a $seqres.full |\
> + $AWK_PROG '$1 == "Rate:" {sub(/\/s/, "", $2); print $2}')
>
> # We gave a +- 10% tolerance for the throttle
> if [ "$speed" -gt "$(( $target_speed * 11 / 10 ))" -o \
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 3/5] btrfs/282: dmdelay scrub I/O to fix intermittent failures
2023-12-07 7:20 [PATCH 0/5] btrfs/282: resolve intermittent failures David Disseldorp
2023-12-07 7:20 ` [PATCH 1/5] common/dmdelay: remove DELAY_X enums David Disseldorp
2023-12-07 7:20 ` [PATCH 2/5] btrfs/282: append scrub status output to 282.full David Disseldorp
@ 2023-12-07 7:20 ` David Disseldorp
2023-12-07 7:20 ` [PATCH 4/5] btrfs/282: reduce scrub dataset size David Disseldorp
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: David Disseldorp @ 2023-12-07 7:20 UTC (permalink / raw)
To: fstests; +Cc: Qu Wenruo, David Disseldorp
btrfs/282 can fail intermittently if scrub completes quickly (e.g. atop
a RAM backed scratch device):
- if under one second, the reported scrub duration metric may be rounded
down to zero, which results in a "Rate: 0/s" metric
+ I have a btrfs-progs patch to address this confusing Rate reporting
- I've also observed one-second scrub durations failing intermittently,
presumably because scheduling jitter doesn't allow throttling to take
effect
Avoid these intermittent failures by injecting a 100ms delay to device
read I/Os using dm-delay.
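The injected table follows dm-delay's `<start> <len> delay <rdev> <roffset> <rdelay_ms> [<wdev> <woffset> <wdelay_ms>]` layout; a sketch with a hypothetical device and sector count (the test itself derives the size from `blockdev --getsz $SCRATCH_DEV`):

```shell
# Build the 100ms-read / 0ms-write dm-delay table used by the patch.
# Device path and sector count are hypothetical.
SCRATCH_DEV=/dev/sdb
bs=10485760
table="0 $bs delay $SCRATCH_DEV 0 100 $SCRATCH_DEV 0 0"
echo "$table"
```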
Signed-off-by: David Disseldorp <ddiss@suse.de>
---
tests/btrfs/282 | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/tests/btrfs/282 b/tests/btrfs/282
index 66d14f82..f9e22e12 100755
--- a/tests/btrfs/282
+++ b/tests/btrfs/282
@@ -10,6 +10,15 @@
_begin_fstest auto scrub
. ./common/filter
+. ./common/dmdelay
+
+_cleanup()
+{
+ cd /
+ rm -f $tmp.*
+ _scratch_unmount > /dev/null 2>&1
+ _cleanup_delay > /dev/null 2>&1
+}
# real QA test starts here
_supported_fs btrfs
@@ -18,6 +27,7 @@ _wants_kernel_commit eb3b50536642 \
# We want at least 5G for the scratch device.
_require_scratch_size $(( 5 * 1024 * 1024))
+_require_dm_target delay
# Make sure we can create scrub progress data file
if [ -e /var/lib/btrfs ]; then
@@ -27,7 +37,8 @@ else
fi
_scratch_mkfs >> $seqres.full 2>&1
-_scratch_mount
+_init_delay
+_mount_delay
uuid=$(findmnt -n -o UUID $SCRATCH_MNT)
@@ -45,6 +56,10 @@ $XFS_IO_PROG -f -c "pwrite -i /dev/urandom 0 2G" $SCRATCH_MNT/file | _filter_xfs
# Writeback above data, as scrub only verify the committed data.
sync
+# 100ms read I/O delay, so that the scrub doesn't complete too quickly
+bs=$(blockdev --getsz $SCRATCH_DEV)
+_load_delay_table "0 $bs delay $SCRATCH_DEV 0 100 $SCRATCH_DEV 0 0"
+
# The first scrub, mostly to grab the speed of the scrub.
$BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full
@@ -70,7 +85,8 @@ if [ -z "$init_speed" ]; then
fi
# Cycle mount to drop any possible cache.
-_scratch_cycle_mount
+_unmount_delay
+_mount_delay
target_speed=$(( $init_speed / 2 ))
echo "$target_speed" > "${devinfo_dir}/scrub_speed_max"
--
2.35.3
^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/5] btrfs/282: reduce scrub dataset size
2023-12-07 7:20 [PATCH 0/5] btrfs/282: resolve intermittent failures David Disseldorp
` (2 preceding siblings ...)
2023-12-07 7:20 ` [PATCH 3/5] btrfs/282: dmdelay scrub I/O to fix intermittent failures David Disseldorp
@ 2023-12-07 7:20 ` David Disseldorp
2023-12-07 7:20 ` [PATCH 5/5] btrfs/282: increase throttled scrub rate tolerance David Disseldorp
2023-12-07 9:49 ` [PATCH 0/5] btrfs/282: resolve intermittent failures Qu Wenruo
5 siblings, 0 replies; 10+ messages in thread
From: David Disseldorp @ 2023-12-07 7:20 UTC (permalink / raw)
To: fstests; +Cc: Qu Wenruo, David Disseldorp
The use of dmdelay means that we can use a smaller dataset while still
achieving a reasonable scrub duration. This also drops the xfs_io pwrite
/dev/urandom input file, instead relying on xfs_io's built-in
pseudorandom pattern generation.
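The golden-output change below follows directly from the new size:

```shell
# 100M in xfs_io terms is 100 * 2^20 bytes, matching the 282.out update
bytes=$(( 100 * 1024 * 1024 ))
echo "wrote $bytes/$bytes bytes at offset 0"
```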
Signed-off-by: David Disseldorp <ddiss@suse.de>
---
tests/btrfs/282 | 8 +++-----
tests/btrfs/282.out | 2 +-
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/tests/btrfs/282 b/tests/btrfs/282
index f9e22e12..42d992a6 100755
--- a/tests/btrfs/282
+++ b/tests/btrfs/282
@@ -25,8 +25,7 @@ _supported_fs btrfs
_wants_kernel_commit eb3b50536642 \
"btrfs: scrub: per-device bandwidth control"
-# We want at least 5G for the scratch device.
-_require_scratch_size $(( 5 * 1024 * 1024))
+_require_scratch
_require_dm_target delay
# Make sure we can create scrub progress data file
@@ -49,9 +48,8 @@ if [ ! -f "${devinfo_dir}/scrub_speed_max" ]; then
_notrun "No sysfs interface for scrub speed throttle"
fi
-# Create a 2G file for later scrub workload.
-# The 2G size is chosen to fit even DUP on a 5G disk.
-$XFS_IO_PROG -f -c "pwrite -i /dev/urandom 0 2G" $SCRATCH_MNT/file | _filter_xfs_io
+# Create a 100M file for later scrub workload.
+$XFS_IO_PROG -f -c "pwrite 0 100M" $SCRATCH_MNT/file | _filter_xfs_io
# Writeback above data, as scrub only verify the committed data.
sync
diff --git a/tests/btrfs/282.out b/tests/btrfs/282.out
index 8d53e7eb..840e3826 100644
--- a/tests/btrfs/282.out
+++ b/tests/btrfs/282.out
@@ -1,3 +1,3 @@
QA output created by 282
-wrote 2147483648/2147483648 bytes at offset 0
+wrote 104857600/104857600 bytes at offset 0
XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
--
2.35.3
^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/5] btrfs/282: increase throttled scrub rate tolerance
2023-12-07 7:20 [PATCH 0/5] btrfs/282: resolve intermittent failures David Disseldorp
` (3 preceding siblings ...)
2023-12-07 7:20 ` [PATCH 4/5] btrfs/282: reduce scrub dataset size David Disseldorp
@ 2023-12-07 7:20 ` David Disseldorp
2023-12-07 9:49 ` [PATCH 0/5] btrfs/282: resolve intermittent failures Qu Wenruo
5 siblings, 0 replies; 10+ messages in thread
From: David Disseldorp @ 2023-12-07 7:20 UTC (permalink / raw)
To: fstests; +Cc: Qu Wenruo, David Disseldorp
Even with dmdelay-injected latencies, I still observe occasional test
failures on shared storage (e.g. VMs). The extra 10% tolerance resolves
these intermittent failures on my system.
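The widened check can be sanity-tested with illustrative numbers (both speeds below are made up):

```shell
# A measured speed 15% above target fails the old +-10% check but
# passes the new +-20% one.
target_speed=1000000
speed=1150000
if (( speed > target_speed * 12 / 10 || speed < target_speed * 8 / 10 )); then
	echo "scrub speed $speed Bytes/s is not properly throttled"
else
	echo "throttled within tolerance"
fi
```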
Signed-off-by: David Disseldorp <ddiss@suse.de>
---
tests/btrfs/282 | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/tests/btrfs/282 b/tests/btrfs/282
index 42d992a6..6649623a 100755
--- a/tests/btrfs/282
+++ b/tests/btrfs/282
@@ -95,9 +95,8 @@ speed=$($BTRFS_UTIL_PROG scrub status --raw $SCRATCH_MNT |\
tee -a $seqres.full |\
$AWK_PROG '$1 == "Rate:" {sub(/\/s/, "", $2); print $2}')
-# We gave a +- 10% tolerance for the throttle
-if [ "$speed" -gt "$(( $target_speed * 11 / 10 ))" -o \
- "$speed" -lt "$(( $target_speed * 9 / 10))" ]; then
+# We gave a +- 20% tolerance for the throttle
+if (( speed > target_speed * 12 / 10 || speed < target_speed * 8 / 10 )); then
echo "scrub speed $speed Bytes/s is not properly throttled, target is $target_speed Bytes/s"
fi
--
2.35.3
^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/5] btrfs/282: resolve intermittent failures
2023-12-07 7:20 [PATCH 0/5] btrfs/282: resolve intermittent failures David Disseldorp
` (4 preceding siblings ...)
2023-12-07 7:20 ` [PATCH 5/5] btrfs/282: increase throttled scrub rate tolerance David Disseldorp
@ 2023-12-07 9:49 ` Qu Wenruo
2023-12-07 13:49 ` David Disseldorp
5 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2023-12-07 9:49 UTC (permalink / raw)
To: David Disseldorp, fstests; +Cc: Qu Wenruo
On 2023/12/7 17:50, David Disseldorp wrote:
> btrfs/282 fails intermittently under some circumstances. This patchset
> adds dmdelay to make storage latencies more uniform and slightly
> increases throttled rate tolerances.
My concern with using dm-delay is: is the delay applied per merged bio,
or something else?
If the delay is only per bio (after merging), then I'm afraid it would
not be good enough.
The bio plugging we use in scrub makes a difference in the scrub speed
much more likely.
We may want delay behavior that at least takes bio size into
consideration.
Thanks,
Qu
>
> common/dmdelay | 13 ++++---------
> tests/btrfs/282 | 43 +++++++++++++++++++++++++++++--------------
> tests/btrfs/282.out | 2 +-
> tests/xfs/311 | 2 +-
> 4 files changed, 35 insertions(+), 25 deletions(-)
>
>
^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/5] btrfs/282: resolve intermittent failures
2023-12-07 9:49 ` [PATCH 0/5] btrfs/282: resolve intermittent failures Qu Wenruo
@ 2023-12-07 13:49 ` David Disseldorp
2023-12-07 20:02 ` Qu Wenruo
0 siblings, 1 reply; 10+ messages in thread
From: David Disseldorp @ 2023-12-07 13:49 UTC (permalink / raw)
To: Qu Wenruo; +Cc: fstests, Qu Wenruo
On Thu, 7 Dec 2023 20:19:00 +1030, Qu Wenruo wrote:
> On 2023/12/7 17:50, David Disseldorp wrote:
> > btrfs/282 fails intermittently under some circumstances. This patchset
> > adds dmdelay to make storage latencies more uniform and slightly
> > increases throttled rate tolerances.
>
> My concern with using dm-delay is: is the delay applied per merged bio,
> or something else?
>
> If the delay is only per bio (after merging), then I'm afraid it would
> not be good enough.
The dmdelay device presents itself as a regular block device, so I think
delay_map()->delay_bio() handling comes after any merges.
> The bio plugging we use in scrub makes a difference in the scrub speed
> much more likely.
>
> We may want delay behavior that at least takes bio size into
> consideration.
Should we manipulate it a bit by fiddling with queue/max_sectors_kb and
queue/nomerges?
Thanks, David
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/5] btrfs/282: resolve intermittent failures
2023-12-07 13:49 ` David Disseldorp
@ 2023-12-07 20:02 ` Qu Wenruo
0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2023-12-07 20:02 UTC (permalink / raw)
To: David Disseldorp; +Cc: fstests, Qu Wenruo
On 2023/12/8 00:19, David Disseldorp wrote:
> On Thu, 7 Dec 2023 20:19:00 +1030, Qu Wenruo wrote:
>
>> On 2023/12/7 17:50, David Disseldorp wrote:
>>> btrfs/282 fails intermittently under some circumstances. This patchset
>>> adds dmdelay to make storage latencies more uniform and slightly
>>> increases throttled rate tolerances.
>>
>> My concern with using dm-delay is: is the delay applied per merged bio,
>> or something else?
>>
>> If the delay is only per bio (after merging), then I'm afraid it would
>> not be good enough.
>
> The dmdelay device presents itself as a regular block device, so I think
> delay_map()->delay_bio() handling comes after any merges.
>
>> The bio plugging we use in scrub makes a difference in the scrub speed
>> much more likely.
>>
>> We may want delay behavior that at least takes bio size into
>> consideration.
>
> Should we manipulate it a bit by fiddling with queue/max_sectors_kb and
> queue/nomerges?
I'm not sure if delay is our best friend.
If we had something that limits the read/write speed directly, that
would be the best case.
Unfortunately I didn't see a dm-throttle upstream; can we limit the IO
speed on a certain device using cgroups?
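A hedged sketch of that cgroup alternative: the cgroup v2 io controller's io.max file takes per-device caps keyed by major:minor. The device number and limits below are hypothetical, and only the line format is demonstrated; a real test would write it into a configured cgroup's io.max and run the scrub process inside that cgroup.

```shell
# cgroup v2 io.max line format: "<MAJ>:<MIN> rbps=<bytes/s> wbps=<bytes/s>"
# Major:minor and the 1 MiB/s caps are hypothetical.
devno="8:16"
limits="rbps=1048576 wbps=1048576"
echo "$devno $limits"
```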
Thanks,
Qu
>
> Thanks, David
^ permalink raw reply [flat|nested] 10+ messages in thread