* [PATCH] btrfs/213: avoid occasional failure due to already finished balance
@ 2023-05-18 11:08 fdmanana
2023-05-19 5:04 ` Anand Jain
2023-05-19 9:57 ` [PATCH v2] " fdmanana
0 siblings, 2 replies; 8+ messages in thread
From: fdmanana @ 2023-05-18 11:08 UTC
To: fstests; +Cc: linux-btrfs, Filipe Manana
From: Filipe Manana <fdmanana@suse.com>
btrfs/213 writes data, in 1M extents, for 4 seconds into a file, then
triggers a balance and then after 2 seconds it tries to cancel the
balance operation. More often than not, this works because the balance
is still running after 2 seconds. However, it also fails sporadically
because balance has finished in less than 2 seconds, which is plausible
when data and metadata are cached, or due to other factors such as a
virtualized environment. When that's the case, it fails like this:
$ ./check btrfs/213
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.4.0-rc1-btrfs-next-131+ #1 SMP PREEMPT_DYNAMIC Thu May 11 11:26:19 WEST 2023
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
btrfs/213 51s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad)
--- tests/btrfs/213.out 2020-06-10 19:29:03.822519250 +0100
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad 2023-05-17 15:39:32.653727223 +0100
@@ -1,2 +1,3 @@
QA output created by 213
+ERROR: balance cancel on '/home/fdmanana/btrfs-tests/scratch_1' failed: Not in progress
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/213.out /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad' to see the entire diff)
Ran: btrfs/213
Failures: btrfs/213
Failed 1 of 1 tests
To make it much less likely that balance has already finished before we
try to cancel it, unmount and mount again the filesystem before starting
balance, to clear cached metadata and data, and also double the time we
spend writing 1M data extents. Also ignore the error when the cancel fails
because balance had already finished by the time we tried to cancel it.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
tests/btrfs/213 | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/tests/btrfs/213 b/tests/btrfs/213
index 8a10355c..cca0b3cc 100755
--- a/tests/btrfs/213
+++ b/tests/btrfs/213
@@ -28,7 +28,7 @@ _require_xfs_io_command pwrite -D
_scratch_mkfs >> $seqres.full
_scratch_mount
-runtime=4
+runtime=8
# Create enough IO so that we need around $runtime seconds to relocate it.
#
@@ -39,11 +39,18 @@ sleep $runtime
kill $write_pid
wait $write_pid
+# Unmount and mount again the fs to clear any cached data and metadata, so that
+# it's less likely balance has already finished when we try to cancel it below.
+_scratch_cycle_mount
+
# Now balance should take at least $runtime seconds, we can cancel it at
# $runtime/2 to ensure a success cancel.
_run_btrfs_balance_start -d --bg "$SCRATCH_MNT"
-sleep $(($runtime / 2))
-$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT"
+sleep $(($runtime / 4))
+# It's possible that balance has already completed. It's unlikely, but it
+# may happen occasionally due to virtualization, caching and other factors, so
+# ignore any error about no balance currently running.
+$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" 2>&1 | grep -iv 'not in progress'
# Now check if we can finish relocating metadata, which should finish very
# quickly.
--
2.34.1
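For reference, the arithmetic behind the two timing changes in the patch above can be checked with a few lines of shell. This is only a sketch: the variable names (old_runtime, new_runtime, old_delay, new_delay) are illustrative and do not appear in the test itself.

```shell
#!/bin/bash
# Illustrative only: compares the cancel delay before and after the patch.
old_runtime=4
new_runtime=8

old_delay=$((old_runtime / 2))   # sleep before cancel in the old test
new_delay=$((new_runtime / 4))   # sleep before cancel with this patch

echo "old: write for ${old_runtime}s, cancel after ${old_delay}s"
echo "new: write for ${new_runtime}s, cancel after ${new_delay}s"
```

The delay before the cancel stays at 2 seconds while the write phase doubles, so, assuming a similar write rate, balance has roughly twice as much data to relocate when the cancel request arrives.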
* Re: [PATCH] btrfs/213: avoid occasional failure due to already finished balance
2023-05-18 11:08 [PATCH] btrfs/213: avoid occasional failure due to already finished balance fdmanana
@ 2023-05-19 5:04 ` Anand Jain
2023-05-19 9:58 ` Filipe Manana
2023-05-19 9:57 ` [PATCH v2] " fdmanana
1 sibling, 1 reply; 8+ messages in thread
From: Anand Jain @ 2023-05-19 5:04 UTC
To: fdmanana, fstests; +Cc: linux-btrfs, Filipe Manana
On 18/5/23 19:08, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> btrfs/213 writes data, in 1M extents, for 4 seconds into a file, then
> triggers a balance and then after 2 seconds it tries to cancel the
> balance operation. More often than not, this works because the balance
> is still running after 2 seconds. However it also fails sporadically
> because balance has finished in less than 2 seconds, which is plausible
> since data and metadata are cached or other factors such as virtualized
> environment. When that's the case, it fails like this:
>
> $ ./check btrfs/213
> FSTYP -- btrfs
> PLATFORM -- Linux/x86_64 debian0 6.4.0-rc1-btrfs-next-131+ #1 SMP PREEMPT_DYNAMIC Thu May 11 11:26:19 WEST 2023
> MKFS_OPTIONS -- /dev/sdc
> MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
>
> btrfs/213 51s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad)
> --- tests/btrfs/213.out 2020-06-10 19:29:03.822519250 +0100
> +++ /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad 2023-05-17 15:39:32.653727223 +0100
> @@ -1,2 +1,3 @@
> QA output created by 213
> +ERROR: balance cancel on '/home/fdmanana/btrfs-tests/scratch_1' failed: Not in progress
> Silence is golden
> ...
> (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/213.out /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad' to see the entire diff)
> Ran: btrfs/213
> Failures: btrfs/213
> Failed 1 of 1 tests
>
> To make it much less likely that balance has already finished before we
> try to cancel it, unmount and mount again the filesystem before starting
> balance, to clear cached metadata and data, and also double the time we
> spend writing 1M data extents. Also ignore when the balance failed because
> it was already finished when we tried to cancel it.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> tests/btrfs/213 | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/tests/btrfs/213 b/tests/btrfs/213
> index 8a10355c..cca0b3cc 100755
> --- a/tests/btrfs/213
> +++ b/tests/btrfs/213
> @@ -28,7 +28,7 @@ _require_xfs_io_command pwrite -D
> _scratch_mkfs >> $seqres.full
> _scratch_mount
>
> -runtime=4
> +runtime=8
>
> # Create enough IO so that we need around $runtime seconds to relocate it.
> #
> @@ -39,11 +39,18 @@ sleep $runtime
> kill $write_pid
> wait $write_pid
>
> +# Unmount and mount again the fs to clear any cached data and metadata, so that
> +# it's less likely balance has already finished when we try to cancel it below.
> +_scratch_cycle_mount
> +
> # Now balance should take at least $runtime seconds, we can cancel it at
> # $runtime/2 to ensure a success cancel.
> _run_btrfs_balance_start -d --bg "$SCRATCH_MNT"
> -sleep $(($runtime / 2))
> -$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT"
> +sleep $(($runtime / 4))
> +# It's possible that balance has already completed. It's unlikely but often
> +# it may happen due to virtualization, caching and other factors, so ignore
> +# any error about no balance currently running.
> +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" 2>&1 | grep -iv 'not in progress'
Cancel is an important step in this test case.
Why not call _notrun() if the test case cannot ensure that
the balance is still in progress? That way, it provides
another opportunity to fix it.
Thanks, Anand
>
> # Now check if we can finish relocating metadata, which should finish very
> # quickly.
* Re: [PATCH] btrfs/213: avoid occasional failure due to already finished balance
2023-05-19 5:04 ` Anand Jain
@ 2023-05-19 9:58 ` Filipe Manana
0 siblings, 0 replies; 8+ messages in thread
From: Filipe Manana @ 2023-05-19 9:58 UTC
To: Anand Jain; +Cc: fstests, linux-btrfs
On Fri, May 19, 2023 at 6:05 AM Anand Jain <anand.jain@oracle.com> wrote:
>
> On 18/5/23 19:08, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > btrfs/213 writes data, in 1M extents, for 4 seconds into a file, then
> > triggers a balance and then after 2 seconds it tries to cancel the
> > balance operation. More often than not, this works because the balance
> > is still running after 2 seconds. However it also fails sporadically
> > because balance has finished in less than 2 seconds, which is plausible
> > since data and metadata are cached or other factors such as virtualized
> > environment. When that's the case, it fails like this:
> >
> > $ ./check btrfs/213
> > FSTYP -- btrfs
> > PLATFORM -- Linux/x86_64 debian0 6.4.0-rc1-btrfs-next-131+ #1 SMP PREEMPT_DYNAMIC Thu May 11 11:26:19 WEST 2023
> > MKFS_OPTIONS -- /dev/sdc
> > MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
> >
> > btrfs/213 51s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad)
> > --- tests/btrfs/213.out 2020-06-10 19:29:03.822519250 +0100
> > +++ /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad 2023-05-17 15:39:32.653727223 +0100
> > @@ -1,2 +1,3 @@
> > QA output created by 213
> > +ERROR: balance cancel on '/home/fdmanana/btrfs-tests/scratch_1' failed: Not in progress
> > Silence is golden
> > ...
> > (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/213.out /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad' to see the entire diff)
> > Ran: btrfs/213
> > Failures: btrfs/213
> > Failed 1 of 1 tests
> >
> > To make it much less likely that balance has already finished before we
> > try to cancel it, unmount and mount again the filesystem before starting
> > balance, to clear cached metadata and data, and also double the time we
> > spend writing 1M data extents. Also ignore when the balance failed because
> > it was already finished when we tried to cancel it.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> > tests/btrfs/213 | 13 ++++++++++---
> > 1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/btrfs/213 b/tests/btrfs/213
> > index 8a10355c..cca0b3cc 100755
> > --- a/tests/btrfs/213
> > +++ b/tests/btrfs/213
> > @@ -28,7 +28,7 @@ _require_xfs_io_command pwrite -D
> > _scratch_mkfs >> $seqres.full
> > _scratch_mount
> >
> > -runtime=4
> > +runtime=8
> >
> > # Create enough IO so that we need around $runtime seconds to relocate it.
> > #
> > @@ -39,11 +39,18 @@ sleep $runtime
> > kill $write_pid
> > wait $write_pid
> >
> > +# Unmount and mount again the fs to clear any cached data and metadata, so that
> > +# it's less likely balance has already finished when we try to cancel it below.
> > +_scratch_cycle_mount
> > +
> > # Now balance should take at least $runtime seconds, we can cancel it at
> > # $runtime/2 to ensure a success cancel.
> > _run_btrfs_balance_start -d --bg "$SCRATCH_MNT"
>
>
> > -sleep $(($runtime / 2))
> > -$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT"
> > +sleep $(($runtime / 4))
> > +# It's possible that balance has already completed. It's unlikely but often
> > +# it may happen due to virtualization, caching and other factors, so ignore
> > +# any error about no balance currently running.
> > +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" 2>&1 | grep -iv 'not in progress'
>
> Cancel is an important step in this test case.
> Why not call _notrun() if the test case fails to make sure
> the balance is still in progress? This way, it provides
> another opportunity to fix.
Sounds reasonable. Sent a v2 with that.
Thanks.
>
> Thanks, Anand
>
> >
> > # Now check if we can finish relocating metadata, which should finish very
> > # quickly.
>
* [PATCH v2] btrfs/213: avoid occasional failure due to already finished balance
2023-05-18 11:08 [PATCH] btrfs/213: avoid occasional failure due to already finished balance fdmanana
2023-05-19 5:04 ` Anand Jain
@ 2023-05-19 9:57 ` fdmanana
2023-05-19 23:34 ` Qu Wenruo
` (2 more replies)
1 sibling, 3 replies; 8+ messages in thread
From: fdmanana @ 2023-05-19 9:57 UTC
To: fstests; +Cc: linux-btrfs, Filipe Manana
From: Filipe Manana <fdmanana@suse.com>
btrfs/213 writes data, in 1M extents, for 4 seconds into a file, then
triggers a balance and then after 2 seconds it tries to cancel the
balance operation. More often than not, this works because the balance
is still running after 2 seconds. However, it also fails sporadically
because balance has finished in less than 2 seconds, which is plausible
when data and metadata are cached, or due to other factors such as a
virtualized environment. When that's the case, it fails like this:
$ ./check btrfs/213
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.4.0-rc1-btrfs-next-131+ #1 SMP PREEMPT_DYNAMIC Thu May 11 11:26:19 WEST 2023
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
btrfs/213 51s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad)
--- tests/btrfs/213.out 2020-06-10 19:29:03.822519250 +0100
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad 2023-05-17 15:39:32.653727223 +0100
@@ -1,2 +1,3 @@
QA output created by 213
+ERROR: balance cancel on '/home/fdmanana/btrfs-tests/scratch_1' failed: Not in progress
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/213.out /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad' to see the entire diff)
Ran: btrfs/213
Failures: btrfs/213
Failed 1 of 1 tests
To make it much less likely that balance has already finished before we
try to cancel it, unmount and mount again the filesystem before starting
balance, to clear cached metadata and data, and also double the time we
spend writing 1M data extents. Also make the test not run with an
informative message if we detect that balance finished before we could
cancel it.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
v2: Make the test _notrun if we detect that balance finished before we
could cancel it.
tests/btrfs/213 | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/tests/btrfs/213 b/tests/btrfs/213
index e16e41c0..5666d9b9 100755
--- a/tests/btrfs/213
+++ b/tests/btrfs/213
@@ -31,7 +31,7 @@ _fixed_by_kernel_commit 1dae7e0e58b4 \
_scratch_mkfs >> $seqres.full
_scratch_mount
-runtime=4
+runtime=8
# Create enough IO so that we need around $runtime seconds to relocate it.
#
@@ -42,11 +42,21 @@ sleep $runtime
kill $write_pid
wait $write_pid
+# Unmount and mount again the fs to clear any cached data and metadata, so that
+# it's less likely balance has already finished when we try to cancel it below.
+_scratch_cycle_mount
+
# Now balance should take at least $runtime seconds, we can cancel it at
# $runtime/2 to ensure a success cancel.
_run_btrfs_balance_start -d --bg "$SCRATCH_MNT"
-sleep $(($runtime / 2))
-$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT"
+sleep $(($runtime / 4))
+# It's possible that balance has already completed. It's unlikely, but it
+# may happen occasionally due to virtualization, caching and other factors, so
+# skip the test if there was no balance running to cancel.
+$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" 2>&1 | grep -iq 'not in progress'
+if [ $? -eq 0 ]; then
+ _not_run "balance finished before we could cancel it"
+fi
# Now check if we can finish relocating metadata, which should finish very
# quickly.
--
2.34.1
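The detection step added in v2 relies on the fact that, after a pipeline, $? holds the exit status of the last command, which here is grep. The sketch below illustrates that behaviour under stated assumptions: fake_balance_cancel is a hypothetical stand-in for "$BTRFS_UTIL_PROG balance cancel", printing the error message from the failure report above, and the result variable is illustrative.

```shell
#!/bin/bash
# Hypothetical stand-in for the real cancel command when no balance is
# running; the message is copied from the failure shown in the patch.
fake_balance_cancel() {
	echo "ERROR: balance cancel on '/home/fdmanana/btrfs-tests/scratch_1' failed: Not in progress" >&2
	return 1
}

# After a pipeline (without "set -o pipefail"), $? is the exit status of
# the last command, grep. 0 means the "not in progress" text was seen.
fake_balance_cancel 2>&1 | grep -iq 'not in progress'
if [ $? -eq 0 ]; then
	result="skip"       # balance had already finished
else
	result="continue"   # cancel went through, carry on with the test
fi
echo "$result"
```

Because grep runs with -q, the error text is consumed by the pipeline and never reaches the test's golden output, which is why the skip path needs its own message.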
* Re: [PATCH v2] btrfs/213: avoid occasional failure due to already finished balance
2023-05-19 9:57 ` [PATCH v2] " fdmanana
@ 2023-05-19 23:34 ` Qu Wenruo
2023-05-21 20:14 ` Anand Jain
2023-08-12 12:48 ` Wang Yugui
2 siblings, 0 replies; 8+ messages in thread
From: Qu Wenruo @ 2023-05-19 23:34 UTC
To: fdmanana, fstests; +Cc: linux-btrfs, Filipe Manana
On 2023/5/19 17:57, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> btrfs/213 writes data, in 1M extents, for 4 seconds into a file, then
> triggers a balance and then after 2 seconds it tries to cancel the
> balance operation. More often than not, this works because the balance
> is still running after 2 seconds. However it also fails sporadically
> because balance has finished in less than 2 seconds, which is plausible
> since data and metadata are cached or other factors such as virtualized
> environment. When that's the case, it fails like this:
>
> $ ./check btrfs/213
> FSTYP -- btrfs
> PLATFORM -- Linux/x86_64 debian0 6.4.0-rc1-btrfs-next-131+ #1 SMP PREEMPT_DYNAMIC Thu May 11 11:26:19 WEST 2023
> MKFS_OPTIONS -- /dev/sdc
> MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
>
> btrfs/213 51s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad)
> --- tests/btrfs/213.out 2020-06-10 19:29:03.822519250 +0100
> +++ /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad 2023-05-17 15:39:32.653727223 +0100
> @@ -1,2 +1,3 @@
> QA output created by 213
> +ERROR: balance cancel on '/home/fdmanana/btrfs-tests/scratch_1' failed: Not in progress
> Silence is golden
> ...
> (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/213.out /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad' to see the entire diff)
> Ran: btrfs/213
> Failures: btrfs/213
> Failed 1 of 1 tests
>
> To make it much less likely that balance has already finished before we
> try to cancel it, unmount and mount again the filesystem before starting
> balance, to clear cached metadata and data, and also double the time we
> spend writing 1M data extents. Also make the test not run with an
> informative message if we detect that balance finished before we could
> cancel it.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
> ---
>
> v2: Make the test _notrun if we detect that balance finished before we
> could cancel it.
>
> tests/btrfs/213 | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/tests/btrfs/213 b/tests/btrfs/213
> index e16e41c0..5666d9b9 100755
> --- a/tests/btrfs/213
> +++ b/tests/btrfs/213
> @@ -31,7 +31,7 @@ _fixed_by_kernel_commit 1dae7e0e58b4 \
> _scratch_mkfs >> $seqres.full
> _scratch_mount
>
> -runtime=4
> +runtime=8
>
> # Create enough IO so that we need around $runtime seconds to relocate it.
> #
> @@ -42,11 +42,21 @@ sleep $runtime
> kill $write_pid
> wait $write_pid
>
> +# Unmount and mount again the fs to clear any cached data and metadata, so that
> +# it's less likely balance has already finished when we try to cancel it below.
> +_scratch_cycle_mount
> +
> # Now balance should take at least $runtime seconds, we can cancel it at
> # $runtime/2 to ensure a success cancel.
> _run_btrfs_balance_start -d --bg "$SCRATCH_MNT"
> -sleep $(($runtime / 2))
> -$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT"
> +sleep $(($runtime / 4))
> +# It's possible that balance has already completed. It's unlikely but often
> +# it may happen due to virtualization, caching and other factors, so ignore
> +# any error about no balance currently running.
> +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" 2>&1 | grep -iq 'not in progress'
> +if [ $? -eq 0 ]; then
> + _not_run "balance finished before we could cancel it"
> +fi
>
> # Now check if we can finish relocating metadata, which should finish very
> # quickly.
* Re: [PATCH v2] btrfs/213: avoid occasional failure due to already finished balance
2023-05-19 9:57 ` [PATCH v2] " fdmanana
2023-05-19 23:34 ` Qu Wenruo
@ 2023-05-21 20:14 ` Anand Jain
2023-08-12 12:48 ` Wang Yugui
2 siblings, 0 replies; 8+ messages in thread
From: Anand Jain @ 2023-05-21 20:14 UTC
To: fdmanana, fstests; +Cc: linux-btrfs, Filipe Manana
LGTM
Reviewed-by: Anand Jain <anand.jain@oracle.com>
* Re: [PATCH v2] btrfs/213: avoid occasional failure due to already finished balance
2023-05-19 9:57 ` [PATCH v2] " fdmanana
2023-05-19 23:34 ` Qu Wenruo
2023-05-21 20:14 ` Anand Jain
@ 2023-08-12 12:48 ` Wang Yugui
2023-08-13 11:01 ` Filipe Manana
2 siblings, 1 reply; 8+ messages in thread
From: Wang Yugui @ 2023-08-12 12:48 UTC
To: fdmanana; +Cc: fstests, linux-btrfs, Filipe Manana
Hi,
> From: Filipe Manana <fdmanana@suse.com>
>
> btrfs/213 writes data, in 1M extents, for 4 seconds into a file, then
> triggers a balance and then after 2 seconds it tries to cancel the
> balance operation. More often than not, this works because the balance
> is still running after 2 seconds. However it also fails sporadically
> because balance has finished in less than 2 seconds, which is plausible
> since data and metadata are cached or other factors such as virtualized
> environment. When that's the case, it fails like this:
>
> $ ./check btrfs/213
> FSTYP -- btrfs
> PLATFORM -- Linux/x86_64 debian0 6.4.0-rc1-btrfs-next-131+ #1 SMP PREEMPT_DYNAMIC Thu May 11 11:26:19 WEST 2023
> MKFS_OPTIONS -- /dev/sdc
> MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
>
> btrfs/213 51s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad)
> --- tests/btrfs/213.out 2020-06-10 19:29:03.822519250 +0100
> +++ /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad 2023-05-17 15:39:32.653727223 +0100
> @@ -1,2 +1,3 @@
> QA output created by 213
> +ERROR: balance cancel on '/home/fdmanana/btrfs-tests/scratch_1' failed: Not in progress
> Silence is golden
> ...
> (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/213.out /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad' to see the entire diff)
> Ran: btrfs/213
> Failures: btrfs/213
> Failed 1 of 1 tests
>
> To make it much less likely that balance has already finished before we
> try to cancel it, unmount and mount again the filesystem before starting
> balance, to clear cached metadata and data, and also double the time we
> spend writing 1M data extents. Also make the test not run with an
> informative message if we detect that balance finished before we could
> cancel it.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>
> v2: Make the test _notrun if we detect that balance finished before we
> could cancel it.
>
> tests/btrfs/213 | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/tests/btrfs/213 b/tests/btrfs/213
> index e16e41c0..5666d9b9 100755
> --- a/tests/btrfs/213
> +++ b/tests/btrfs/213
> @@ -31,7 +31,7 @@ _fixed_by_kernel_commit 1dae7e0e58b4 \
> _scratch_mkfs >> $seqres.full
> _scratch_mount
>
> -runtime=4
> +runtime=8
>
> # Create enough IO so that we need around $runtime seconds to relocate it.
> #
> @@ -42,11 +42,21 @@ sleep $runtime
> kill $write_pid
> wait $write_pid
>
> +# Unmount and mount again the fs to clear any cached data and metadata, so that
> +# it's less likely balance has already finished when we try to cancel it below.
> +_scratch_cycle_mount
> +
> # Now balance should take at least $runtime seconds, we can cancel it at
> # $runtime/2 to ensure a success cancel.
> _run_btrfs_balance_start -d --bg "$SCRATCH_MNT"
> -sleep $(($runtime / 2))
> -$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT"
> +sleep $(($runtime / 4))
> +# It's possible that balance has already completed. It's unlikely but often
> +# it may happen due to virtualization, caching and other factors, so ignore
> +# any error about no balance currently running.
> +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" 2>&1 | grep -iq 'not in progress'
> +if [ $? -eq 0 ]; then
> + _not_run "balance finished before we could cancel it"
> +fi
fstests(btrfs/213) failed once here.
btrfs/213 22s ... - output mismatch (see /usr/hpc-bio/xfstests/results//btrfs/213.out.bad)
--- tests/btrfs/213.out 2023-03-28 06:09:10.372680814 +0800
+++ /usr/hpc-bio/xfstests/results//btrfs/213.out.bad 2023-08-12 20:31:47.848303940 +0800
@@ -1,2 +1,5 @@
QA output created by 213
+/usr/hpc-bio/xfstests/tests/btrfs/213: line 59: _not_run: command not found
+ERROR: error during balancing '/mnt/scratch': No space left on device
+There may be more info in syslog - try dmesg | tail
Silence is golden
We first need to fix the '_not_run: command not found' error.
I will follow up if fstests(btrfs/213) fails again.
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/08/12
* Re: [PATCH v2] btrfs/213: avoid occasional failure due to already finished balance
2023-08-12 12:48 ` Wang Yugui
@ 2023-08-13 11:01 ` Filipe Manana
0 siblings, 0 replies; 8+ messages in thread
From: Filipe Manana @ 2023-08-13 11:01 UTC
To: Wang Yugui; +Cc: fstests, linux-btrfs, Filipe Manana
On Sat, Aug 12, 2023 at 1:54 PM Wang Yugui <wangyugui@e16-tech.com> wrote:
>
> Hi,
>
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > btrfs/213 writes data, in 1M extents, for 4 seconds into a file, then
> > triggers a balance and then after 2 seconds it tries to cancel the
> > balance operation. More often than not, this works because the balance
> > is still running after 2 seconds. However it also fails sporadically
> > because balance has finished in less than 2 seconds, which is plausible
> > since data and metadata are cached or other factors such as virtualized
> > environment. When that's the case, it fails like this:
> >
> > $ ./check btrfs/213
> > FSTYP -- btrfs
> > PLATFORM -- Linux/x86_64 debian0 6.4.0-rc1-btrfs-next-131+ #1 SMP PREEMPT_DYNAMIC Thu May 11 11:26:19 WEST 2023
> > MKFS_OPTIONS -- /dev/sdc
> > MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
> >
> > btrfs/213 51s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad)
> > --- tests/btrfs/213.out 2020-06-10 19:29:03.822519250 +0100
> > +++ /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad 2023-05-17 15:39:32.653727223 +0100
> > @@ -1,2 +1,3 @@
> > QA output created by 213
> > +ERROR: balance cancel on '/home/fdmanana/btrfs-tests/scratch_1' failed: Not in progress
> > Silence is golden
> > ...
> > (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/213.out /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad' to see the entire diff)
> > Ran: btrfs/213
> > Failures: btrfs/213
> > Failed 1 of 1 tests
> >
> > To make it much less likely that balance has already finished before we
> > try to cancel it, unmount and mount again the filesystem before starting
> > balance, to clear cached metadata and data, and also double the time we
> > spend writing 1M data extents. Also make the test not run with an
> > informative message if we detect that balance finished before we could
> > cancel it.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >
> > v2: Make the test _notrun if we detect that balance finished before we
> > could cancel it.
> >
> > tests/btrfs/213 | 16 +++++++++++++---
> > 1 file changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/btrfs/213 b/tests/btrfs/213
> > index e16e41c0..5666d9b9 100755
> > --- a/tests/btrfs/213
> > +++ b/tests/btrfs/213
> > @@ -31,7 +31,7 @@ _fixed_by_kernel_commit 1dae7e0e58b4 \
> > _scratch_mkfs >> $seqres.full
> > _scratch_mount
> >
> > -runtime=4
> > +runtime=8
> >
> > # Create enough IO so that we need around $runtime seconds to relocate it.
> > #
> > @@ -42,11 +42,21 @@ sleep $runtime
> > kill $write_pid
> > wait $write_pid
> >
> > +# Unmount and mount again the fs to clear any cached data and metadata, so that
> > +# it's less likely balance has already finished when we try to cancel it below.
> > +_scratch_cycle_mount
> > +
> > # Now balance should take at least $runtime seconds, we can cancel it at
> > # $runtime/2 to ensure a success cancel.
> > _run_btrfs_balance_start -d --bg "$SCRATCH_MNT"
> > -sleep $(($runtime / 2))
> > -$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT"
> > +sleep $(($runtime / 4))
> > +# It's possible that balance has already completed. It's unlikely but often
> > +# it may happen due to virtualization, caching and other factors, so ignore
> > +# any error about no balance currently running.
> > +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" 2>&1 | grep -iq 'not in progress'
> > +if [ $? -eq 0 ]; then
> > + _not_run "balance finished before we could cancel it"
> > +fi
>
> fstests(btrfs/213) failed once here.
>
> btrfs/213 22s ... - output mismatch (see /usr/hpc-bio/xfstests/results//btrfs/213.out.bad)
> --- tests/btrfs/213.out 2023-03-28 06:09:10.372680814 +0800
> +++ /usr/hpc-bio/xfstests/results//btrfs/213.out.bad 2023-08-12 20:31:47.848303940 +0800
> @@ -1,2 +1,5 @@
> QA output created by 213
> +/usr/hpc-bio/xfstests/tests/btrfs/213: line 59: _not_run: command not found
> +ERROR: error during balancing '/mnt/scratch': No space left on device
> +There may be more info in syslog - try dmesg | tail
> Silence is golden
>
> we need to fix the error of '_not_run: command not found' firstly.
>
> I will update the info if fstests(btrfs/213) fails again.
It's a misspelled function name: it should be _notrun instead of
_not_run. I just sent a fix for that.
The test is not very reliable, as balance can finish quickly, so
it may occasionally be skipped in some environments.
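The effect of the misspelling can be reproduced outside fstests. The following is a minimal sketch, not the actual fix: the _notrun defined here is a stub standing in for the fstests helper (which, as far as I know, also ends the test), and the status and msg variables are illustrative.

```shell
#!/bin/bash
# Stub standing in for the fstests helper of the same name.
_notrun() {
	echo "[not run] $*"
}

# Calling the misspelled name: bash reports "command not found" on stderr
# and the call fails with status 127. Nothing stops the script here, so
# execution simply carries on with the next line.
status=0
_not_run "balance finished before we could cancel it" 2>/dev/null || status=$?
echo "status of misspelled call: $status"

# The correctly spelled helper is found and runs.
msg=$(_notrun "balance finished before we could cancel it")
echo "$msg"
```

This matches the report above, where the 'command not found' line shows up in the test output and the script keeps running instead of being skipped.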
Thanks.
>
> Best Regards
> Wang Yugui (wangyugui@e16-tech.com)
> 2023/08/12
>
>
>