From: Wang Yugui <wangyugui@e16-tech.com>
To: fdmanana@kernel.org
Cc: fstests@vger.kernel.org, linux-btrfs@vger.kernel.org,
Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH v2] btrfs/213: avoid occasional failure due to already finished balance
Date: Sat, 12 Aug 2023 20:48:57 +0800 [thread overview]
Message-ID: <20230812204850.B86D.409509F4@e16-tech.com> (raw)
In-Reply-To: <ff12321e5ddfdb763b9258f746e67fe3f6ea1321.1684489375.git.fdmanana@suse.com>
Hi,
> From: Filipe Manana <fdmanana@suse.com>
>
> btrfs/213 writes data, in 1M extents, for 4 seconds into a file, then
> triggers a balance and then after 2 seconds it tries to cancel the
> balance operation. More often than not, this works because the balance
> is still running after 2 seconds. However it also fails sporadically
> because balance has finished in less than 2 seconds, which is plausible
> since data and metadata are cached or other factors such as virtualized
> environment. When that's the case, it fails like this:
>
> $ ./check btrfs/213
> FSTYP -- btrfs
> PLATFORM -- Linux/x86_64 debian0 6.4.0-rc1-btrfs-next-131+ #1 SMP PREEMPT_DYNAMIC Thu May 11 11:26:19 WEST 2023
> MKFS_OPTIONS -- /dev/sdc
> MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
>
> btrfs/213 51s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad)
> --- tests/btrfs/213.out 2020-06-10 19:29:03.822519250 +0100
> +++ /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad 2023-05-17 15:39:32.653727223 +0100
> @@ -1,2 +1,3 @@
> QA output created by 213
> +ERROR: balance cancel on '/home/fdmanana/btrfs-tests/scratch_1' failed: Not in progress
> Silence is golden
> ...
> (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/213.out /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad' to see the entire diff)
> Ran: btrfs/213
> Failures: btrfs/213
> Failed 1 of 1 tests
>
> To make it much less likely that balance has already finished before we
> try to cancel it, unmount and mount again the filesystem before starting
> balance, to clear cached metadata and data, and also double the time we
> spend writing 1M data extents. Also make the test not run with an
> informative message if we detect that balance finished before we could
> cancel it.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>
> v2: Make the test _notrun if we detect that balance finished before we
> could cancel it.
>
> tests/btrfs/213 | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/tests/btrfs/213 b/tests/btrfs/213
> index e16e41c0..5666d9b9 100755
> --- a/tests/btrfs/213
> +++ b/tests/btrfs/213
> @@ -31,7 +31,7 @@ _fixed_by_kernel_commit 1dae7e0e58b4 \
> _scratch_mkfs >> $seqres.full
> _scratch_mount
>
> -runtime=4
> +runtime=8
>
> # Create enough IO so that we need around $runtime seconds to relocate it.
> #
> @@ -42,11 +42,21 @@ sleep $runtime
> kill $write_pid
> wait $write_pid
>
> +# Unmount and mount again the fs to clear any cached data and metadata, so that
> +# it's less likely balance has already finished when we try to cancel it below.
> +_scratch_cycle_mount
> +
> # Now balance should take at least $runtime seconds, we can cancel it at
> # $runtime/2 to ensure a success cancel.
> _run_btrfs_balance_start -d --bg "$SCRATCH_MNT"
> -sleep $(($runtime / 2))
> -$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT"
> +sleep $(($runtime / 4))
> +# It's possible that balance has already completed. It's unlikely but often
> +# it may happen due to virtualization, caching and other factors, so ignore
> +# any error about no balance currently running.
> +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" 2>&1 | grep -iq 'not in progress'
> +if [ $? -eq 0 ]; then
> + _not_run "balance finished before we could cancel it"
> +fi
fstests(btrfs/213) failed once here.
btrfs/213 22s ... - output mismatch (see /usr/hpc-bio/xfstests/results//btrfs/213.out.bad)
--- tests/btrfs/213.out 2023-03-28 06:09:10.372680814 +0800
+++ /usr/hpc-bio/xfstests/results//btrfs/213.out.bad 2023-08-12 20:31:47.848303940 +0800
@@ -1,2 +1,5 @@
QA output created by 213
+/usr/hpc-bio/xfstests/tests/btrfs/213: line 59: _not_run: command not found
+ERROR: error during balancing '/mnt/scratch': No space left on device
+There may be more info in syslog - try dmesg | tail
Silence is golden
we need to fix the error of '_not_run: command not found' firstly.
I will update the info if fstests(btrfs/213) fails again.
Best Regards
Wang Yugui (wangyugui@e16-tech.com)
2023/08/12
next prev parent reply other threads:[~2023-08-12 12:49 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-18 11:08 [PATCH] btrfs/213: avoid occasional failure due to already finished balance fdmanana
2023-05-19 5:04 ` Anand Jain
2023-05-19 9:58 ` Filipe Manana
2023-05-19 9:57 ` [PATCH v2] " fdmanana
2023-05-19 23:34 ` Qu Wenruo
2023-05-21 20:14 ` Anand Jain
2023-08-12 12:48 ` Wang Yugui [this message]
2023-08-13 11:01 ` Filipe Manana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230812204850.B86D.409509F4@e16-tech.com \
--to=wangyugui@e16-tech.com \
--cc=fdmanana@kernel.org \
--cc=fdmanana@suse.com \
--cc=fstests@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox