FS/XFS testing framework
From: Anand Jain <anand.jain@oracle.com>
To: fdmanana@kernel.org, fstests@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org, Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH] btrfs/213: avoid occasional failure due to already finished balance
Date: Fri, 19 May 2023 13:04:46 +0800
Message-ID: <966568ad-de9c-e395-1ee4-1b9028987df2@oracle.com>
In-Reply-To: <1e2924e9a604f781ad446ba8e2d789583e377837.1684408079.git.fdmanana@suse.com>

On 18/5/23 19:08, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> btrfs/213 writes data, in 1M extents, for 4 seconds into a file, then
> triggers a balance and then after 2 seconds it tries to cancel the
> balance operation. More often than not, this works because the balance
> is still running after 2 seconds. However it also fails sporadically
> because balance has finished in less than 2 seconds, which is plausible
> since data and metadata are cached or other factors such as virtualized
> environment. When that's the case, it fails like this:
> 
>    $ ./check btrfs/213
>    FSTYP         -- btrfs
>    PLATFORM      -- Linux/x86_64 debian0 6.4.0-rc1-btrfs-next-131+ #1 SMP PREEMPT_DYNAMIC Thu May 11 11:26:19 WEST 2023
>    MKFS_OPTIONS  -- /dev/sdc
>    MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
> 
>    btrfs/213 51s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad)
>        --- tests/btrfs/213.out	2020-06-10 19:29:03.822519250 +0100
>        +++ /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad	2023-05-17 15:39:32.653727223 +0100
>        @@ -1,2 +1,3 @@
>         QA output created by 213
>        +ERROR: balance cancel on '/home/fdmanana/btrfs-tests/scratch_1' failed: Not in progress
>         Silence is golden
>        ...
>        (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/213.out /home/fdmanana/git/hub/xfstests/results//btrfs/213.out.bad'  to see the entire diff)
>    Ran: btrfs/213
>    Failures: btrfs/213
>    Failed 1 of 1 tests
> 
> To make it much less likely that balance has already finished before we
> try to cancel it, unmount and mount again the filesystem before starting
> balance, to clear cached metadata and data, and also double the time we
> spend writing 1M data extents. Also ignore when the balance failed because
> it was already finished when we tried to cancel it.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   tests/btrfs/213 | 13 ++++++++++---
>   1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/btrfs/213 b/tests/btrfs/213
> index 8a10355c..cca0b3cc 100755
> --- a/tests/btrfs/213
> +++ b/tests/btrfs/213
> @@ -28,7 +28,7 @@ _require_xfs_io_command pwrite -D
>   _scratch_mkfs >> $seqres.full
>   _scratch_mount
>   
> -runtime=4
> +runtime=8
>   
>   # Create enough IO so that we need around $runtime seconds to relocate it.
>   #
> @@ -39,11 +39,18 @@ sleep $runtime
>   kill $write_pid
>   wait $write_pid
>   
> +# Unmount and mount again the fs to clear any cached data and metadata, so that
> +# it's less likely balance has already finished when we try to cancel it below.
> +_scratch_cycle_mount
> +
>   # Now balance should take at least $runtime seconds, we can cancel it at
>   # $runtime/2 to ensure a success cancel.
>   _run_btrfs_balance_start -d --bg "$SCRATCH_MNT"


> -sleep $(($runtime / 2))
> -$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT"
> +sleep $(($runtime / 4))
> +# It's possible that balance has already completed. It's unlikely but often
> +# it may happen due to virtualization, caching and other factors, so ignore
> +# any error about no balance currently running.
> +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" 2>&1 | grep -iv 'not in progress'

Cancel is an important step in this test case.
Why not call _notrun() when the test cannot make sure that
the balance is still in progress at cancel time? That way the
skip stays visible, which gives us another opportunity to fix it.
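
A minimal sketch of that idea follows. It assumes that a successful
"balance cancel" prints nothing and that the "Not in progress" message
matches the failure output quoted above. classify_cancel_output is a
hypothetical helper, not an existing fstests function, and the commented
usage is untested.

```shell
# Hypothetical helper (not part of fstests): classify the combined
# stdout/stderr of "btrfs balance cancel".
classify_cancel_output() {
	local out="$1"

	if [ -z "$out" ]; then
		# Assumed: a successful cancel is silent.
		echo "cancelled"
	elif echo "$out" | grep -qi 'not in progress'; then
		# Balance had already finished, so cancel was never exercised.
		echo "notrun"
	else
		# Any other message should still fail the test.
		echo "error"
	fi
}

# Possible use inside the test (sketch only):
#   out=$($BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" 2>&1)
#   case "$(classify_cancel_output "$out")" in
#   notrun) _notrun "balance finished before it could be cancelled" ;;
#   error)  echo "$out" ;;
#   esac
```

Unlike filtering the message with grep -v, this reports the run as
not run instead of passing without having exercised cancel.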

Thanks, Anand

>   
>   # Now check if we can finish relocating metadata, which should finish very
>   # quickly.



Thread overview: 8+ messages
2023-05-18 11:08 [PATCH] btrfs/213: avoid occasional failure due to already finished balance fdmanana
2023-05-19  5:04 ` Anand Jain [this message]
2023-05-19  9:58   ` Filipe Manana
2023-05-19  9:57 ` [PATCH v2] " fdmanana
2023-05-19 23:34   ` Qu Wenruo
2023-05-21 20:14   ` Anand Jain
2023-08-12 12:48   ` Wang Yugui
2023-08-13 11:01     ` Filipe Manana
