Re: [PATCH] generic: add test for missing btrfs csums in log when doing async on subpage vol

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <djwong@kernel.org>
To: Mark Harmstone <maharmstone@fb.com>
Cc: fstests@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] generic: add test for missing btrfs csums in log when doing async on subpage vol
Date: Mon, 28 Oct 2024 14:57:28 -0700	[thread overview]
Message-ID: <20241028215728.GU21840@frogsfrogsfrogs> (raw)
In-Reply-To: <20241015153957.2099812-1-maharmstone@fb.com>

On Tue, Oct 15, 2024 at 04:39:34PM +0100, Mark Harmstone wrote:
> Adds a test for a bug we encountered on Linux 6.4 on aarch64, where a
> race could mean that csums weren't getting written to the log tree,
> leading to corruption when it was replayed.
> 
> The patches to detect log this tree corruption are in btrfs-progs 6.11.
> 
> Signed-off-by: Mark Harmstone <maharmstone@fb.com>
> ---
> This is a genericized version of the test I originally proposed as
> btrfs/333.
> 
>  tests/generic/757     | 71 +++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/757.out |  2 ++
>  2 files changed, 73 insertions(+)
>  create mode 100755 tests/generic/757
>  create mode 100644 tests/generic/757.out
> 
> diff --git a/tests/generic/757 b/tests/generic/757
> new file mode 100755
> index 00000000..6ad3d01e
> --- /dev/null
> +++ b/tests/generic/757
> @@ -0,0 +1,71 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# FS QA Test 757
> +#
> +# Test async dio with fsync to test a btrfs bug where a race meant that csums
> +# weren't getting written to the log tree, causing corruptions on remount.
> +# This can be seen on subpage FSes on Linux 6.4.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick metadata log recoveryloop
> +
> +_fixed_by_kernel_commit e917ff56c8e7 \
> +	"btrfs: determine synchronous writers from bio or writeback control"
> +
> +fio_config=$tmp.fio
> +
> +. ./common/dmlogwrites
> +
> +_require_scratch
> +_require_log_writes
> +
> +cat >$fio_config <<EOF
> +[global]
> +iodepth=128
> +direct=1
> +ioengine=libaio
> +rw=randwrite
> +runtime=1s
> +[job0]
> +rw=randwrite
> +filename=$SCRATCH_MNT/file
> +size=1g
> +fdatasync=1
> +EOF
> +
> +_require_fio $fio_config
> +
> +cat $fio_config >> $seqres.full
> +
> +_log_writes_init $SCRATCH_DEV
> +_log_writes_mkfs >> $seqres.full 2>&1
> +_log_writes_mark mkfs
> +
> +_log_writes_mount
> +
> +$FIO_PROG $fio_config > /dev/null 2>&1
> +_log_writes_unmount
> +
> +_log_writes_remove
> +
> +prev=$(_log_writes_mark_to_entry_number mkfs)
> +[ -z "$prev" ] && _fail "failed to locate entry mark 'mkfs'"
> +cur=$(_log_writes_find_next_fua $prev)
> +[ -z "$cur" ] && _fail "failed to locate next FUA write"
> +
> +while [ ! -z "$cur" ]; do
> +	_log_writes_replay_log_range $cur $SCRATCH_DEV >> $seqres.full
> +
> +	_check_scratch_fs

This test fails on xfs because (afaict) replaying the log to $cur
results in $SCRATCH_DEV being a filesystem with a dirty log; and
xfs_repair fails when it is given a filesystem with a dirty log.

I then fixed the test to mount and unmount the filesystem to recovery
the dirty log before invoking xfs_repair:

	# xfs_repair won't run if the log is dirty
	if [ $FSTYP = "xfs" ]; then
		_scratch_mount
		_scratch_unmount
	fi
	_check_scratch_fs

But now the test takes a very long time to run because (on my system
anyway) the fio run can initiate 17,000 FUAs, which means that this loop
runs that many times.  100 iterations takes about 45 seconds, which is
about two hours.

Is it necessary to iterate the loop that many times to reproduce
whatever issue btrfs had?

--D

> +
> +	prev=$cur
> +	cur=$(_log_writes_find_next_fua $(($cur + 1)))
> +	[ -z "$cur" ] && break
> +done
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/generic/757.out b/tests/generic/757.out
> new file mode 100644
> index 00000000..dfbc8094
> --- /dev/null
> +++ b/tests/generic/757.out
> @@ -0,0 +1,2 @@
> +QA output created by 757
> +Silence is golden
> -- 
> 2.44.2
> 
>

next prev parent reply	other threads:[~2024-10-28 21:57 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-15 15:39 [PATCH] generic: add test for missing btrfs csums in log when doing async on subpage vol Mark Harmstone
2024-10-16 11:09 ` Filipe Manana
2024-10-18 17:36   ` Mark Harmstone
2024-10-23  3:31     ` Zorro Lang
2024-10-23  3:43 ` Zorro Lang
2024-10-23  3:53 ` Zorro Lang
2024-10-23 10:34   ` Mark Harmstone
2024-10-28 21:57 ` Darrick J. Wong [this message]
2024-10-29  5:11   ` Zorro Lang
2024-10-29 10:03     ` Mark Harmstone

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241028215728.GU21840@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=fstests@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=maharmstone@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).