From: Eryu Guan <guaneryu@gmail.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH ] xfs: check for COW overflows in i_delayed_blks
Date: Sun, 26 May 2019 22:27:35 +0800
Message-ID: <20190526142735.GP15846@desktop>
In-Reply-To: <155839151219.62947.9627045046429149685.stgit@magnolia>
On Mon, May 20, 2019 at 03:31:52PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> With the new copy on write functionality it's possible to reserve so
> much COW space for a file that we end up overflowing i_delayed_blks.
> The only user-visible effect of this is to cause totally wrong i_blocks
> output in stat, so check for that.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
xfs_db was killed by the OOM killer (kvm guest, 2 vcpus, 8G memory) when
I tried this test, and the test takes too long to run (I changed the fs
size from 300T to 300G and tried a test run). Perhaps that's why you
didn't put it in the auto group?
> ---
> tests/xfs/907 | 180 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> tests/xfs/907.out | 8 ++
> tests/xfs/group | 1
> 3 files changed, 189 insertions(+)
> create mode 100755 tests/xfs/907
> create mode 100644 tests/xfs/907.out
>
>
> diff --git a/tests/xfs/907 b/tests/xfs/907
> new file mode 100755
> index 00000000..2c21ac8e
> --- /dev/null
> +++ b/tests/xfs/907
> @@ -0,0 +1,180 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0+
> +# Copyright (c) 2019 Oracle, Inc. All Rights Reserved.
> +#
> +# FS QA Test No. 907
> +#
> +# Try to overflow i_delayed_blks by setting the largest cowextsize hint
> +# possible, creating a sparse file with a single byte every cowextsize bytes,
> +# reflinking it, and retouching every written byte to see if we can create
> +# enough speculative COW reservations to overflow i_delayed_blks.
> +#
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 7 15
> +
> +_cleanup()
> +{
> + cd /
This needs to '_destroy_loop_device $loop_dev' too, after the umount.
> + umount $loop_mount > /dev/null 2>&1
Use $UMOUNT_PROG here instead of bare umount.
> + rm -rf $tmp.*
> +}
And loop_dev and loop_mount should be defined before _cleanup()?
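Something along these lines, perhaps. This is an untested sketch, not a
drop-in replacement: the two stand-in definitions at the top exist only so
the fragment runs outside fstests (the real $UMOUNT_PROG and
_destroy_loop_device come from the common/ helpers), and the empty-variable
guards are my own assumption about how to keep an early exit safe.

```shell
# Stand-ins so this fragment runs outside fstests; in the real test these
# are provided by the common/ helpers and must not be redefined.
: "${UMOUNT_PROG:=umount}"
type _destroy_loop_device > /dev/null 2>&1 || \
	_destroy_loop_device() { losetup -d "$1"; }

# Define these before _cleanup() so the trap never sees them unset.
loop_dev=""
loop_mount=""
tmp=/tmp/$$

_cleanup()
{
	cd /
	# Unmount first, then tear down the loop device backing the mount.
	if [ -n "$loop_mount" ]; then
		$UMOUNT_PROG "$loop_mount" > /dev/null 2>&1
	fi
	if [ -n "$loop_dev" ]; then
		_destroy_loop_device "$loop_dev"
	fi
	rm -rf $tmp.*
}
```

With both variables still empty, _cleanup() only removes the temp files,
so a failure in one of the _require_* checks would not try to unmount or
detach anything.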
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/reflink
> +. ./common/filter
> +
> +# real QA test starts here
> +_supported_os Linux
> +_supported_fs xfs
> +_require_scratch_reflink
> +_require_loop
> +_require_xfs_debug # needed for xfs_bmap -c
This also needs _require_cp_reflink, since the test calls _cp_reflink.
> +
> +MAXEXTLEN=2097151 # cowextsize can't be more than MAXEXTLEN
> +
> +# Create a huge sparse filesystem on the scratch device because that's what
> +# we're going to need to guarantee that we have enough blocks to overflow in
> +# the first place. In the worst case we have a 64k-block filesystem in which
> +# we have to be able to reserve 2^32 blocks. Adding in 20% overhead and a
> +# 128M log, we get about 300T.
> +echo "Format and mount"
> +_scratch_mkfs > "$seqres.full" 2>&1
> +_scratch_mount
> +_require_fs_space $SCRATCH_MNT 200000 # 300T fs requires ~200MB of space
I noticed that the 'a.img' file consumed more than 5G of space. Is 200MB
enough?
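As a sanity check on the 300T figure in the comment above, the arithmetic
can be reproduced in plain bash. This only restates the numbers from the
comment (2^32 blocks, 64k block size, 20% overhead); the 128M log is small
enough that I ignore it here.

```shell
blocks=$(( 1 << 32 ))          # reservations needed to overflow a u32
blksz=65536                    # worst case: 64k-block filesystem
bytes=$(( blocks * blksz ))
tib=$(( bytes >> 40 ))         # bytes -> TiB
with_overhead_tib=$(( tib * 120 / 100 ))
echo "raw: ${tib}T, with 20% overhead: ${with_overhead_tib}T"
```

So 300T is roughly the right size for the sparse image. It says nothing
about how much real space a.img ends up consuming on the scratch device,
which is the number _require_fs_space has to cover.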
> +
> +loop_file=$SCRATCH_MNT/a.img
> +loop_mount=$SCRATCH_MNT/a
> +truncate -s 300T $loop_file
$XFS_IO_PROG -fc "truncate 300T" $loop_file
> +loop_dev=$(_create_loop_device $loop_file)
> +
> +# Now we have to create the source file. The goal is to overflow a 32-bit
> +# i_delayed_blks, which means that we have to create at least that many delayed
> +# allocation block reservations. Take advantage of the fact that a cowextsize
> +# hint causes creation of large speculative delalloc reservations in the cow
> +# fork to reduce the amount of work we have to do.
> +#
> +# The maximum cowextsize is going to be MAXEXTLEN fs blocks on a 100T
> +# filesystem, so start by setting up the hint. Note that the current fsxattr
> +# interface specifies its u32 cowextsize hint in units of bytes and therefore
> +# can't handle MAXEXTLEN * blksz on most filesystems, so we set it via mkfs
> +# because mkfs takes units of fs blocks, not bytes.
> +
> +_mkfs_dev -d cowextsize=$MAXEXTLEN -l size=128m $loop_dev >> $seqres.full
> +mkdir $loop_mount
> +mount -t xfs $loop_dev $loop_mount
_mount $loop_dev $loop_mount
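On the units comment above: a quick check of which block sizes push
MAXEXTLEN * blksz past what a u32 byte count can hold. The helper name
hint_fits_u32 is made up for this illustration and is not an fstests
function.

```shell
MAXEXTLEN=2097151

# Print "yes" if a MAXEXTLEN-block hint, expressed in bytes, fits in a u32.
hint_fits_u32()
{
	local blksz="$1"
	local u32_max=$(( (1 << 32) - 1 ))
	local bytes=$(( MAXEXTLEN * blksz ))

	if [ "$bytes" -le "$u32_max" ]; then
		echo yes
	else
		echo no
	fi
}

for blksz in 1024 2048 4096 65536; do
	echo "blksz=$blksz fits_in_u32=$(hint_fits_u32 $blksz)"
done
```

Block sizes of 4k and larger do not fit, which matches the "most
filesystems" wording and is why the hint has to be set through mkfs.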
> +
> +echo "Create crazy huge file"
> +huge_file="$loop_mount/a"
> +touch "$huge_file"
> +blksz=$(_get_file_block_size "$loop_mount")
> +extsize_bytes="$(( MAXEXTLEN * blksz ))"
> +
> +# Make sure it actually set a hint.
> +curr_cowextsize_str="$($XFS_IO_PROG -c 'cowextsize' "$huge_file")"
> +echo "$curr_cowextsize_str" >> $seqres.full
> +cowextsize_bytes="$(echo "$curr_cowextsize_str" | sed -e 's/^.\([0-9]*\).*$/\1/g')"
> +test "$cowextsize_bytes" -eq 0 && echo "could not set cowextsize?"
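For anyone reading along, this is what the sed expression above does to a
line of the form "[value] path", which is my understanding of what xfs_io's
cowextsize command prints. The sample line is invented for illustration.

```shell
# Made-up sample in the "[value] path" shape.
sample='[4096] /mnt/loop/a'

# Drop the leading character, keep the run of digits, discard the rest.
cowextsize_bytes="$(echo "$sample" | sed -e 's/^.\([0-9]*\).*$/\1/g')"
echo "$cowextsize_bytes"
```

If the output format ever changed so that the line did not start with a
single non-digit character, the capture would come back wrong or empty and
the -eq test would error out, so quoting and a sanity check on the result
might be worth adding.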
> +
> +# Now we have to seed the file with sparse contents. Remember, the goal is to
> +# create a little more than 2^32 delayed allocation blocks in the COW fork with
> +# as little effort as possible. We know that speculative COW preallocation
> +# will create MAXEXTLEN-length reservations for us, so that means we should
> +# be able to get away with touching a single byte every extsize_bytes. We
> +# do this backwards to avoid having to move EOF.
> +nr="$(( ((2 ** 32) / MAXEXTLEN) + 100 ))"
> +seq $nr -1 0 | while read n; do
> + off="$((n * extsize_bytes))"
> + $XFS_IO_PROG -c "pwrite $off 1" "$huge_file" > /dev/null
> +done
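To put numbers on the loop above: assuming every touched byte really ends
up with a full MAXEXTLEN-block COW reservation (that is the assumption the
test comment makes, I have not verified it), the totals work out like this.
The wrapped value is what a 32-bit counter would be left holding.

```shell
MAXEXTLEN=2097151
nr=$(( ((2 ** 32) / MAXEXTLEN) + 100 ))
writes=$(( nr + 1 ))                    # seq $nr -1 0 emits nr+1 offsets
total=$(( writes * MAXEXTLEN ))         # reserved blocks, full precision
wrapped=$(( total & 0xFFFFFFFF ))       # same count truncated to 32 bits
echo "nr=$nr writes=$writes total=$total wrapped=$wrapped"
```

That is 2149 pwrite calls per pass, each in its own xfs_io process, and a
truncated counter that is off by a full 2^32 blocks.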
> +
> +echo "Reflink crazy huge file"
> +_cp_reflink "$huge_file" "$huge_file.b"
> +
> +# Now that we've shared all the blocks in the file, we touch them all again
> +# to create speculative COW preallocations.
> +echo "COW crazy huge file"
> +seq $nr -1 0 | while read n; do
> + off="$((n * extsize_bytes))"
> + $XFS_IO_PROG -c "pwrite $off 1" "$huge_file" > /dev/null
> +done
> +
> +# Compare the number of blocks allocated to this file (as reported by stat)
> +# against the number of blocks that are in the COW fork. If either one is
> +# less than 2^32 then we have evidence of an overflow problem.
> +echo "Check crazy huge file"
> +allocated_stat_blocks="$(stat -c %b "$huge_file")"
> +stat_blksz="$(stat -c %B "$huge_file")"
> +allocated_fsblocks=$(( allocated_stat_blocks * stat_blksz / blksz ))
> +
> +# Make sure we got enough COW reservations to overflow a 32-bit counter.
> +
> +# Return the number of delalloc & real blocks given bmap output for a fork of a
> +# file. Output is in units of 512-byte blocks.
> +count_fork_blocks() {
> + awk "
$AWK_PROG
> +{
> + if (\$3 == \"delalloc\") {
> + x += \$4;
> + } else if (\$3 == \"hole\") {
> + ;
> + } else {
> + x += \$6;
> + }
> +}
> +END {
> + print(x);
> +}
> +"
> +}
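A possible simplification, offered as a sketch only: single-quoting the awk
program avoids all of the backslash escaping. The sample input below is
invented to look like verbose bmap output (file name line, header line,
then one real, one hole and one delalloc extent); I am assuming that column
layout rather than quoting real output. In the test itself this would use
$AWK_PROG; the fallback to plain awk is only so the fragment runs on its
own.

```shell
# Sum delalloc and real extent lengths (512-byte units), skipping holes.
count_fork_blocks() {
	${AWK_PROG:-awk} '
{
	if ($3 == "delalloc") {
		x += $4;
	} else if ($3 == "hole") {
		;
	} else {
		x += $6;
	}
}
END {
	print(x);
}
'
}

# Invented sample input, not captured from a real filesystem.
sectors="$(count_fork_blocks <<'EOF'
/mnt/loop/a:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..7]:          96..103           0 (96..103)            8 000000
   1: [8..15]:         hole                                     8
   2: [16..31]:        delalloc                                16
EOF
)"
echo "$sectors"
```

Note that the file name and header lines fall into the else branch and
contribute zero only because their sixth field is empty or non-numeric.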
> +
> +# Count the number of blocks allocated to a file based on the xfs_bmap output.
> +# Output is in units of filesystem blocks.
> +count_file_fork_blocks() {
> + local tag="$1"
> + local file="$2"
> + local args="$3"
> +
> + $XFS_IO_PROG -c "bmap $args -l -p -v" "$huge_file" > $tmp.extents
> + echo "$tag fork map" >> $seqres.full
> + cat $tmp.extents >> $seqres.full
> + local sectors="$(count_fork_blocks < $tmp.extents)"
> + echo "$(( sectors / (blksz / 512) ))"
> +}
> +
> +cowblocks=$(count_file_fork_blocks cow "$huge_file" "-c")
> +attrblocks=$(count_file_fork_blocks attr "$huge_file" "-a")
> +datablocks=$(count_file_fork_blocks data "$huge_file" "")
> +
> +# Did we create more than 2^32 blocks in the cow fork?
> +echo "datablocks is $datablocks" >> $seqres.full
> +echo "attrblocks is $attrblocks" >> $seqres.full
> +echo "cowblocks is $cowblocks" >> $seqres.full
> +test "$cowblocks" -lt $((2 ** 32)) && \
> + echo "cowblocks (${cowblocks}) should be more than 2^32!"
> +
> +# Does stat's block allocation count exceed 2^32?
> +echo "stat blocks is $allocated_fsblocks" >> $seqres.full
> +test "$allocated_fsblocks" -lt $((2 ** 32)) && \
> + echo "stat blocks (${allocated_fsblocks}) should be more than 2^32!"
> +
> +# Finally, does st_blocks match what we computed from the forks?
> +expected_allocated_fsblocks=$((datablocks + cowblocks + attrblocks))
> +echo "expected stat blocks is $expected_allocated_fsblocks" >> $seqres.full
> +
> +_within_tolerance "st_blocks" $allocated_fsblocks $expected_allocated_fsblocks 2% -v
> +
> +echo "Test done"
> +_check_xfs_filesystem $loop_dev none none
> +umount $loop_mount
$UMOUNT_PROG
Thanks,
Eryu
> +_destroy_loop_device $loop_dev
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/xfs/907.out b/tests/xfs/907.out
> new file mode 100644
> index 00000000..cc07d659
> --- /dev/null
> +++ b/tests/xfs/907.out
> @@ -0,0 +1,8 @@
> +QA output created by 907
> +Format and mount
> +Create crazy huge file
> +Reflink crazy huge file
> +COW crazy huge file
> +Check crazy huge file
> +st_blocks is in range
> +Test done
>