Re: btrfs -o discard bug in latest dev branches

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jeff Mahoney <jeffm@suse.com>
To: Jun He <jhe@cs.wisc.edu>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs -o discard bug in latest dev branches
Date: Wed, 26 Aug 2015 09:24:34 -0400	[thread overview]
Message-ID: <55DDBE12.1020104@suse.com> (raw)
In-Reply-To: <20150826040442.GC10927@Juns-MacBook-Pro.local>

[-- Attachment #1: Type: text/plain, Size: 3118 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 8/26/15 12:04 AM, Jun He wrote:
> Hi, I have been playing with btrfs discard for a while and found
> that btrfs may fail to discard some extents with 'mount -o
> discard'. I am aware of Jeff Mahoney's patches (
> https://patchwork.kernel.org/patch/6609491/ ). It seems that the
> patches do not fix the problem. I have seen the same problematic 
> behavior for the following versions
> 
> - https://git.kernel.org/cgit/linux/kernel/git/fdmanana/linux.git/ 
> integration-4.3 commit:477594f93c43b1ee685 - 3.16.0 - 4.2.0-rc7
> 
> The problem can be reproduced by writing and fsyncing a 4MB file
> for 50 times on a 256MB empty FS (mount option: -o discard). You
> will find that some extents are not discarded (my expected behavior
> is that, after overwriting, an old version of a file extent should
> be discarded). I use several ways to confirm this:
> 
> 1. I created a loop device back by a sparse file in tmpfs. After
> running the workload, I found the file is 29MB (ls -lsh). If you
> fstrim the file system, the sparse file will become 4.1MB. This
> proves that there are a lot of data not discarded.

Can you capture the output of 'filefrag -b -v' for the sparse file in
tmpfs both before and after trim?  Also btrfs-debug-tree output for it.

> 2. I collected blktrace + blkparse output and plotted the write and
> discard operations in a space-time graph, where you can intuitively
> see some extents are overwritten but not discarded. Here is the
> space-time graph 
> https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/825a
3c2946b52a50c2b6032a98d637f5a32bc5c3/integration-4.3.png
>
>  Is it a known problem or is it not a problem? If it is a known
> problem and there exists a patch that I am not aware of, can
> somebody direct me to it? If it is specifically designed this way,
> can the designers give the rationale of discarding some, but not
> all of, old extents?
> 
> The reproducer can be run by: git clone
> https://gist.github.com/b6ce39eeb6de8887e66a.git cd
> b6ce39eeb6de8887e66a sudo python main.py
> 
> It also has a readme.

I tested my discard patches the other day against 4.2-rc8 and it
passed my 326 test.  I'll check out your reproducer.

- -Jeff



- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)

iQIcBAEBAgAGBQJV3b4SAAoJEB57S2MheeWyTUMP/R2mF2/6RcXO0iXNJw466/qI
VzEV09C5Cs04NnDSfY0ZPk3W6iD1B4iOnHB1HF+lK8ENeF/Ug+0Tnb1uJpEo4lvn
+yTxUx+qYwo737SU9oTjmvpr4aXzmV3I223gylcPnaXCQu8w80BsldIBvL98ExIq
Ad/ZzuPMAfc3jvhcEZrDTN6L77rIyhXkhGT/fMnRcycW/26myd8HbJDjE0lebmvY
9+L15Ykgm52V4zpW5wSQh1xww8WWsKv/K6nF3shuNPlhmovOwtAAEJhLRQcpJp5e
+jSijexTsd9so5smwc+JJ+sW86zZ6G6opL/K96MJNEXZ4UMBQPaOYNrewAHTh8Xe
V4WzvbL/JWcKAAzPugvhRGjP2xjM+E1oBQGvZEGEmZb0lneW5Fouao9xGXtftRr9
NAc7A8D/WkqSxDb7qhEdmQpexsQNfPMQi4JmTj0/kGCjYAhxCuVlPgNjorE70BW/
nze9Fg1zydZZs0XeQW4Bh4+HU3yjexLshQCgXjnhabmc9+VwBnEuizYVtcjFgm8r
vBCj80cAID7n7PtPhDNE7DfFHLuigpA6hTw+vgUxL1zd4yHEjQ3pGZ0lZuVK42oZ
EIqKX7r0maU24pQGj1v9NL8QPqYQmrj9ljUmk3QSlvjTUGVv5y8U+KVX7h3cGJRO
YMs7n+5BMmI6WKZBn+fr
=StBb
-----END PGP SIGNATURE-----

[-- Attachment #2: 326 --]
[-- Type: text/plain, Size: 4896 bytes --]

#! /bin/bash
# FSQA Test No. 326
#
# This test uses a loopback mount with PUNCH_HOLE support to test
# whether discard operations are working as expected.
#
# It tests both -odiscard and fstrim.
#
# Copyright (C) 2015 SUSE. All Rights Reserved.
# Author: Jeff Mahoney <jeffm@suse.com>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
#-----------------------------------------------------------------------
#

seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"

tmp=/tmp/$$
status=1	# failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15

loopdev=
tmpdir=
_cleanup()
{
	[ -n "$tmpdir" ] && umount $tmpdir
	[ -n "$loopdev" ] && losetup -d $loopdev
}

# get standard environment, filters and checks
. ./common/rc
. ./common/filter

# real QA test starts here
_need_to_be_root
_supported_fs generic
_supported_os Linux
_require_scratch
_require_fstrim

MB_BYTES=1048576
GB_BYTES=1073741824

TEST_FS_SIZE_MB=1024
SMFILE_MIN_SIZE=512
SMFILE_MAX_SIZE=$(( 3 * $MB_BYTES ))
LARGEFILE_SIZE_MB=100

# The test methodology is generic, but the values used by each file system
# for the results to make sense may not be.
if [ "$FSTYP" = "btrfs" ]; then
	MAX_REMAINING_MB=10
	TEST_FS_SIZE_MB=10240
	# Larger than a single block group
	LARGEFILE_SIZE_MB=$(( 12 * $GB_BYTES ))
elif [ "$FSTYP" = "xfs" ]; then
	MAX_REMAINING_MB=50
else # ext4, probably, but sufficiently permissive that it should work anywhere
	# Special; Means at
	MAX_REMAINING_MB=0
fi

test_fs_size_bytes=$(( $TEST_FS_SIZE_MB * $MB_BYTES ))

rm -f $seqres.full

_scratch_mkfs &>> $seqres.full
_require_fs_space $SCRATCH_MNT $TEST_FS_SIZE_MB
_scratch_mount

blocks_used_kb()
{
	blocks=$(( $(stat --format="%b*%B/1024" "$1") ))
	echo "# blocks_used_kb $1 ($2)" >> $seqres.full
	echo "$blocks $1" >> $seqres.full
	echo $blocks
}

random_size()
{
	MIN=$1
	MAX=$2
	echo $(( $MIN + ($MAX - $MIN) * $RANDOM / 32768 ))
}

test_discard()
{
	discard=$1
	files=$2

	tmpfile=$SCRATCH_MNT/testfs.img.$$
	tmpdir=$SCRATCH_MNT/testdir.$$
	testdir=$tmpdir/testdir
	mkdir -p $tmpdir || _fail "!!! failed to create temp mount dir"

	# Create a sparse file to host the file system
	$XFS_IO_PROG -f -t -c "truncate $test_fs_size_bytes" $tmpfile \
		|| _fail "!!! failed to create fs image file"

	opts=""
	if [ "$discard" = "discard" ]; then
		opts="-o discard"
	fi

	loopdev=$(losetup --show -f $tmpfile)
	_mkfs_dev $loopdev &> $seqres.full
	$MOUNT_PROG $opts $loopdev $tmpdir \
		|| _fail "!!! failed to loopback mount"


	$FSTRIM_PROG $tmpdir

	ESIZE="$(blocks_used_kb $tmpfile "empty filesystem")"

	if [ "$files" = "large" ]; then
		count=$(( $TEST_FS_SIZE_MB / $LARGEFILE_SIZE_MB ))
		random=false
	else
		count=$(( $TEST_FS_SIZE_MB * $MB_BYTES / $SMFILE_MAX_SIZE ))
		random=true
	fi

	mkdir -p $testdir
	for ((i = 1; i <= count; i++)); do
		SIZE=$(( $LARGEFILE_SIZE_MB * $MB_BYTES ))
		if $random; then
			SIZE=$(random_size $SMFILE_MIN_SIZE $SMFILE_MAX_SIZE)
		fi

		fn=${seq}_${i}
		$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 $SIZE" $testdir/$fn \
			&> /dev/null
		if [ $? -ne 0 ]; then
			echo "Failed creating file $fn" \
				>>$seqres.full
			break
		fi
	done

	sync
	OSIZE="$(blocks_used_kb $tmpfile "before removing files")"

	rm -rf $testdir

	# Ensure everything's actually on the hosted file system
	if [ "$FSTYP" = "btrfs" ]; then
		_run_btrfs_util_prog filesystem sync $tmpdir
	fi
	sync
	if [ "$discard" = "trim" ]; then
		$FSTRIM_PROG $tmpdir
	fi

	$UMOUNT_PROG $tmpdir
	rmdir $tmpdir
	tmpdir=

	# Sync the backing file system to ensure the hole punches have
	# happened and we can trust the result.
	if [ "$FSTYP" = "btrfs" ]; then
		_run_btrfs_util_prog filesystem sync $SCRATCH_MNT
	fi
	sync

	NSIZE="$(blocks_used_kb $tmpfile "after trim")"

	# Going from ~ 10GB to 50MB is a good enough test to account for
	# metadata remaining on different file systems.
	if [ "$NSIZE" -gt $(( 50 * 1024 )) ]; then
		_fail "TRIM failed: before rm ${OSIZE}kB, after rm ${NSIZE}kB"
	fi
	rm $tmpfile
	losetup -d $loopdev
	loopdev=
}


echo "Testing with -odiscard, many small files"
test_discard discard many

echo "Testing with -odiscard, several large files"
test_discard discard large

echo "Testing with fstrim, many small files"
test_discard trim many

echo "Testing with fstrim, several large files"
test_discard trim large

status=0
exit

[-- Attachment #3: 326.out --]
[-- Type: text/plain, Size: 189 bytes --]

QA output created by 326
Testing with -odiscard, many small files
Testing with -odiscard, several large files
Testing with fstrim, many small files
Testing with fstrim, several large files

next prev parent reply	other threads:[~2015-08-26 13:24 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-26  4:04 btrfs -o discard bug in latest dev branches Jun He
2015-08-26  6:14 ` Duncan
2015-08-26 19:09   ` Jun He
2015-08-26 13:24 ` Jeff Mahoney [this message]
2015-08-26 16:02   ` Jun He
2015-08-27  3:03     ` Jun He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55DDBE12.1020104@suse.com \
    --to=jeffm@suse.com \
    --cc=jhe@cs.wisc.edu \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).