btrfs -o discard bug in latest dev branches

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* btrfs -o discard bug in latest dev branches
@ 2015-08-26  4:04 Jun He
  2015-08-26  6:14 ` Duncan
  2015-08-26 13:24 ` Jeff Mahoney
  0 siblings, 2 replies; 6+ messages in thread
From: Jun He @ 2015-08-26  4:04 UTC (permalink / raw)
  To: linux-btrfs, jeffm

Hi,
I have been playing with btrfs discard for a while and found that btrfs may
fail to discard some extents with 'mount -o discard'. I am aware of Jeff
Mahoney's patches ( https://patchwork.kernel.org/patch/6609491/ ). It seems
that the patches do not fix the problem. I have seen the same problematic
behavior for the following versions 

- https://git.kernel.org/cgit/linux/kernel/git/fdmanana/linux.git/
  integration-4.3 commit:477594f93c43b1ee685
- 3.16.0
- 4.2.0-rc7

The problem can be reproduced by writing and fsyncing a 4MB file for 50 times
on a 256MB empty FS (mount option: -o discard). You will find that some extents
are not discarded (my expected behavior is that, after overwriting, an old
version of a file extent should be discarded). I use several ways to confirm
this:

1. I created a loop device back by a sparse file in tmpfs. After running the
workload, I found the file is 29MB (ls -lsh). If you fstrim the file system,
the sparse file will become 4.1MB. This proves that there are a lot of data not
discarded.

2. I collected blktrace + blkparse output and plotted the write and discard
operations in a space-time graph, where you can intuitively see some extents
are overwritten but not discarded. Here is the space-time graph
https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/825a3c2946b52a50c2b6032a98d637f5a32bc5c3/integration-4.3.png

Is it a known problem or is it not a problem? If it is a known problem and
there exists a patch that I am not aware of, can somebody direct me to it? If
it is specifically designed this way, can the designers give the rationale of
discarding some, but not all of, old extents?

The reproducer can be run by:
git clone https://gist.github.com/b6ce39eeb6de8887e66a.git
cd b6ce39eeb6de8887e66a
sudo python main.py

It also has a readme.

Thanks,
Jun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: btrfs -o discard bug in latest dev branches
  2015-08-26  4:04 btrfs -o discard bug in latest dev branches Jun He
@ 2015-08-26  6:14 ` Duncan
  2015-08-26 19:09   ` Jun He
  2015-08-26 13:24 ` Jeff Mahoney
  1 sibling, 1 reply; 6+ messages in thread
From: Duncan @ 2015-08-26  6:14 UTC (permalink / raw)
  To: linux-btrfs

Jun He posted on Tue, 25 Aug 2015 23:04:42 -0500 as excerpted:

> I have been playing with btrfs discard for a while and found that btrfs
> may fail to discard some extents with 'mount -o discard'. I am aware of
> Jeff Mahoney's patches ( https://patchwork.kernel.org/patch/6609491/ ).
> It seems that the patches do not fix the problem. I have seen the same
> problematic behavior for the following versions
> 
> - https://git.kernel.org/cgit/linux/kernel/git/fdmanana/linux.git/
>   integration-4.3 commit:477594f93c43b1ee685
> - 3.16.0 - 4.2.0-rc7
> 
> The problem can be reproduced by writing and fsyncing a 4MB file for 50
> times on a 256MB empty FS (mount option: -o discard). You will find that
> some extents are not discarded (my expected behavior is that, after
> overwriting, an old version of a file extent should be discarded). I use
> several ways to confirm this:
> 
> 1. I created a loop device back by a sparse file in tmpfs. After running
> the workload, I found the file is 29MB (ls -lsh). If you fstrim the file
> system,
> the sparse file will become 4.1MB. This proves that there are a lot of
> data not discarded.
> 
> 2. I collected blktrace + blkparse output and plotted the write and
> discard operations in a space-time graph, where you can intuitively see
> some extents are overwritten but not discarded. Here is the space-time
> graph
> https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/
raw/825a3c2946b52a50c2b6032a98d637f5a32bc5c3/integration-4.3.png
> 
> Is it a known problem or is it not a problem? If it is a known problem
> and there exists a patch that I am not aware of, can somebody direct me
> to it?
> If it is specifically designed this way, can the designers give the
> rationale of discarding some, but not all of, old extents?

I'm an admin, not a dev, far from an expert on fsync, and didn't pull 
your reproducer down from the linked git to check, but... do the numbers 
continue to change for some time (nominally 30 seconds) after the last 
operation?  Do you do a final sync (not fsync) after the last file write, 
and does that affect the result?

What I'm getting at is that there's a difference between sync and fsync, 
and you mentioned only fsync.  After an fsync, the file's own data and 
metadata should be reliably synced to storage device, but unlike 
filesystems like ext3, where (I've read that) an fsync forces a sync of 
the entire filesystem, on btrfs, other data and metadata related to the 
filesystem, in this case, those discards clearing where the file WAS but 
is no longer due to COW, are not necessarily synced to storage device, 
yet.

In the absence of a full filesystem sync, this outstanding activity may 
remain uncommitted until the normal btrfs commit timeout, 30 seconds by 
default, tho there's a mount option to change it.  In the absence of that 
sync, a failure to discard before the commit, upto 30 seconds later, is 
entirely expected.

Of course if you're either already doing that full filesystem sync, or 
are waiting at least 30 seconds (or whatever you have commit set to if 
non-default) before checking to see if the discard has been done, then 
indeed, it would appear that something's wrong.  But there's no 
indication in your post that you're already doing that.

FWIW, if you prefer to sync just the btrfs in question, not other 
filesystems btrfs and non-btrfs alike (as a full sync would do), you can 
use the btrfs filesystem sync <path> command, as covered in the btrfs-
filesystem manpage.  This command can be used in test scripts, etc, in 
place of sleeping 30 seconds or invoking a full system sync, where what's 
actually on the device counts.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: btrfs -o discard bug in latest dev branches
  2015-08-26  6:14 ` Duncan
@ 2015-08-26 19:09   ` Jun He
  0 siblings, 0 replies; 6+ messages in thread
From: Jun He @ 2015-08-26 19:09 UTC (permalink / raw)
  To: linux-btrfs

Thanks for the info. In the attached reproducer, I don't do sync after running
the workload (but I've tried that before.). Just to be sure, I ran again with
sync and waited for 60 seconds before collecting data. The behavior was the
same. 

Then I tried one more thing, which corrected btrfs behavior. I sync-ed after
each write, which produced the following space-time graph: 

https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/d31ac85559f83c
d86aef8e4250b3d3d859226c8d/sync-after-each-write.png

Each time a write is finished, the extent of the last write is discarded. 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: btrfs -o discard bug in latest dev branches
  2015-08-26  4:04 btrfs -o discard bug in latest dev branches Jun He
  2015-08-26  6:14 ` Duncan
@ 2015-08-26 13:24 ` Jeff Mahoney
  2015-08-26 16:02   ` Jun He
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff Mahoney @ 2015-08-26 13:24 UTC (permalink / raw)
  To: Jun He, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3118 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 8/26/15 12:04 AM, Jun He wrote:
> Hi, I have been playing with btrfs discard for a while and found
> that btrfs may fail to discard some extents with 'mount -o
> discard'. I am aware of Jeff Mahoney's patches (
> https://patchwork.kernel.org/patch/6609491/ ). It seems that the
> patches do not fix the problem. I have seen the same problematic 
> behavior for the following versions
> 
> - https://git.kernel.org/cgit/linux/kernel/git/fdmanana/linux.git/ 
> integration-4.3 commit:477594f93c43b1ee685 - 3.16.0 - 4.2.0-rc7
> 
> The problem can be reproduced by writing and fsyncing a 4MB file
> for 50 times on a 256MB empty FS (mount option: -o discard). You
> will find that some extents are not discarded (my expected behavior
> is that, after overwriting, an old version of a file extent should
> be discarded). I use several ways to confirm this:
> 
> 1. I created a loop device back by a sparse file in tmpfs. After
> running the workload, I found the file is 29MB (ls -lsh). If you
> fstrim the file system, the sparse file will become 4.1MB. This
> proves that there are a lot of data not discarded.

Can you capture the output of 'filefrag -b -v' for the sparse file in
tmpfs both before and after trim?  Also btrfs-debug-tree output for it.

> 2. I collected blktrace + blkparse output and plotted the write and
> discard operations in a space-time graph, where you can intuitively
> see some extents are overwritten but not discarded. Here is the
> space-time graph 
> https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/825a
3c2946b52a50c2b6032a98d637f5a32bc5c3/integration-4.3.png
>
>  Is it a known problem or is it not a problem? If it is a known
> problem and there exists a patch that I am not aware of, can
> somebody direct me to it? If it is specifically designed this way,
> can the designers give the rationale of discarding some, but not
> all of, old extents?
> 
> The reproducer can be run by: git clone
> https://gist.github.com/b6ce39eeb6de8887e66a.git cd
> b6ce39eeb6de8887e66a sudo python main.py
> 
> It also has a readme.

I tested my discard patches the other day against 4.2-rc8 and it
passed my 326 test.  I'll check out your reproducer.

- -Jeff



- -- 
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)

iQIcBAEBAgAGBQJV3b4SAAoJEB57S2MheeWyTUMP/R2mF2/6RcXO0iXNJw466/qI
VzEV09C5Cs04NnDSfY0ZPk3W6iD1B4iOnHB1HF+lK8ENeF/Ug+0Tnb1uJpEo4lvn
+yTxUx+qYwo737SU9oTjmvpr4aXzmV3I223gylcPnaXCQu8w80BsldIBvL98ExIq
Ad/ZzuPMAfc3jvhcEZrDTN6L77rIyhXkhGT/fMnRcycW/26myd8HbJDjE0lebmvY
9+L15Ykgm52V4zpW5wSQh1xww8WWsKv/K6nF3shuNPlhmovOwtAAEJhLRQcpJp5e
+jSijexTsd9so5smwc+JJ+sW86zZ6G6opL/K96MJNEXZ4UMBQPaOYNrewAHTh8Xe
V4WzvbL/JWcKAAzPugvhRGjP2xjM+E1oBQGvZEGEmZb0lneW5Fouao9xGXtftRr9
NAc7A8D/WkqSxDb7qhEdmQpexsQNfPMQi4JmTj0/kGCjYAhxCuVlPgNjorE70BW/
nze9Fg1zydZZs0XeQW4Bh4+HU3yjexLshQCgXjnhabmc9+VwBnEuizYVtcjFgm8r
vBCj80cAID7n7PtPhDNE7DfFHLuigpA6hTw+vgUxL1zd4yHEjQ3pGZ0lZuVK42oZ
EIqKX7r0maU24pQGj1v9NL8QPqYQmrj9ljUmk3QSlvjTUGVv5y8U+KVX7h3cGJRO
YMs7n+5BMmI6WKZBn+fr
=StBb
-----END PGP SIGNATURE-----

[-- Attachment #2: 326 --]
[-- Type: text/plain, Size: 4896 bytes --]

#! /bin/bash
# FSQA Test No. 326
#
# This test uses a loopback mount with PUNCH_HOLE support to test
# whether discard operations are working as expected.
#
# It tests both -odiscard and fstrim.
#
# Copyright (C) 2015 SUSE. All Rights Reserved.
# Author: Jeff Mahoney <jeffm@suse.com>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
#-----------------------------------------------------------------------
#

seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"

tmp=/tmp/$$
status=1	# failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15

loopdev=
tmpdir=
_cleanup()
{
	[ -n "$tmpdir" ] && umount $tmpdir
	[ -n "$loopdev" ] && losetup -d $loopdev
}

# get standard environment, filters and checks
. ./common/rc
. ./common/filter

# real QA test starts here
_need_to_be_root
_supported_fs generic
_supported_os Linux
_require_scratch
_require_fstrim

MB_BYTES=1048576
GB_BYTES=1073741824

TEST_FS_SIZE_MB=1024
SMFILE_MIN_SIZE=512
SMFILE_MAX_SIZE=$(( 3 * $MB_BYTES ))
LARGEFILE_SIZE_MB=100

# The test methodology is generic, but the values used by each file system
# for the results to make sense may not be.
if [ "$FSTYP" = "btrfs" ]; then
	MAX_REMAINING_MB=10
	TEST_FS_SIZE_MB=10240
	# Larger than a single block group
	LARGEFILE_SIZE_MB=$(( 12 * $GB_BYTES ))
elif [ "$FSTYP" = "xfs" ]; then
	MAX_REMAINING_MB=50
else # ext4, probably, but sufficiently permissive that it should work anywhere
	# Special; Means at
	MAX_REMAINING_MB=0
fi

test_fs_size_bytes=$(( $TEST_FS_SIZE_MB * $MB_BYTES ))

rm -f $seqres.full

_scratch_mkfs &>> $seqres.full
_require_fs_space $SCRATCH_MNT $TEST_FS_SIZE_MB
_scratch_mount

blocks_used_kb()
{
	blocks=$(( $(stat --format="%b*%B/1024" "$1") ))
	echo "# blocks_used_kb $1 ($2)" >> $seqres.full
	echo "$blocks $1" >> $seqres.full
	echo $blocks
}

random_size()
{
	MIN=$1
	MAX=$2
	echo $(( $MIN + ($MAX - $MIN) * $RANDOM / 32768 ))
}

test_discard()
{
	discard=$1
	files=$2

	tmpfile=$SCRATCH_MNT/testfs.img.$$
	tmpdir=$SCRATCH_MNT/testdir.$$
	testdir=$tmpdir/testdir
	mkdir -p $tmpdir || _fail "!!! failed to create temp mount dir"

	# Create a sparse file to host the file system
	$XFS_IO_PROG -f -t -c "truncate $test_fs_size_bytes" $tmpfile \
		|| _fail "!!! failed to create fs image file"

	opts=""
	if [ "$discard" = "discard" ]; then
		opts="-o discard"
	fi

	loopdev=$(losetup --show -f $tmpfile)
	_mkfs_dev $loopdev &> $seqres.full
	$MOUNT_PROG $opts $loopdev $tmpdir \
		|| _fail "!!! failed to loopback mount"


	$FSTRIM_PROG $tmpdir

	ESIZE="$(blocks_used_kb $tmpfile "empty filesystem")"

	if [ "$files" = "large" ]; then
		count=$(( $TEST_FS_SIZE_MB / $LARGEFILE_SIZE_MB ))
		random=false
	else
		count=$(( $TEST_FS_SIZE_MB * $MB_BYTES / $SMFILE_MAX_SIZE ))
		random=true
	fi

	mkdir -p $testdir
	for ((i = 1; i <= count; i++)); do
		SIZE=$(( $LARGEFILE_SIZE_MB * $MB_BYTES ))
		if $random; then
			SIZE=$(random_size $SMFILE_MIN_SIZE $SMFILE_MAX_SIZE)
		fi

		fn=${seq}_${i}
		$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 $SIZE" $testdir/$fn \
			&> /dev/null
		if [ $? -ne 0 ]; then
			echo "Failed creating file $fn" \
				>>$seqres.full
			break
		fi
	done

	sync
	OSIZE="$(blocks_used_kb $tmpfile "before removing files")"

	rm -rf $testdir

	# Ensure everything's actually on the hosted file system
	if [ "$FSTYP" = "btrfs" ]; then
		_run_btrfs_util_prog filesystem sync $tmpdir
	fi
	sync
	if [ "$discard" = "trim" ]; then
		$FSTRIM_PROG $tmpdir
	fi

	$UMOUNT_PROG $tmpdir
	rmdir $tmpdir
	tmpdir=

	# Sync the backing file system to ensure the hole punches have
	# happened and we can trust the result.
	if [ "$FSTYP" = "btrfs" ]; then
		_run_btrfs_util_prog filesystem sync $SCRATCH_MNT
	fi
	sync

	NSIZE="$(blocks_used_kb $tmpfile "after trim")"

	# Going from ~ 10GB to 50MB is a good enough test to account for
	# metadata remaining on different file systems.
	if [ "$NSIZE" -gt $(( 50 * 1024 )) ]; then
		_fail "TRIM failed: before rm ${OSIZE}kB, after rm ${NSIZE}kB"
	fi
	rm $tmpfile
	losetup -d $loopdev
	loopdev=
}


echo "Testing with -odiscard, many small files"
test_discard discard many

echo "Testing with -odiscard, several large files"
test_discard discard large

echo "Testing with fstrim, many small files"
test_discard trim many

echo "Testing with fstrim, several large files"
test_discard trim large

status=0
exit

[-- Attachment #3: 326.out --]
[-- Type: text/plain, Size: 189 bytes --]

QA output created by 326
Testing with -odiscard, many small files
Testing with -odiscard, several large files
Testing with fstrim, many small files
Testing with fstrim, several large files

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: btrfs -o discard bug in latest dev branches
  2015-08-26 13:24 ` Jeff Mahoney
@ 2015-08-26 16:02   ` Jun He
  2015-08-27  3:03     ` Jun He
  0 siblings, 1 reply; 6+ messages in thread
From: Jun He @ 2015-08-26 16:02 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: linux-btrfs

jun@jun-VirtualBox:~/workdir/b6ce39eeb6de8887e66a$ uname -r
4.2.0-rc5-integration-4.3+

https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/1cd84e6bb138eeb08dc07af2c44dedf99866bde0/btrfs-debug-tree.out

https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/1cd84e6bb138eeb08dc07af2c44dedf99866bde0/filefrag.out

https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/1cd84e6bb138eeb08dc07af2c44dedf99866bde0/filefrag.after.fstrim.out



On Wed, Aug 26, 2015 at 8:24 AM, Jeff Mahoney <jeffm@suse.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 8/26/15 12:04 AM, Jun He wrote:
>> Hi, I have been playing with btrfs discard for a while and found
>> that btrfs may fail to discard some extents with 'mount -o
>> discard'. I am aware of Jeff Mahoney's patches (
>> https://patchwork.kernel.org/patch/6609491/ ). It seems that the
>> patches do not fix the problem. I have seen the same problematic
>> behavior for the following versions
>>
>> - https://git.kernel.org/cgit/linux/kernel/git/fdmanana/linux.git/
>> integration-4.3 commit:477594f93c43b1ee685 - 3.16.0 - 4.2.0-rc7
>>
>> The problem can be reproduced by writing and fsyncing a 4MB file
>> for 50 times on a 256MB empty FS (mount option: -o discard). You
>> will find that some extents are not discarded (my expected behavior
>> is that, after overwriting, an old version of a file extent should
>> be discarded). I use several ways to confirm this:
>>
>> 1. I created a loop device back by a sparse file in tmpfs. After
>> running the workload, I found the file is 29MB (ls -lsh). If you
>> fstrim the file system, the sparse file will become 4.1MB. This
>> proves that there are a lot of data not discarded.
>
> Can you capture the output of 'filefrag -b -v' for the sparse file in
> tmpfs both before and after trim?  Also btrfs-debug-tree output for it.
>
>> 2. I collected blktrace + blkparse output and plotted the write and
>> discard operations in a space-time graph, where you can intuitively
>> see some extents are overwritten but not discarded. Here is the
>> space-time graph
>> https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/825a
> 3c2946b52a50c2b6032a98d637f5a32bc5c3/integration-4.3.png
>>
>>  Is it a known problem or is it not a problem? If it is a known
>> problem and there exists a patch that I am not aware of, can
>> somebody direct me to it? If it is specifically designed this way,
>> can the designers give the rationale of discarding some, but not
>> all of, old extents?
>>
>> The reproducer can be run by: git clone
>> https://gist.github.com/b6ce39eeb6de8887e66a.git cd
>> b6ce39eeb6de8887e66a sudo python main.py
>>
>> It also has a readme.
>
> I tested my discard patches the other day against 4.2-rc8 and it
> passed my 326 test.  I'll check out your reproducer.
>
> - -Jeff
>
>
>
> - --
> Jeff Mahoney
> SUSE Labs
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
>
> iQIcBAEBAgAGBQJV3b4SAAoJEB57S2MheeWyTUMP/R2mF2/6RcXO0iXNJw466/qI
> VzEV09C5Cs04NnDSfY0ZPk3W6iD1B4iOnHB1HF+lK8ENeF/Ug+0Tnb1uJpEo4lvn
> +yTxUx+qYwo737SU9oTjmvpr4aXzmV3I223gylcPnaXCQu8w80BsldIBvL98ExIq
> Ad/ZzuPMAfc3jvhcEZrDTN6L77rIyhXkhGT/fMnRcycW/26myd8HbJDjE0lebmvY
> 9+L15Ykgm52V4zpW5wSQh1xww8WWsKv/K6nF3shuNPlhmovOwtAAEJhLRQcpJp5e
> +jSijexTsd9so5smwc+JJ+sW86zZ6G6opL/K96MJNEXZ4UMBQPaOYNrewAHTh8Xe
> V4WzvbL/JWcKAAzPugvhRGjP2xjM+E1oBQGvZEGEmZb0lneW5Fouao9xGXtftRr9
> NAc7A8D/WkqSxDb7qhEdmQpexsQNfPMQi4JmTj0/kGCjYAhxCuVlPgNjorE70BW/
> nze9Fg1zydZZs0XeQW4Bh4+HU3yjexLshQCgXjnhabmc9+VwBnEuizYVtcjFgm8r
> vBCj80cAID7n7PtPhDNE7DfFHLuigpA6hTw+vgUxL1zd4yHEjQ3pGZ0lZuVK42oZ
> EIqKX7r0maU24pQGj1v9NL8QPqYQmrj9ljUmk3QSlvjTUGVv5y8U+KVX7h3cGJRO
> YMs7n+5BMmI6WKZBn+fr
> =StBb
> -----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: btrfs -o discard bug in latest dev branches
  2015-08-26 16:02   ` Jun He
@ 2015-08-27  3:03     ` Jun He
  0 siblings, 0 replies; 6+ messages in thread
From: Jun He @ 2015-08-27  3:03 UTC (permalink / raw)
  To: Jeff Mahoney; +Cc: linux-btrfs

Sorry, the filefrag -b option was not used correctly. Here are the new output:
https://gist.github.com/junhe/b6ce39eeb6de8887e66a#file-btrfs-debug-tree-out
https://gist.github.com/junhe/b6ce39eeb6de8887e66a#file-filefrag-out

After doing fstrim /mnt/fsonloop
https://gist.github.com/junhe/b6ce39eeb6de8887e66a#file-filefrag-after-fstrim-out

On Wed, Aug 26, 2015 at 11:02 AM, Jun He <jhe@cs.wisc.edu> wrote:
> jun@jun-VirtualBox:~/workdir/b6ce39eeb6de8887e66a$ uname -r
> 4.2.0-rc5-integration-4.3+
>
> https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/1cd84e6bb138eeb08dc07af2c44dedf99866bde0/btrfs-debug-tree.out
>
> https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/1cd84e6bb138eeb08dc07af2c44dedf99866bde0/filefrag.out
>
> https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/1cd84e6bb138eeb08dc07af2c44dedf99866bde0/filefrag.after.fstrim.out
>
>
>
> On Wed, Aug 26, 2015 at 8:24 AM, Jeff Mahoney <jeffm@suse.com> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 8/26/15 12:04 AM, Jun He wrote:
>>> Hi, I have been playing with btrfs discard for a while and found
>>> that btrfs may fail to discard some extents with 'mount -o
>>> discard'. I am aware of Jeff Mahoney's patches (
>>> https://patchwork.kernel.org/patch/6609491/ ). It seems that the
>>> patches do not fix the problem. I have seen the same problematic
>>> behavior for the following versions
>>>
>>> - https://git.kernel.org/cgit/linux/kernel/git/fdmanana/linux.git/
>>> integration-4.3 commit:477594f93c43b1ee685 - 3.16.0 - 4.2.0-rc7
>>>
>>> The problem can be reproduced by writing and fsyncing a 4MB file
>>> for 50 times on a 256MB empty FS (mount option: -o discard). You
>>> will find that some extents are not discarded (my expected behavior
>>> is that, after overwriting, an old version of a file extent should
>>> be discarded). I use several ways to confirm this:
>>>
>>> 1. I created a loop device back by a sparse file in tmpfs. After
>>> running the workload, I found the file is 29MB (ls -lsh). If you
>>> fstrim the file system, the sparse file will become 4.1MB. This
>>> proves that there are a lot of data not discarded.
>>
>> Can you capture the output of 'filefrag -b -v' for the sparse file in
>> tmpfs both before and after trim?  Also btrfs-debug-tree output for it.
>>
>>> 2. I collected blktrace + blkparse output and plotted the write and
>>> discard operations in a space-time graph, where you can intuitively
>>> see some extents are overwritten but not discarded. Here is the
>>> space-time graph
>>> https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/825a
>> 3c2946b52a50c2b6032a98d637f5a32bc5c3/integration-4.3.png
>>>
>>>  Is it a known problem or is it not a problem? If it is a known
>>> problem and there exists a patch that I am not aware of, can
>>> somebody direct me to it? If it is specifically designed this way,
>>> can the designers give the rationale of discarding some, but not
>>> all of, old extents?
>>>
>>> The reproducer can be run by: git clone
>>> https://gist.github.com/b6ce39eeb6de8887e66a.git cd
>>> b6ce39eeb6de8887e66a sudo python main.py
>>>
>>> It also has a readme.
>>
>> I tested my discard patches the other day against 4.2-rc8 and it
>> passed my 326 test.  I'll check out your reproducer.
>>
>> - -Jeff
>>
>>
>>
>> - --
>> Jeff Mahoney
>> SUSE Labs
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
>>
>> iQIcBAEBAgAGBQJV3b4SAAoJEB57S2MheeWyTUMP/R2mF2/6RcXO0iXNJw466/qI
>> VzEV09C5Cs04NnDSfY0ZPk3W6iD1B4iOnHB1HF+lK8ENeF/Ug+0Tnb1uJpEo4lvn
>> +yTxUx+qYwo737SU9oTjmvpr4aXzmV3I223gylcPnaXCQu8w80BsldIBvL98ExIq
>> Ad/ZzuPMAfc3jvhcEZrDTN6L77rIyhXkhGT/fMnRcycW/26myd8HbJDjE0lebmvY
>> 9+L15Ykgm52V4zpW5wSQh1xww8WWsKv/K6nF3shuNPlhmovOwtAAEJhLRQcpJp5e
>> +jSijexTsd9so5smwc+JJ+sW86zZ6G6opL/K96MJNEXZ4UMBQPaOYNrewAHTh8Xe
>> V4WzvbL/JWcKAAzPugvhRGjP2xjM+E1oBQGvZEGEmZb0lneW5Fouao9xGXtftRr9
>> NAc7A8D/WkqSxDb7qhEdmQpexsQNfPMQi4JmTj0/kGCjYAhxCuVlPgNjorE70BW/
>> nze9Fg1zydZZs0XeQW4Bh4+HU3yjexLshQCgXjnhabmc9+VwBnEuizYVtcjFgm8r
>> vBCj80cAID7n7PtPhDNE7DfFHLuigpA6hTw+vgUxL1zd4yHEjQ3pGZ0lZuVK42oZ
>> EIqKX7r0maU24pQGj1v9NL8QPqYQmrj9ljUmk3QSlvjTUGVv5y8U+KVX7h3cGJRO
>> YMs7n+5BMmI6WKZBn+fr
>> =StBb
>> -----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-08-27  3:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-26  4:04 btrfs -o discard bug in latest dev branches Jun He
2015-08-26  6:14 ` Duncan
2015-08-26 19:09   ` Jun He
2015-08-26 13:24 ` Jeff Mahoney
2015-08-26 16:02   ` Jun He
2015-08-27  3:03     ` Jun He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).