From: Jeff Mahoney <jeffm@suse.com>
To: Jun He <jhe@cs.wisc.edu>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs -o discard bug in latest dev branches
Date: Wed, 26 Aug 2015 09:24:34 -0400 [thread overview]
Message-ID: <55DDBE12.1020104@suse.com> (raw)
In-Reply-To: <20150826040442.GC10927@Juns-MacBook-Pro.local>
[-- Attachment #1: Type: text/plain, Size: 3118 bytes --]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 8/26/15 12:04 AM, Jun He wrote:
> Hi, I have been playing with btrfs discard for a while and found
> that btrfs may fail to discard some extents with 'mount -o
> discard'. I am aware of Jeff Mahoney's patches (
> https://patchwork.kernel.org/patch/6609491/ ). It seems that the
> patches do not fix the problem. I have seen the same problematic
> behavior for the following versions
>
> - https://git.kernel.org/cgit/linux/kernel/git/fdmanana/linux.git/
> integration-4.3 commit:477594f93c43b1ee685 - 3.16.0 - 4.2.0-rc7
>
> The problem can be reproduced by writing and fsyncing a 4MB file
> for 50 times on a 256MB empty FS (mount option: -o discard). You
> will find that some extents are not discarded (my expected behavior
> is that, after overwriting, an old version of a file extent should
> be discarded). I use several ways to confirm this:
>
> 1. I created a loop device back by a sparse file in tmpfs. After
> running the workload, I found the file is 29MB (ls -lsh). If you
> fstrim the file system, the sparse file will become 4.1MB. This
> proves that there are a lot of data not discarded.
Can you capture the output of 'filefrag -b -v' for the sparse file in
tmpfs both before and after trim? Also btrfs-debug-tree output for it.
> 2. I collected blktrace + blkparse output and plotted the write and
> discard operations in a space-time graph, where you can intuitively
> see some extents are overwritten but not discarded. Here is the
> space-time graph
> https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/825a
3c2946b52a50c2b6032a98d637f5a32bc5c3/integration-4.3.png
>
> Is it a known problem or is it not a problem? If it is a known
> problem and there exists a patch that I am not aware of, can
> somebody direct me to it? If it is specifically designed this way,
> can the designers give the rationale of discarding some, but not
> all of, old extents?
>
> The reproducer can be run by: git clone
> https://gist.github.com/b6ce39eeb6de8887e66a.git cd
> b6ce39eeb6de8887e66a sudo python main.py
>
> It also has a readme.
I tested my discard patches the other day against 4.2-rc8 and it
passed my 326 test. I'll check out your reproducer.
- -Jeff
- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
iQIcBAEBAgAGBQJV3b4SAAoJEB57S2MheeWyTUMP/R2mF2/6RcXO0iXNJw466/qI
VzEV09C5Cs04NnDSfY0ZPk3W6iD1B4iOnHB1HF+lK8ENeF/Ug+0Tnb1uJpEo4lvn
+yTxUx+qYwo737SU9oTjmvpr4aXzmV3I223gylcPnaXCQu8w80BsldIBvL98ExIq
Ad/ZzuPMAfc3jvhcEZrDTN6L77rIyhXkhGT/fMnRcycW/26myd8HbJDjE0lebmvY
9+L15Ykgm52V4zpW5wSQh1xww8WWsKv/K6nF3shuNPlhmovOwtAAEJhLRQcpJp5e
+jSijexTsd9so5smwc+JJ+sW86zZ6G6opL/K96MJNEXZ4UMBQPaOYNrewAHTh8Xe
V4WzvbL/JWcKAAzPugvhRGjP2xjM+E1oBQGvZEGEmZb0lneW5Fouao9xGXtftRr9
NAc7A8D/WkqSxDb7qhEdmQpexsQNfPMQi4JmTj0/kGCjYAhxCuVlPgNjorE70BW/
nze9Fg1zydZZs0XeQW4Bh4+HU3yjexLshQCgXjnhabmc9+VwBnEuizYVtcjFgm8r
vBCj80cAID7n7PtPhDNE7DfFHLuigpA6hTw+vgUxL1zd4yHEjQ3pGZ0lZuVK42oZ
EIqKX7r0maU24pQGj1v9NL8QPqYQmrj9ljUmk3QSlvjTUGVv5y8U+KVX7h3cGJRO
YMs7n+5BMmI6WKZBn+fr
=StBb
-----END PGP SIGNATURE-----
[-- Attachment #2: 326 --]
[-- Type: text/plain, Size: 4896 bytes --]
#! /bin/bash
# FSQA Test No. 326
#
# This test uses a loopback mount with PUNCH_HOLE support to test
# whether discard operations are working as expected.
#
# It tests both -odiscard and fstrim.
#
# Copyright (C) 2015 SUSE. All Rights Reserved.
# Author: Jeff Mahoney <jeffm@suse.com>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#-----------------------------------------------------------------------
#
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
loopdev=
tmpdir=
_cleanup()
{
[ -n "$tmpdir" ] && umount $tmpdir
[ -n "$loopdev" ] && losetup -d $loopdev
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
# real QA test starts here
_need_to_be_root
_supported_fs generic
_supported_os Linux
_require_scratch
_require_fstrim
MB_BYTES=1048576
GB_BYTES=1073741824
TEST_FS_SIZE_MB=1024
SMFILE_MIN_SIZE=512
SMFILE_MAX_SIZE=$(( 3 * $MB_BYTES ))
LARGEFILE_SIZE_MB=100
# The test methodology is generic, but the values used by each file system
# for the results to make sense may not be.
if [ "$FSTYP" = "btrfs" ]; then
MAX_REMAINING_MB=10
TEST_FS_SIZE_MB=10240
# Larger than a single block group
LARGEFILE_SIZE_MB=$(( 12 * $GB_BYTES ))
elif [ "$FSTYP" = "xfs" ]; then
MAX_REMAINING_MB=50
else # ext4, probably, but sufficiently permissive that it should work anywhere
# Special; Means at
MAX_REMAINING_MB=0
fi
test_fs_size_bytes=$(( $TEST_FS_SIZE_MB * $MB_BYTES ))
rm -f $seqres.full
_scratch_mkfs &>> $seqres.full
_require_fs_space $SCRATCH_MNT $TEST_FS_SIZE_MB
_scratch_mount
blocks_used_kb()
{
blocks=$(( $(stat --format="%b*%B/1024" "$1") ))
echo "# blocks_used_kb $1 ($2)" >> $seqres.full
echo "$blocks $1" >> $seqres.full
echo $blocks
}
random_size()
{
MIN=$1
MAX=$2
echo $(( $MIN + ($MAX - $MIN) * $RANDOM / 32768 ))
}
test_discard()
{
discard=$1
files=$2
tmpfile=$SCRATCH_MNT/testfs.img.$$
tmpdir=$SCRATCH_MNT/testdir.$$
testdir=$tmpdir/testdir
mkdir -p $tmpdir || _fail "!!! failed to create temp mount dir"
# Create a sparse file to host the file system
$XFS_IO_PROG -f -t -c "truncate $test_fs_size_bytes" $tmpfile \
|| _fail "!!! failed to create fs image file"
opts=""
if [ "$discard" = "discard" ]; then
opts="-o discard"
fi
loopdev=$(losetup --show -f $tmpfile)
_mkfs_dev $loopdev &> $seqres.full
$MOUNT_PROG $opts $loopdev $tmpdir \
|| _fail "!!! failed to loopback mount"
$FSTRIM_PROG $tmpdir
ESIZE="$(blocks_used_kb $tmpfile "empty filesystem")"
if [ "$files" = "large" ]; then
count=$(( $TEST_FS_SIZE_MB / $LARGEFILE_SIZE_MB ))
random=false
else
count=$(( $TEST_FS_SIZE_MB * $MB_BYTES / $SMFILE_MAX_SIZE ))
random=true
fi
mkdir -p $testdir
for ((i = 1; i <= count; i++)); do
SIZE=$(( $LARGEFILE_SIZE_MB * $MB_BYTES ))
if $random; then
SIZE=$(random_size $SMFILE_MIN_SIZE $SMFILE_MAX_SIZE)
fi
fn=${seq}_${i}
$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 $SIZE" $testdir/$fn \
&> /dev/null
if [ $? -ne 0 ]; then
echo "Failed creating file $fn" \
>>$seqres.full
break
fi
done
sync
OSIZE="$(blocks_used_kb $tmpfile "before removing files")"
rm -rf $testdir
# Ensure everything's actually on the hosted file system
if [ "$FSTYP" = "btrfs" ]; then
_run_btrfs_util_prog filesystem sync $tmpdir
fi
sync
if [ "$discard" = "trim" ]; then
$FSTRIM_PROG $tmpdir
fi
$UMOUNT_PROG $tmpdir
rmdir $tmpdir
tmpdir=
# Sync the backing file system to ensure the hole punches have
# happened and we can trust the result.
if [ "$FSTYP" = "btrfs" ]; then
_run_btrfs_util_prog filesystem sync $SCRATCH_MNT
fi
sync
NSIZE="$(blocks_used_kb $tmpfile "after trim")"
# Going from ~ 10GB to 50MB is a good enough test to account for
# metadata remaining on different file systems.
if [ "$NSIZE" -gt $(( 50 * 1024 )) ]; then
_fail "TRIM failed: before rm ${OSIZE}kB, after rm ${NSIZE}kB"
fi
rm $tmpfile
losetup -d $loopdev
loopdev=
}
echo "Testing with -odiscard, many small files"
test_discard discard many
echo "Testing with -odiscard, several large files"
test_discard discard large
echo "Testing with fstrim, many small files"
test_discard trim many
echo "Testing with fstrim, several large files"
test_discard trim large
status=0
exit
[-- Attachment #3: 326.out --]
[-- Type: text/plain, Size: 189 bytes --]
QA output created by 326
Testing with -odiscard, many small files
Testing with -odiscard, several large files
Testing with fstrim, many small files
Testing with fstrim, several large files
next prev parent reply other threads:[~2015-08-26 13:24 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-26 4:04 btrfs -o discard bug in latest dev branches Jun He
2015-08-26 6:14 ` Duncan
2015-08-26 19:09 ` Jun He
2015-08-26 13:24 ` Jeff Mahoney [this message]
2015-08-26 16:02 ` Jun He
2015-08-27 3:03 ` Jun He
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55DDBE12.1020104@suse.com \
--to=jeffm@suse.com \
--cc=jhe@cs.wisc.edu \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).