From: Jeff Mahoney <jeffm@suse.com>
To: Jun He <jhe@cs.wisc.edu>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs -o discard bug in latest dev branches
Date: Wed, 26 Aug 2015 09:24:34 -0400 [thread overview]
Message-ID: <55DDBE12.1020104@suse.com> (raw)
In-Reply-To: <20150826040442.GC10927@Juns-MacBook-Pro.local>
[-- Attachment #1: Type: text/plain, Size: 3118 bytes --]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 8/26/15 12:04 AM, Jun He wrote:
> Hi, I have been playing with btrfs discard for a while and found
> that btrfs may fail to discard some extents with 'mount -o
> discard'. I am aware of Jeff Mahoney's patches (
> https://patchwork.kernel.org/patch/6609491/ ). It seems that the
> patches do not fix the problem. I have seen the same problematic
> behavior for the following versions
>
> - https://git.kernel.org/cgit/linux/kernel/git/fdmanana/linux.git/
> integration-4.3 commit:477594f93c43b1ee685 - 3.16.0 - 4.2.0-rc7
>
> The problem can be reproduced by writing and fsyncing a 4MB file
> for 50 times on a 256MB empty FS (mount option: -o discard). You
> will find that some extents are not discarded (my expected behavior
> is that, after overwriting, an old version of a file extent should
> be discarded). I use several ways to confirm this:
>
> 1. I created a loop device back by a sparse file in tmpfs. After
> running the workload, I found the file is 29MB (ls -lsh). If you
> fstrim the file system, the sparse file will become 4.1MB. This
> proves that there are a lot of data not discarded.
Can you capture the output of 'filefrag -b -v' for the sparse file in
tmpfs both before and after trim? Also btrfs-debug-tree output for it.
> 2. I collected blktrace + blkparse output and plotted the write and
> discard operations in a space-time graph, where you can intuitively
> see some extents are overwritten but not discarded. Here is the
> space-time graph
> https://gist.githubusercontent.com/junhe/b6ce39eeb6de8887e66a/raw/825a
3c2946b52a50c2b6032a98d637f5a32bc5c3/integration-4.3.png
>
> Is it a known problem or is it not a problem? If it is a known
> problem and there exists a patch that I am not aware of, can
> somebody direct me to it? If it is specifically designed this way,
> can the designers give the rationale of discarding some, but not
> all of, old extents?
>
> The reproducer can be run by: git clone
> https://gist.github.com/b6ce39eeb6de8887e66a.git cd
> b6ce39eeb6de8887e66a sudo python main.py
>
> It also has a readme.
I tested my discard patches the other day against 4.2-rc8 and it
passed my 326 test. I'll check out your reproducer.
- -Jeff
- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
iQIcBAEBAgAGBQJV3b4SAAoJEB57S2MheeWyTUMP/R2mF2/6RcXO0iXNJw466/qI
VzEV09C5Cs04NnDSfY0ZPk3W6iD1B4iOnHB1HF+lK8ENeF/Ug+0Tnb1uJpEo4lvn
+yTxUx+qYwo737SU9oTjmvpr4aXzmV3I223gylcPnaXCQu8w80BsldIBvL98ExIq
Ad/ZzuPMAfc3jvhcEZrDTN6L77rIyhXkhGT/fMnRcycW/26myd8HbJDjE0lebmvY
9+L15Ykgm52V4zpW5wSQh1xww8WWsKv/K6nF3shuNPlhmovOwtAAEJhLRQcpJp5e
+jSijexTsd9so5smwc+JJ+sW86zZ6G6opL/K96MJNEXZ4UMBQPaOYNrewAHTh8Xe
V4WzvbL/JWcKAAzPugvhRGjP2xjM+E1oBQGvZEGEmZb0lneW5Fouao9xGXtftRr9
NAc7A8D/WkqSxDb7qhEdmQpexsQNfPMQi4JmTj0/kGCjYAhxCuVlPgNjorE70BW/
nze9Fg1zydZZs0XeQW4Bh4+HU3yjexLshQCgXjnhabmc9+VwBnEuizYVtcjFgm8r
vBCj80cAID7n7PtPhDNE7DfFHLuigpA6hTw+vgUxL1zd4yHEjQ3pGZ0lZuVK42oZ
EIqKX7r0maU24pQGj1v9NL8QPqYQmrj9ljUmk3QSlvjTUGVv5y8U+KVX7h3cGJRO
YMs7n+5BMmI6WKZBn+fr
=StBb
-----END PGP SIGNATURE-----
[-- Attachment #2: 326 --]
[-- Type: text/plain, Size: 4896 bytes --]
#! /bin/bash
# FSQA Test No. 326
#
# This test uses a loopback mount with PUNCH_HOLE support to test
# whether discard operations are working as expected.
#
# It tests both -odiscard and fstrim.
#
# Copyright (C) 2015 SUSE. All Rights Reserved.
# Author: Jeff Mahoney <jeffm@suse.com>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#-----------------------------------------------------------------------
#
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
loopdev=
tmpdir=
_cleanup()
{
[ -n "$tmpdir" ] && umount $tmpdir
[ -n "$loopdev" ] && losetup -d $loopdev
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
# real QA test starts here
_need_to_be_root
_supported_fs generic
_supported_os Linux
_require_scratch
_require_fstrim
MB_BYTES=1048576
GB_BYTES=1073741824
TEST_FS_SIZE_MB=1024
SMFILE_MIN_SIZE=512
SMFILE_MAX_SIZE=$(( 3 * $MB_BYTES ))
LARGEFILE_SIZE_MB=100
# The test methodology is generic, but the values used by each file system
# for the results to make sense may not be.
if [ "$FSTYP" = "btrfs" ]; then
MAX_REMAINING_MB=10
TEST_FS_SIZE_MB=10240
# Larger than a single block group
LARGEFILE_SIZE_MB=$(( 12 * $GB_BYTES ))
elif [ "$FSTYP" = "xfs" ]; then
MAX_REMAINING_MB=50
else # ext4, probably, but sufficiently permissive that it should work anywhere
# Special; Means at
MAX_REMAINING_MB=0
fi
test_fs_size_bytes=$(( $TEST_FS_SIZE_MB * $MB_BYTES ))
rm -f $seqres.full
_scratch_mkfs &>> $seqres.full
_require_fs_space $SCRATCH_MNT $TEST_FS_SIZE_MB
_scratch_mount
blocks_used_kb()
{
blocks=$(( $(stat --format="%b*%B/1024" "$1") ))
echo "# blocks_used_kb $1 ($2)" >> $seqres.full
echo "$blocks $1" >> $seqres.full
echo $blocks
}
random_size()
{
MIN=$1
MAX=$2
echo $(( $MIN + ($MAX - $MIN) * $RANDOM / 32768 ))
}
test_discard()
{
discard=$1
files=$2
tmpfile=$SCRATCH_MNT/testfs.img.$$
tmpdir=$SCRATCH_MNT/testdir.$$
testdir=$tmpdir/testdir
mkdir -p $tmpdir || _fail "!!! failed to create temp mount dir"
# Create a sparse file to host the file system
$XFS_IO_PROG -f -t -c "truncate $test_fs_size_bytes" $tmpfile \
|| _fail "!!! failed to create fs image file"
opts=""
if [ "$discard" = "discard" ]; then
opts="-o discard"
fi
loopdev=$(losetup --show -f $tmpfile)
_mkfs_dev $loopdev &> $seqres.full
$MOUNT_PROG $opts $loopdev $tmpdir \
|| _fail "!!! failed to loopback mount"
$FSTRIM_PROG $tmpdir
ESIZE="$(blocks_used_kb $tmpfile "empty filesystem")"
if [ "$files" = "large" ]; then
count=$(( $TEST_FS_SIZE_MB / $LARGEFILE_SIZE_MB ))
random=false
else
count=$(( $TEST_FS_SIZE_MB * $MB_BYTES / $SMFILE_MAX_SIZE ))
random=true
fi
mkdir -p $testdir
for ((i = 1; i <= count; i++)); do
SIZE=$(( $LARGEFILE_SIZE_MB * $MB_BYTES ))
if $random; then
SIZE=$(random_size $SMFILE_MIN_SIZE $SMFILE_MAX_SIZE)
fi
fn=${seq}_${i}
$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 $SIZE" $testdir/$fn \
&> /dev/null
if [ $? -ne 0 ]; then
echo "Failed creating file $fn" \
>>$seqres.full
break
fi
done
sync
OSIZE="$(blocks_used_kb $tmpfile "before removing files")"
rm -rf $testdir
# Ensure everything's actually on the hosted file system
if [ "$FSTYP" = "btrfs" ]; then
_run_btrfs_util_prog filesystem sync $tmpdir
fi
sync
if [ "$discard" = "trim" ]; then
$FSTRIM_PROG $tmpdir
fi
$UMOUNT_PROG $tmpdir
rmdir $tmpdir
tmpdir=
# Sync the backing file system to ensure the hole punches have
# happened and we can trust the result.
if [ "$FSTYP" = "btrfs" ]; then
_run_btrfs_util_prog filesystem sync $SCRATCH_MNT
fi
sync
NSIZE="$(blocks_used_kb $tmpfile "after trim")"
# Going from ~ 10GB to 50MB is a good enough test to account for
# metadata remaining on different file systems.
if [ "$NSIZE" -gt $(( 50 * 1024 )) ]; then
_fail "TRIM failed: before rm ${OSIZE}kB, after rm ${NSIZE}kB"
fi
rm $tmpfile
losetup -d $loopdev
loopdev=
}
echo "Testing with -odiscard, many small files"
test_discard discard many
echo "Testing with -odiscard, several large files"
test_discard discard large
echo "Testing with fstrim, many small files"
test_discard trim many
echo "Testing with fstrim, several large files"
test_discard trim large
status=0
exit
[-- Attachment #3: 326.out --]
[-- Type: text/plain, Size: 189 bytes --]
QA output created by 326
Testing with -odiscard, many small files
Testing with -odiscard, several large files
Testing with fstrim, many small files
Testing with fstrim, several large files
next prev parent reply other threads:[~2015-08-26 13:24 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-26 4:04 btrfs -o discard bug in latest dev branches Jun He
2015-08-26 6:14 ` Duncan
2015-08-26 19:09 ` Jun He
2015-08-26 13:24 ` Jeff Mahoney [this message]
2015-08-26 16:02 ` Jun He
2015-08-27 3:03 ` Jun He
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55DDBE12.1020104@suse.com \
--to=jeffm@suse.com \
--cc=jhe@cs.wisc.edu \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.